Here are 47 public repositories matching this topic...
Updated
Sep 10, 2020
Python
Glue scripts for converting AWS Service Logs for use in Athena
Updated
Sep 12, 2019
Python
COVID Response - Analytics, AI, and data science API and sample notebooks
Updated
Apr 23, 2020
Jupyter Notebook
Bring your own data Labs: Build a serverless data pipeline based on your own data
Data Lake as Code, featuring ChEMBL and OpenTargets
Updated
Sep 11, 2020
TypeScript
Learn how to use Kinesis Firehose, AWS Glue, S3, and Amazon Athena by streaming and analyzing Reddit comments in real time. 100-200 level tutorial.
Updated
Jun 26, 2020
Python
Build and Deploy A Serverless Data Pipeline on AWS
Updated
Nov 1, 2019
Python
🌉 Reference implementation for granting cross-account AWS Glue Data Catalog access from Amazon Athena
Updated
Sep 1, 2020
Python
Automate the daily partitioning of your CloudTrail bucket in Athena
Updated
Jul 20, 2020
JavaScript
Automated data quality suggestions and analysis with Amazon Deequ on AWS Glue
Updated
Sep 3, 2020
Scala
🐋 Docker image for AWS Glue Spark/Python
Updated
May 26, 2020
Dockerfile
A CLI to manage and monitor permissions in AWS Lake Formation
Updated
Mar 31, 2020
Python
Discover how you can migrate from traditional deployments to serverless architectures with AWS
Updated
Feb 1, 2019
JavaScript
AWS Glue tutorial for data developers.
Updated
Sep 2, 2019
Python
Demo for building Real Time Data Collection Pipeline on AWS
Updated
Jan 8, 2019
JavaScript
Proof of Value Terraform Scripts to utilize Amazon Web Services (AWS) Security, Identity & Compliance Services to Support your AWS Account Security Posture.
Terraform module which creates Glue resources on AWS
Example of how to set SBT up for local development of AWS Glue Scripts
Updated
May 15, 2020
Scala
AWS ETL example via AWS DMS & AWS Glue
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.
Updated
May 13, 2020
Python
Personal take on GraphDB + AML with AWS Neptune + Glue + Lambda.
Updated
Nov 13, 2018
Python
A project that demonstrates building a data pipeline by scraping data with the Twitter API and creating a Kinesis Firehose delivery stream to ingest the data into Amazon S3.
Updated
May 4, 2020
Python
AWS Glue - Incremental Pull Script
Updated
Apr 10, 2019
Python
✨ This package helps you use AWS Glue easily.
Updated
Aug 10, 2020
Python
AWS hosted enterprise Data Lake with both batch and realtime data pipelines.
Financial data streaming and analysis with AWS Kinesis and Athena
Updated
Jun 9, 2020
Jupyter Notebook
Understanding DevOps concepts, with hands-on practice and research using AWS developer tools
Updated
Jun 15, 2020
Java
Discover how you can migrate from traditional deployments to serverless architectures with AWS
Updated
Feb 26, 2019
JavaScript
Pexip Infinity log analysis on the AWS cloud
It is not surprising that deep and shallow scans show different results. A shallow scan only looks at column names. A deep scan looks at a sample of the data. I've even noticed that two different runs of a deep scan show different results, because the sampled rows are different. This is the challenge with not scanning all of the data. It's a trade-off between performance/cost and accuracy. There is no right answer.
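To make the difference concrete, here is a minimal Python sketch. It is not the scanner's actual implementation: the function names `shallow_scan` and `deep_scan`, the email regex, and the toy table are all assumptions made up for illustration. It only shows why a name-based scan and a sample-based scan can disagree, and why two sampled runs can disagree with each other.

```python
import random
import re

# Hypothetical detector: flags values that look like email addresses.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def shallow_scan(columns):
    """Flag columns whose *name* suggests sensitive data; never reads rows."""
    return {name for name in columns if "email" in name.lower()}


def deep_scan(rows, columns, sample_size, seed):
    """Flag columns where any *sampled* value looks like an email address."""
    rng = random.Random(seed)  # a different seed means different sampled rows
    sample = rng.sample(rows, min(sample_size, len(rows)))
    return {
        name
        for name in columns
        if any(EMAIL_RE.match(str(row[name])) for row in sample)
    }


# Toy table (made up): "contact" holds an email in only 1 of 100 rows.
columns = ["id", "contact"]
rows = [{"id": i, "contact": "n/a"} for i in range(100)]
rows[42]["contact"] = "user@example.com"

# The column name "contact" gives nothing away, so the shallow scan finds nothing.
print("shallow:", shallow_scan(columns))

# Each run samples 10 of 100 rows; whether row 42 is included depends on the seed.
for seed in range(5):
    print("deep, seed", seed, ":", deep_scan(rows, columns, sample_size=10, seed=seed))

# Scanning every row removes the run-to-run variation, at a higher cost.
print("full:", deep_scan(rows, columns, sample_size=len(rows), seed=0))
```

With a 10-row sample, any given run has roughly a 1-in-10 chance of including the one row that contains an email, so most runs miss the column. Raising the sample size improves accuracy but costs more to run, which is the trade-off described above.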