aws-glue

It is not surprising that deep and shallow scans show different results. A shallow scan looks only at column names, while a deep scan looks at a sample of the data. I've even noticed that two runs of a deep scan can show different results, because the sampled rows are different. This is the challenge of not scanning all of the data: it's a trade-off between performance/cost and accuracy, and there is no right answer.
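A minimal Python sketch of why the two scans can disagree. The regexes, the function names `shallow_scan` and `deep_scan`, and the sample data are hypothetical illustrations, not piicatcher's actual rules or API. The shallow scan matches PII types (phone, credit card, person, location) against column names only. The deep scan matches values in a random sample of rows, so a column whose sensitive values are rare may or may not be flagged depending on which rows are drawn.

```python
import random
import re

# Hypothetical column-name rules for a shallow scan (illustrative only).
COLUMN_PATTERNS = {
    "phone": re.compile(r"phone|mobile|cell", re.IGNORECASE),
    "credit_card": re.compile(r"credit.?card|card.?num", re.IGNORECASE),
    "person": re.compile(r"name$|surname", re.IGNORECASE),
    "location": re.compile(r"address|city|zip|postal|country", re.IGNORECASE),
}

# Hypothetical value rules for a deep scan (illustrative only).
VALUE_PATTERNS = {
    "phone": re.compile(r"^\+?\(?\d{3}\)?[-\s.]?\d{3}[-\s.]?\d{4}$"),
    "credit_card": re.compile(r"^\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}$"),
}


def shallow_scan(columns):
    """Classify each column by its name alone; no data is read."""
    return {
        col: {pii for pii, pat in COLUMN_PATTERNS.items() if pat.search(col)}
        for col in columns
    }


def deep_scan(rows, columns, sample_size, seed=None):
    """Classify each column by matching values from a random sample of rows.

    Only the sampled rows are inspected, so different seeds (different
    samples) can produce different results.
    """
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    result = {col: set() for col in columns}
    for row in sample:
        for col, value in zip(columns, row):
            for pii, pat in VALUE_PATTERNS.items():
                if pat.match(str(value)):
                    result[col].add(pii)
    return result


if __name__ == "__main__":
    columns = ["customer_name", "phone_number", "notes"]
    # Ten synthetic rows; only one has a card-like value in "notes".
    rows = [(f"Customer {i}", f"555-010-{i:04d}", "n/a") for i in range(10)]
    rows[7] = ("Customer 7", "555-010-0007", "4111-1111-1111-1111")
    print("shallow:", shallow_scan(columns))
    for seed in (1, 2, 3):
        print("deep, seed", seed, deep_scan(rows, columns, sample_size=3, seed=seed))
```

Scanning every row (`sample_size=len(rows)`) flags the card number in `notes` reliably, but at the cost of reading all of the data. A small sample is cheaper but can miss it, which is the trade-off described above.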

Here are 47 public repositories matching this topic...

awslabs / aws-data-wrangler

awslabs / athena-glue-service-logs

aws-samples / aws-open-data-analytics-notebooks

tokern / piicatcher

Shallow scan should recognize phone, credit card, person and location from column names

Recognize file formats like CSV and JSON and process them appropriately.

aws-samples / bring-your-own-data-labs

aws-samples / data-lake-as-code

aws-samples / analyzing-reddit-sentiment-with-aws

vincentclaes / serverless_data_pipeline_example

awslabs / amazon-athena-cross-account-catalog

GorillaStack / athena-cloudtrail-partitioner

aws-samples / amazon-deequ-glue

webysther / aws-glue-docker

tokern / lakecli

TrainingByPackt / Serverless-Architectures-with-AWS

mikaelahonen-solita / aws-glue-tutorial

canyousayyes / aws-real-time-data-collection

jonrau1 / AWS-ComplianceMachineDontStop

chgasparoto / terraform-aws-glue

jhole89 / aws-glue-sbt-quickstart

bdoepf / aws-etl-example

svajiraya / aws-glue-libs

geeknam / aws-neptune-aml

gchatterjee-git / Data-Pipeline-AWS

akhilpatlolla / Generic_ETL_Utility_AWS_GLUE

da-huin / easy-glue

scriptbuzz / aws-datalake-poc-video

xianchen2 / Financal_Data_Streaming

mincloud1501 / DevOps

TrainingByPackt / Serverless-Architecture-with-AWS-eLearning

mlnrt / pexip-logs-in-aws
