AWS Tutorials - AWS Glue Data Quality - Automated Data Quality Monitoring
- Published: 17 Dec 2022
- AWS Glue Data Quality is an automated, serverless service for monitoring and evaluating data quality, both at rest and in motion within an ETL job. It can evaluate quality against both statistics and values of the data. Learn how to use AWS Glue Data Quality to evaluate data at rest as well as in motion.
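For context, rules in Glue Data Quality are written in DQDL (Data Quality Definition Language). A minimal ruleset might look like the following sketch; the column names (`order_id`, `quantity`) are hypothetical placeholders, not from the video:

```
Rules = [
    IsComplete "order_id",
    ColumnValues "quantity" > 0,
    RowCount > 100
]
```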
Welcome back sir, waiting for more of your videos. I learned a lot from you. Thanks for providing these tutorials for free.
So nice of you
Great video. I was already going to solve that with a Lambda, so it's much easier with Glue Data Quality, thank you.
Thanks for the video. Very detailed.
Glad it was helpful!
Welcome back brother. Waiting for your videos. Hope everything is fine.
All good. Sorry for the long pause from my side.
Hey, I love your tutorials, thank you for making our lives simpler. I want to know: can we do data warehouse testing with this tool when the tables are in Redshift?
Currently it supports only the Hive metastore with S3 buckets.
Great demo! The retry count should have been 0 to prevent re-running.
Agree. I realized it later.
Thank you for taking us through the new feature that AWS Glue offers. Do you see Glue Data Quality replacing Glue DataBrew, at least from the data quality perspective?
I don't think data quality in DataBrew will be replaced. Both will exist. DataBrew is more for ad hoc data preparation and Glue jobs are for automation. Both need a data quality feature for their own purposes.
Great vid! Thanks!
Can this Data Quality be integrated with CI/CD and Terraform?
You mean being able to configure Data Quality using infrastructure as code. I am not sure; I did not check CloudFormation or Terraform. But it does support APIs for sure.
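On the API route, the boto3 Glue client does expose a `create_data_quality_ruleset` call. A minimal sketch below; the ruleset name, database, table, and IAM details are hypothetical, and the actual call is only reached when run with AWS credentials:

```python
def ruleset_request(name, database, table, rules):
    """Build the keyword arguments for glue.create_data_quality_ruleset.

    `rules` is a list of DQDL rule strings, e.g. 'IsComplete "order_id"'.
    """
    return {
        "Name": name,
        "Ruleset": "Rules = [ " + ", ".join(rules) + " ]",
        "TargetTable": {"DatabaseName": database, "TableName": table},
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials; names below are placeholders

    glue = boto3.client("glue")
    req = ruleset_request("sales_dq", "salesdb", "sales", ['IsComplete "order_id"'])
    glue.create_data_quality_ruleset(**req)
```

Keeping the request construction in a pure function makes it easy to version the ruleset definition in source control alongside CI/CD pipelines.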
In your Glue demo, it *seemed* you skipped showing a part. How did you get from a file being dropped into the S3 bucket/sales to it becoming a table? I'm looking for the most code-light way to set this up so my Lambda is triggered once the file is turned into a table; the Lambda would then run the rules defined in the console and write a log file of which rows/columns failed to another S3 bucket/folder. Thank you!
The table is created using an AWS Glue Crawler. I did not mention that because I have covered it in my other tutorials.
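If you want the crawler step scripted rather than click-through, it can also be created with boto3. A minimal sketch, assuming a hypothetical crawler name, IAM role, database, and S3 path; the API calls only run with AWS credentials:

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for glue.create_crawler targeting an S3 path."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials; all names below are placeholders

    glue = boto3.client("glue")
    glue.create_crawler(**crawler_request(
        "sales_crawler",
        "arn:aws:iam::123456789012:role/GlueCrawlerRole",
        "salesdb",
        "s3://my-bucket/sales/",
    ))
    glue.start_crawler(Name="sales_crawler")  # populates the Data Catalog table
```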
Is it possible to do this entire thing using boto3 in Python?
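The boto3 Glue client does include Data Quality calls such as `start_data_quality_ruleset_evaluation_run` and `get_data_quality_ruleset_evaluation_run`, so an end-to-end run can be scripted. A sketch below; the database, table, role, and ruleset names are hypothetical, and the AWS calls only execute with credentials:

```python
import time

def poll_until_done(get_status, timeout_s=300, interval_s=10):
    """Call get_status() repeatedly until it returns a terminal state or we time out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = get_status()
        if status in ("SUCCEEDED", "FAILED", "STOPPED"):
            return status
        time.sleep(interval_s)
    return "TIMEOUT"

if __name__ == "__main__":
    import boto3  # requires AWS credentials; names below are placeholders

    glue = boto3.client("glue")
    run = glue.start_data_quality_ruleset_evaluation_run(
        DataSource={"GlueTable": {"DatabaseName": "salesdb", "TableName": "sales"}},
        Role="arn:aws:iam::123456789012:role/GlueDQRole",
        RulesetNames=["sales_dq"],
    )
    final = poll_until_done(
        lambda: glue.get_data_quality_ruleset_evaluation_run(RunId=run["RunId"])["Status"]
    )
    print(final)
```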
Hi Brother, I’m a big fan of yours. I have learned many things from your channel, thanks a lot.
Please provide your LinkedIn.
Many thanks, and sorry for the long pause from my side.