AWS Tutorials - Methods of Building AWS Glue ETL Pipeline

  • Published: 2 Jul 2024
  • AWS Glue pipelines are responsible for ingesting data into the data platform or data lake and for managing the data transformation lifecycle from the raw to the cleansed to the curated state. There are many ways to build such pipelines. In this video, we look at some of these methods and compare them for reusability, observability, and development effort.
  • Science

Comments • 36

  • @santospcs2011 • 2 years ago

    Thank you for the pipeline video, very insightful.

  • @hirendra83 • 2 years ago +1

    Excellent Tutorial. Thanks

  • @nttazitt1300 • 2 years ago +1

    Very helpful tutorial, thanks.

  • @rollinOnCode • 2 years ago +1

    this is super good and helpful. thank you

  • @nrodriguezgal148 • 1 year ago

    Excellent video and explanation.

  • @YEO19901 • 2 years ago

    Wonderful.

  • @hsz7338 • 2 years ago +2

    As always thank you for the video. The breakdown comparison is incredibly intuitive. I am curious about your view on which approach is best in handling pipeline replay (i.e. handling pipeline failure) and CI/CD process (i.e. pipeline as code)?

    • @AWSTutorialsOnline • 2 years ago +1

      CI/CD support is available for all the approaches. Replay is better in the event-driven approach because you can rerun just part of the pipeline based on the error raised during the event.
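
      A minimal sketch of what such an event-driven replay could look like, assuming a Lambda function subscribed to the EventBridge "Glue Job State Change" event; the job and handler names are hypothetical, not from the video.

      # Hypothetical Lambda handler subscribed to the EventBridge
      # "Glue Job State Change" event. When a job fails, it restarts just
      # that job, so only the failed part of the pipeline is replayed.
      import boto3

      glue = boto3.client("glue")

      def lambda_handler(event, context):
          detail = event.get("detail", {})
          if detail.get("state") != "FAILED":
              return {"replayed": False}  # only react to failed runs
          job_name = detail["jobName"]
          run = glue.start_job_run(JobName=job_name)  # re-run the failed step only
          return {"replayed": True, "jobName": job_name, "runId": run["JobRunId"]}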

  • @adityag020 • 2 years ago +2

    Insightful tutorial. Can you make a practical video on an event-based pipeline that uses DynamoDB to store metadata and configuration, with a retry mechanism in case it fails?

  • @mangeshshinde2844 • 2 years ago +2

    Nice tutorial. Can you make a practical tutorial for an event-based pipeline?

    • @AWSTutorialsOnline • 2 years ago +1

      Yes, sure. I am getting multiple requests for that. I will do it.

  • @user-lq6gc1tw2v • 1 year ago +1

    Hello, good video. Maybe someone knows when to use Glue workflows and when to use Step Functions?

    • @AWSTutorialsOnline • 1 year ago

      Use a Glue workflow when you want to orchestrate Glue jobs and crawlers only. Use Step Functions when you want to orchestrate Glue jobs and crawlers plus other services as well.
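
      As a rough, hedged illustration of the Step Functions side (the state machine, job name, topic, and role ARNs below are placeholders, not from the video), one state machine can chain a Glue job with a non-Glue service such as SNS:

      # Sketch: a Step Functions state machine that runs a Glue job and then
      # publishes to SNS -- something a Glue workflow alone cannot orchestrate.
      # All names and ARNs are placeholders.
      import json
      import boto3

      definition = {
          "StartAt": "RunGlueJob",
          "States": {
              "RunGlueJob": {
                  "Type": "Task",
                  "Resource": "arn:aws:states:::glue:startJobRun.sync",
                  "Parameters": {"JobName": "curate-sales-data"},
                  "Next": "NotifyTeam",
              },
              "NotifyTeam": {
                  "Type": "Task",
                  "Resource": "arn:aws:states:::sns:publish",
                  "Parameters": {
                      "TopicArn": "arn:aws:sns:us-east-1:111122223333:etl-events",
                      "Message": "Curated zone refreshed",
                  },
                  "End": True,
              },
          },
      }

      sfn = boto3.client("stepfunctions")
      sfn.create_state_machine(
          name="etl-pipeline",
          definition=json.dumps(definition),
          roleArn="arn:aws:iam::111122223333:role/etl-pipeline-role",
      )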

  • @pachappagarimohanvamsi4641 • 2 years ago +2

    Could you please make some practical workshop kind of thing on these approaches?

    • @AWSTutorialsOnline • 2 years ago +1

      Sure, will do. The Glue workflow lab is already available at aws-dojo.com/workshoplists/workshoplist29/

  • @timmyzheng6049 • 2 years ago +1

    Thank you for the pipeline video, very insightful. Quick question: to avoid hardcoding, can I also use DynamoDB for storing environment parameters like S3 paths / file names / business date for my ETL pipeline, let's say using Step Functions? And what do you think is the best industry practice for storing parameters for an AWS ETL pipeline?

    • @AWSTutorialsOnline • 2 years ago

      If you want to decouple the configuration, you should keep it centralized in a service like DynamoDB or Parameter Store. DynamoDB is especially good if you are going for a multi-account deployment and still want to keep the configuration centralized.
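
      A minimal sketch of reading such centralized configuration at job start, assuming a hypothetical Parameter Store key and DynamoDB table (the names are illustrative, not from the video):

      # Sketch: fetch pipeline configuration instead of hardcoding it.
      # The parameter name and the table/key names are hypothetical.
      import boto3

      ssm = boto3.client("ssm")
      dynamodb = boto3.resource("dynamodb")

      # Option 1: SSM Parameter Store
      raw_bucket = ssm.get_parameter(Name="/etl/raw-bucket")["Parameter"]["Value"]

      # Option 2: DynamoDB (handy when several accounts share one config table)
      config = dynamodb.Table("etl-pipeline-config")
      item = config.get_item(Key={"pipeline": "sales-daily"}).get("Item", {})
      s3_path = item.get("s3_path")
      business_date = item.get("business_date")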

    • @timmyzheng6049 • 2 years ago

      @AWSTutorialsOnline Thank you for the response. After doing some research, it seems that to pass parameters to a Glue job I have to use Lambda with boto3 in the Step Function, and since Lambda can call the Glue job using the Python Glue API too, does that mean there is no need to put the Glue job separately in the Step Function?

  • @andresmerchan6418 • 1 year ago +1

    Hello! Which of the three methods is the most cost-effective?

  • @alokanand851 • 2 years ago +1

    Hi all,
    We are using AWS Glue + PySpark to perform ETL into a destination RDS PostgreSQL DB. The destination tables have primary and foreign key columns with the UUID data type. We are failing to populate these destination UUID columns. How can we achieve this? Please suggest.

    • @AWSTutorialsOnline • 2 years ago

      I am not sure what error you are getting. The ETL job has to respect the table-level column constraints. As long as you are doing that, there should not be a problem.
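
      One workaround sometimes used for PostgreSQL UUID columns (an assumption on my part, not something covered in the video) is to write the values as plain strings and let the JDBC driver cast them by adding stringtype=unspecified to the connection URL; the host, table, and credentials below are placeholders.

      # Sketch: write UUID values as strings and let PostgreSQL cast them,
      # via the pgjdbc option stringtype=unspecified. Assumes the PostgreSQL
      # JDBC driver is available to the Glue job.
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()
      df = spark.createDataFrame(
          [("9b2f5d1e-0c3a-4f7b-9a1d-2e6c8b4a7d10", "order-1")],
          ["order_id", "description"],  # order_id targets a uuid column
      )

      jdbc_url = (
          "jdbc:postgresql://my-rds-host:5432/mydb"
          "?stringtype=unspecified"  # allow string parameters to be cast to uuid
      )

      (df.write
         .format("jdbc")
         .option("url", jdbc_url)
         .option("dbtable", "public.orders")
         .option("user", "etl_user")
         .option("password", "change-me")
         .mode("append")
         .save())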

  • @radhasowjanya6872 • 2 years ago

    Hello Sir, I follow all your videos. They are very useful in my project. Thank you very much. I have a quick question: is it possible to add multiple SQL statements in one AWS Glue Studio job? If yes, can you help me with it? (Use case: I want to truncate the target table (Snowflake) before loading.)

    • @AWSTutorialsOnline • 2 years ago +1

      You can add multiple SQL Transforms one after another to run multiple SQL statements in sequence.
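
      In script form, chained SQL Transforms roughly amount to running SQL statements against temporary views one after another; this is a hedged sketch with illustrative paths and names, not the code Glue Studio generates. (For the Snowflake truncate itself, the Snowflake connector's pre-action options are usually the place for that, but check the connector documentation.)

      # Sketch: several SQL statements executed in sequence inside one Glue
      # job script. Paths, view names, and columns are illustrative.
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      spark.read.parquet("s3://my-clean-zone/orders/") \
           .createOrReplaceTempView("orders")

      # Statement 1: filter
      spark.sql("SELECT * FROM orders WHERE status = 'SHIPPED'") \
           .createOrReplaceTempView("shipped_orders")

      # Statement 2: aggregate the result of statement 1
      daily_totals = spark.sql(
          "SELECT order_date, SUM(amount) AS total "
          "FROM shipped_orders GROUP BY order_date"
      )
      daily_totals.write.mode("overwrite").parquet("s3://my-curated-zone/daily_totals/")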

  • @ankursinhaa2466 • 10 months ago

    I love you

  • @ryany420 • 1 year ago

    Awesome tutorial! I have a question to ask if you don't mind: how should we deal with upsert/delete in the landing/clean/curated zones? I know Databricks has a similar architecture with bronze/silver/gold, but it comes with Delta Lake. If our destination is Redshift, should we move data into Redshift (RDBMS) at an earlier stage, like before the curated zone? I also sent you an email; hope you can help to answer. Thanks heaps.

  • @user-ib4pm2vw5x • 2 years ago

    Can we run a machine learning algorithm in a Glue job using code?

    • @AWSTutorialsOnline • 2 years ago

      Yes, you can, but it is not recommended. You should use the Glue job for feature engineering but not for training the model. Model training should be done in SageMaker.
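
      A hedged sketch of that split: the Glue job writes engineered features to S3, and a separate call starts a SageMaker training job. The bucket, role, job name, and container image below are placeholders, not from the video.

      # Sketch: Glue does the feature engineering; SageMaker does the training.
      # All names, ARNs, and the training image are placeholders.
      import boto3

      sagemaker = boto3.client("sagemaker")

      sagemaker.create_training_job(
          TrainingJobName="churn-xgboost-2024-07-02",
          RoleArn="arn:aws:iam::111122223333:role/sagemaker-training-role",
          AlgorithmSpecification={
              "TrainingImage": "111122223333.dkr.ecr.us-east-1.amazonaws.com/my-xgboost:latest",
              "TrainingInputMode": "File",
          },
          InputDataConfig=[{
              "ChannelName": "train",
              "DataSource": {"S3DataSource": {
                  "S3DataType": "S3Prefix",
                  "S3Uri": "s3://my-curated-zone/features/train/",  # written by the Glue job
                  "S3DataDistributionType": "FullyReplicated",
              }},
              "ContentType": "text/csv",
          }],
          OutputDataConfig={"S3OutputPath": "s3://my-curated-zone/models/"},
          ResourceConfig={
              "InstanceType": "ml.m5.xlarge",
              "InstanceCount": 1,
              "VolumeSizeInGB": 30,
          },
          StoppingCondition={"MaxRuntimeInSeconds": 3600},
      )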