- Videos: 13
- Views: 619,769
jayzern
Joined 18 Sep 2021
I'm Jay, I love making videos about travel, self-help and tech. I currently work in New York City as a data engineer, but I grew up in Malaysia and lived in the UK when I was 19. Back then, I had no idea what life was about, moving to so many places and navigating a career in tech. Today, I've learned a lot and wanna share my perspective through filmmaking.
What is Dagster? Asset Based Orchestration [2hr full course]
Dagster is a declarative, asset-based orchestrator that redefines the way we think about managing workflows. In this video, we’ll learn about features of Dagster including Assets, Resources, Jobs, Schedules, Partitions and more. We’ll do this by creating an End to End project from scratch using Dagster, data load tool (dlt) and Snowflake. I’m super excited about this one. Hope you’ll learn something new!
Timestamps ⏰
0:00 - Intro
1:45 - System Design
6:30 - Setup Dagster and data load tool (dlt)
21:40 - How to define Assets
40:08 - Resources, code refactor
43:22 - Schedules and Jobs
48:42 - Backfill DAGs using Partitions
54:39 - Sensors and Automaterialization policy
1:09:31 - Final thoughts
Notes ...
Views: 8,481
Videos
Code along - build an ELT Pipeline in 1 Hour (dbt, Snowflake, Airflow)
165K views · 10 months ago
How to build an ELT pipeline in 1 hour, using industry-standard tools such as dbt, Snowflake and Airflow. This is a live coding tutorial, where I’ll walk you through the thinking process, and show you every step. We’ll cover basic data modeling techniques (fact tables, data marts), Snowflake RBAC concepts, and how to orchestrate a dbt project using Airflow. Drop down in the comments section wha...
Intro to Amazon EMR - Big Data Tutorial using Spark
35K views · 1 year ago
Edit: Make sure you encrypt your Spark script as you upload it to S3 (timestamp: 13:42). There's a small typo in line 41 of the code; it should be "add_argument".
Intro: Today we're going to talk about a popular tool in Data Engineering. Amazon EMR is an industry-leading big data platform. It's a really mature service developed way back in 2009, and draws a lot of heuristics from the Apache Hadoo...
Top 5 SQL Interview Questions for Data Engineers
6K views · 1 year ago
A lot of people struggle to learn SQL. When it comes to interviews they feel super anxious, especially in this economy where it's getting 10x harder to find jobs. In this video, I'll show you how to CRUSH your next Data Engineering SQL interviews, through these 5 handpicked questions. We'll focus predominantly on the problem solving aspect. At the end of the video, I'll share my tips & tricks o...
How I would learn Data Engineering (if I could start over)
381K views · 1 year ago
In this video, I’ll share my step-by-step process on how I would learn Data Engineering if I could start over. Data Engineering is a fast-emerging field within the tech industry, one that more and more people from traditional data science/software backgrounds are pivoting towards. We’ll cover the fundamentals of Data Engineering, and talk about some advanced topics you’ll need to learn in order to...
Living abroad is HARD (what I learned after 8 years in new york and london)
4.5K views · 1 year ago
When you're traveling across different countries, it's very easy to cherry-pick the best parts of different places to create a perfect image in your head. The reality is, living abroad in a foreign country and going on vacation are two completely separate things. In this video, I highlight some key learnings after living abroad for over 8 years, and share insightful tips on how to ease that t...
MALAYSIA | Asia's Hidden Gem
9K views · 1 year ago
Malaysia is one of THE most underrated countries in Asia 🇲🇾. Last December, I went back home after being away for almost 3 years. I couldn't really find any videos that really showcase how amazing the country is (scenery, local food, culture), so I wanted to make one myself. It was a super hectic trip, trying to film and catch up with friends in only 2 weeks.
Who am I? 🙋🏻‍♂️ I'm Jay, I love mak...
Fall Foliage in Vermont
666 views · 2 years ago
Road trip from NY to Vermont
Thanks to @yarnehermann @andrealee_x @tiffy_le @jd_lassiter @velalu77 and yuki sensei
Gear: Canon R6, RF 24-105mm F4-7.1 IS STM, RF 35mm f/1.8 Macro IS STM lens, iPhone 13 Pro
Los Angeles
947 views · 3 years ago
City of angels
Featuring @gareygan @lu8296 @junga_julia @derek_chen01 @ivkyoung
Gear: Sony A6400, Tamron 17-70mm f/2.8, Sony E 35mm f/1.8 OSS
Why did you choose airflow over dagster?
brilliant tutorial, thanks for this!
Thanks for the rich content!
Jay good job 🎉
it is really helpful! thanks Jay.🙂
Thanks for this, amazing! Also love the debugging how not everything is correct on first try, that’s really helpful
I was struggling to simplify airflow and DBT integration and this tutorial really helped me get through the finish line. Thank you!
amazing tutorial
I find it hard to understand how to use Dagster in a more memory-friendly way. For example, let's assume I have a BigQuery result with a large number of rows. Those rows have to be mapped into something else, and then two assets should take the data and write it to other tables. How should this kind of operation be done using reusable assets or ops? Let's say asset A runs the query and returns the iterator (with result()). An op B takes the rows element by element, transforms them, and hands them to another asset that doesn't expect the full data but works with chunks and writes them to the db. I couldn't find anything that uses a more stream-like approach. Should we pass objects like a Spark DataFrame, or can it be done in an easier way with op and asset annotations? I think Dagster with generators and iterators between ops and assets would be a great topic to discuss.
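One way to picture the stream-like flow described in the comment above is with plain Python generators. This is only a minimal sketch, not Dagster's API: `query_rows`, `transform`, `chunked`, and `write_chunks` are hypothetical stand-ins for the BigQuery query, the per-row mapping, the chunking step, and the database writer. Whether a generator can be handed between Dagster assets or ops this way is not something the sketch demonstrates.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def query_rows(n: int) -> Iterator[dict]:
    """Hypothetical stand-in for a large query result; yields one row at a time."""
    for i in range(n):
        yield {"id": i, "value": i * 10}


def transform(rows: Iterable[dict]) -> Iterator[dict]:
    """Map each row lazily, without materializing the full result."""
    for row in rows:
        yield {"id": row["id"], "value_doubled": row["value"] * 2}


def chunked(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group a row stream into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def write_chunks(chunks: Iterable[List[dict]], sink: List[List[dict]]) -> int:
    """Hypothetical stand-in for a database writer; appends each chunk to `sink`."""
    written = 0
    for chunk in chunks:
        sink.append(chunk)
        written += len(chunk)
    return written


# Demo: 10 rows flow through the pipeline in chunks of 4.
sink: List[List[dict]] = []
total = write_chunks(chunked(transform(query_rows(10)), size=4), sink)
```

Each stage pulls rows lazily from the previous one, so apart from the demo `sink` only one chunk needs to be in memory at a time.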
💗
Is it still worthwhile to learn this when starting a career as a data engineer, especially now that AI is automating almost everything? I'm asking because I'm currently a programmer exploring alternative career paths.
Your guide is a gem. But the Airflow part is not very clear; I had to deep dive so many times to fix it hahaha
Thank you for this content
Hi @jayzern, thanks for the video. Is Airflow running singular tests as well? Where did we mention "dbt test" in Airflow?
Hey what setup are you using? Keyboard, etc?
great video!
THIS IS WHAT I DON'T LIKE ABOUT I.T. BLOGGERS: THEY TALK TOO MUCH OUTSIDE THE TOPIC... IT GETS BORING.
Amazing video, it explains very clearly how Snowflake, dbt, Airflow and Cosmos are all linked together to provide data transformation and orchestration.
Why is everybody listing completely different things separated by just a comma: apples, cars, planets, pianos....
Is it equally useful for candidates from a non-IT background?
damn dude, you write queries like pro.... great to learn from you.
dude that dlthub is so cool
Great video! What text editor are you using?
VS Code
I am 32 and I am transitioning into data engineering. I have a basic understanding of Python but I have never touched SQL before. How realistic are my chances of succeeding in this?
Their documentation seems good but this is the kind of thing that helps me learn! Huge thank you for putting this together!
Thank you! This was exactly what I needed.
This video has the exact answer to my questions as I'm diving into data modeling for analytics. I'm sure everyone doing this for the first time will find this video super helpful. Would be cool to see dbt with Cosmos for smoother operation 👌 EDIT: I was literally just getting into the Deployment part of the video, and there you introduce using Cosmos for Airflow. Kudos!!
Hi what the support may help the beginner by owner course by self should be possible on data engineer
make video related star and dimension modeling
make video related star and dimensional modeling
Tell me one thing: is data engineering a good job profile for freshers?
Very well explained EMR video, thank you
Hi, I am trying your project and got stuck here, can you help?
21:32:24 Unable to do partial parsing because saved manifest not found. Starting full parse.
21:32:25 Encountered an error:
Compilation Error
Model 'model.DATA_PIPELINE.stg_tpch_orders' (models/staging/stg_tpch_orders.sql) depends on a source named 'tpch.orders' which was not found
When ETL came about the Cloud did not exist, I was writing shell scripts and SQL almost 30 years ago to do ETL. Useful video thanks!
i need to learn more from u keep posting
Concise and to the point. It was very helpful. Thanks, please show more end to end complex projects like this
Hi! I really enjoy your tutorial. I would like to see a tutorial on how to create a data CI/CD pipeline, starting from pulling the latest branch, running data tests on staging, and deploying changes to production after the tests are complete, since not a lot of YouTubers explain this.
This is actually a brilliant idea, thanks for the rec!
very good session, helped me get a much more concrete idea of what those tools look like and how they work together
I don't have a degree in data science but I am pursuing a degree in software system engineering. Can I get a job?
Great video
Dude, what do you mean if you could start over? You’re just starting haha
(venv) PS C:\Users\hsrak\Desktop\DataManagemet2 ew\data_pipeline\dbt-dag> brew install astro
brew : The term 'brew' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ brew install astro
+ ~~~~
+ CategoryInfo : ObjectNotFound: (brew:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
Please help me
It's beautiful! Thx man!
I have done beside knowledge in Python and more but I really want to move in to data engineering. Please can you help
That was FAST, you are subscribed :D Any vids related to "Amazon Managed Workflows for Apache Airflow"???
Good vid but move your facecam out of the terminal
I cannot run my dbt project. I’m still a beginner but I do not understand why this happens, considering that my macros directory is empty except for a .gitkeep file:
Compilation Error
dbt found two macros named "materialization_table_default" in the project "dbt".
To fix this error, rename or remove one of the following macros:
- macros/materializations/models/table/table.sql
- macros/materializations/models/table.sql
thank you, this is great tutorial
Hi! I'm having trouble connecting to Snowflake. Can someone please help me resolve it? I just started learning dbt and Snowflake.
Runtime Error
Database error while listing schemas in database "dbt_db"
Database Error
250001: Could not connect to Snowflake backend after 2 attempt(s). Aborting
worth checking your snowflake credentials again, I got the same error due to an incorrect account id
nice, keep going