- Videos: 13
- Views: 509,216
jayzern
Added Sep 18, 2021
I'm Jay, I love making videos about travel, self-help and tech. I currently work in New York City as a data engineer, but I grew up in Malaysia and lived in the UK when I was 19. Back then, I had no idea what life was about, moving to so many places, navigating career in Tech. Today, I've learned a lot and wanna share my perspective through filmmaking.
What is Dagster? Asset Based Orchestration [2hr full course]
Dagster is a declarative, asset-based orchestrator that redefines the way we think about managing workflows. In this video, we’ll learn about features of Dagster including Assets, Resources, Jobs, Schedules, Partitions and more. We’ll do this by creating an End to End project from scratch using Dagster, data load tool (dlt) and Snowflake. I’m super excited about this one. Hope you’ll learn something new!
Timestamps ⏰
0:00 - Intro
1:45 - System Design
6:30 - Setup Dagster and data load tool (dlt)
21:40 - How to define Assets
40:08 - Resources, code refactor
43:22 - Schedules and Jobs
48:42 - Backfill DAGs using Partitions
54:39 - Sensors and Automaterialization policy
1:09:31 - Final thoughts
Notes ...
Views: 3,342
Videos
Code along - build an ELT Pipeline in 1 Hour (dbt, Snowflake, Airflow)
113K views · 7 months ago
How to build an ELT pipeline in 1 hour, using industry-standard tools such as dbt, Snowflake and Airflow. This is a live coding tutorial, where I’ll walk you through the thinking process and show you every step. We’ll cover basic data modeling techniques (fact tables, data marts), Snowflake RBAC concepts, and how to orchestrate a dbt project using Airflow. Drop down in the comments section wha...
Intro to Amazon EMR - Big Data Tutorial using Spark
27K views · 1 year ago
Edit: Make sure you encrypt your Spark script as you upload it inside S3 (timestamp: 13:42). There's a small typo in line 41 of the code; it should be "add_argument". Intro: Today we're going to talk about a popular tool in Data Engineering. Amazon EMR is an industry-leading big data platform. It's a really mature service developed way back in 2009, and draws a lot of heuristics from the Apache Hadoo...
Top 5 SQL Interview Questions for Data Engineers
5K views · 1 year ago
A lot of people struggle to learn SQL. When it comes to interviews they feel super anxious, especially in this economy where it's getting 10x harder to find jobs. In this video, I'll show you how to CRUSH your next Data Engineering SQL interviews, through these 5 handpicked questions. We'll focus predominantly on the problem solving aspect. At the end of the video, I'll share my tips & tricks o...
How I would learn Data Engineering (if I could start over)
340K views · 1 year ago
In this video, I’ll share my step-by-step process on how I would learn Data Engineering if I could start over. Data Engineering is a fast-emerging field within the tech industry, one that more and more people from traditional data science/software backgrounds are pivoting towards. We’ll cover the fundamentals of Data Engineering, and talk about some advanced topics you’ll need to learn in order to...
Living abroad is HARD (what I learned after 8 years in new york and london)
4.4K views · 1 year ago
When you're traveling across different countries, it's very easy to cherry-pick the best parts of different places to create a perfect image in your head. The reality is, living abroad in a foreign country and going on vacation are two completely separate things. In this video, I highlight some key learnings after living abroad for over 8 years, and share insightful tips on how to ease that t...
MALAYSIA | Asia's Hidden Gem
9K views · 1 year ago
Malaysia is one of THE most underrated countries in Asia 🇲🇾. Last December, I went back home after being away for almost 3 years. I couldn't really find any videos that really showcase how amazing the country is (scenery, local food, culture), so I wanted to make one myself. It was a super hectic trip, trying to film and catch up with friends in only 2 weeks. Who am I? 🙋🏻♂️ I'm Jay, I love mak...
Fall Foliage in Vermont
650 views · 1 year ago
Road trip from NY to Vermont. Thanks to @yarnehermann @andrealee_x @tiffy_le @jd_lassiter @velalu77 and yuki sensei. Gear: Canon R6, RF 24-105mm F4-7.1 IS STM, RF 35mm f/1.8 Macro IS STM lens, iPhone 13 Pro
Los Angeles
909 views · 3 years ago
City of angels. Featuring @gareygan @lu8296 @junga_julia @derek_chen01 @ivkyoung. Gear: Sony A6400, Tamron 17-70mm f/2.8, Sony E 35mm f/1.8 OSS
Hello, thanks for this tutorial. At the very beginning, when trying to run the "dbt deps" command I'm getting this error: "Encountered an error loading local configuration: dbt_cloud.yml credentials file for dbt Cloud not found. Download your credentials file from dbt Cloud to `C:\Users\a.schirina\.dbt`". I'm using the dbt command locally and my profiles.yml in the .dbt folder is:
data_pipeline:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: jpb45436
      # User/password auth
      user: alices
      password: mypassword
      role: dbt_role
      database: dbt_db
      warehouse: dbt_wh
      schema: dbt_schema
      threads: 4
      client_session_keep_alive: False
Does anyone know the problem?
Thank you for the video, jayzern. When I push code to Git, should I push only the dbt code, or do I need to push all the code in dbt-dag?
Best tutorial I've seen so far. I was torn between Glue and EMR for a future project requiring big compute power with control over each node.
nice
Great guide bro...
Thank you brother
Thanks a lot for this tutorial
So to my understanding, a singular test really checks that the query being tested returns nothing. If the test passes, the query returned no rows. Great, your data is fine. If it fails, you should run that query to see exactly which rows came back. Confusing at first but makes sense now.
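The rule described in this comment (a singular test passes only when its query returns zero rows) can be sketched outside dbt. This is a minimal illustration using Python's built-in `sqlite3`, not dbt's own implementation; the `orders` table, its columns, and the `run_singular_test` helper are all hypothetical names chosen for the example.

```python
import sqlite3


def run_singular_test(conn, query):
    """Mimic the singular-test rule: pass only if the query returns no rows.

    Hypothetical helper, not part of dbt. Returns (passed, failing_rows).
    """
    rows = conn.execute(query).fetchall()
    return len(rows) == 0, rows


# Illustrative in-memory data; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, discount_amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, -5.0), (2, 0.0), (3, 2.5)],
)

# The test query selects the rows that VIOLATE the rule
# (here: a discount should never be positive).
bad_discounts = (
    "SELECT order_id, discount_amount FROM orders WHERE discount_amount > 0"
)

passed, failing = run_singular_test(conn, bad_discounts)
```

With the sample data, the query finds one offending row, so the check fails and `failing` holds the rows worth inspecting. That is the same workflow the comment describes: on failure, run the query to see which rows came back.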
What are the versions of pandas and matplotlib used in your project?
very sincere and very true points, thank you very much
I am impressed with your video. Can you list the books you are referring to? It will help me a lot.
Dagster is amazing but rather complex. This course is amazing!!
Thank you very much. This is very nice and concise tutorial, exactly what I need.
Is data engineering dead with the advent of AI? What is the future of data engineering careers in your opinion?
Hello, I followed the video and tried to compile and got this error. Please let me know if anyone can assist.
16:32:38 Running with dbt=1.8.0
16:32:38 Registered adapter: snowflake=1.8.3
16:32:38 Unable to do partial parsing because profile has changed
16:32:38 Unable to do partial parsing because a project dependency has been added
16:32:38 Unable to do partial parsing because a project config has changed
16:32:39 Encountered an error:
Parsing Error
Error reading oms_dbt_proj: staging\tpch_source.yml - Runtime Error
Syntax error near line 9
------------------------------
6 | schema: tpch_sf1
7 | tables:
8 | - name: orders
9 | columns:
10 | - name: o_orderkey
11 | tests:
12 | - unique
Raw Error:
------------------------------
while parsing a block collection
in "<unicode string>", line 8, column 7
did not find expected '-' indicator
in "<unicode string>", line 9, column 7
Thank you, thank you THANK YOU! This was so helpful, easy to follow and made perfect sense.
Thank you very much
I'm 23 and switching careers from aviation maintenance mechanic to the tech industry, trying to get into the data field, as I’ve heard the phrase “data is king”. I was debating between coding and cybersecurity, but I feel like data is the best spot and data engineering sounds like my niche. My question is: should I go to a regular community college (BCC for me in South Florida) or a university as well, or can I break into the field through certifications from a technical/vocational school? And what baseline certifications should I go for? I'm trying to find the most efficient and fastest way to break in considering my age (23). I'm fully committed to this as well.
why do you need to create a VPC?
VPC is for nodes. It allows them to communicate with each other and the master node.
Hi, thanks for the overview of Dagster. Do you plan to do a follow-up on how to test the whole pipeline or its individual elements?
Honestly still thinking! If there's enough interest in more Dagster videos. Hard to make videos when I'm working full time 😅
In the video at this timestamp ruclips.net/video/Xe8wYYC2gWQ/видео.htmlfeature=shared&t=675, I noticed you're using `setuptools` in `setup.py` instead of relying on a `requirements.txt` file. I'm curious: what are the advantages of using `setuptools` over the more common `pip` or `pipenv` approaches? Many of the packages you listed seem to be available via `pip`, so it seems to add a bit of complexity. Could you explain the reasoning behind this choice? By the way, I’m not criticizing the approach, just genuinely interested in understanding the benefits.
Hey, this is a really interesting question. `requirements.txt` lists the packages you want to install using pip, but it doesn't describe how your own package is installed. The filename is arbitrary, and you can even call it `another_requirements.txt` and run pip install -r on it. `setup.py` with `setuptools` is an alternative approach that installs pip dependencies and also defines your Python package (name, metadata, packages, etc). It's more narrow in the sense that you're building for a single project only, and it's meant for redistributing your software on other machines, whereas `requirements.txt` is more suited for development environments.
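To make the distinction in this reply concrete, here is a small hedged sketch. It is not the `setup.py` from the video: the project name, version, and dependency list are placeholders, and `build_setup_kwargs` and `to_requirements_txt` are hypothetical helpers. It shows that the arguments a `setup.py` passes to `setuptools.setup()` carry package metadata plus dependencies, while a requirements file carries only the dependency list.

```python
def build_setup_kwargs():
    """Arguments a setup.py might pass to setuptools.setup().

    All values are illustrative assumptions, not the video's real config.
    """
    return {
        "name": "my_pipeline",           # hypothetical package name
        "version": "0.1.0",              # hypothetical version
        "packages": ["my_pipeline"],     # which Python packages to ship
        "install_requires": ["dagster", "dlt"],  # runtime dependencies
        "extras_require": {"dev": ["pytest"]},   # optional dev extras
    }


def to_requirements_txt(kwargs):
    """Flatten only the dependency list into requirements.txt-style text."""
    return "\n".join(kwargs["install_requires"]) + "\n"


kwargs = build_setup_kwargs()
requirements_text = to_requirements_txt(kwargs)

# In a real setup.py the file would end with something like:
#   from setuptools import setup
#   setup(**build_setup_kwargs())
```

Everything in `kwargs` other than `install_requires` (name, version, packages, extras) has no place in a requirements file, which is why the reply calls `setup.py` the option meant for redistributing a project.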
Omg yr Malaysia!
The type of video that makes me wanna quit the field because of how bad I feel about the level I am at, but it's a very helpful video though
This is a great video. Thanks for providing real value.
Thank you!! I watched the YouTube demo and it was really helpful. I also want to study Spark on EKS
🙏
Jay! Thanks for the video and content very cool to see. Curious why Airflow over something like FiveTran besides the ability to self host? Any gotchas?
FiveTran is not really an orchestration tool - it's really meant for the "Extract Load" part only. It's great because of Unix philosophy, i.e. "do one thing, do one thing well only", whereas Airflow is more of a generalist, task-based orchestrator. Another thing is FiveTran is super expensive, unless you're working on something enterprise-y
bro is so busy in guiding us, he forgot to drink 4 liters of water everyday
Currently working full time while I get my degree. I would love to get a job in database engineering when I graduate. Helpful info, ty!
Does this apply to freshers as well?
26:00 item_discount_amount is supposed to be negative because the macro defined it as such. I also checked the data on snowflake and they're all negative amounts. Did I miss something?
Hi @jayzern, thanks a lot for your video, really valuable content!
This is such an amazing video @jayzern! The project was not overly complex but also not barebones, and it covered a lot of important stuff! Thanks for being thoughtful and including the code-along link (otherwise some of the formatting issues would have bugged many newbies)! I think you should keep creating more videos as you are a good teacher. My only suggestion is to maybe include a bit more explanation, which will help beginners even more! Kudos!
Hello, did anyone else face this error in Airflow after 32:50?
Broken DAG: [/usr/local/airflow/dags/dbt-dag.py]
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/cosmos/operators/base.py", line 361, in __init__
    self.full_refresh = full_refresh
    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/models/baseoperator.py", line 1198, in __setattr__
    if key in self.__init_kwargs:
              ^^^^^^^^^^^^^^^^^^
AttributeError: 'DbtRunLocalOperator' object has no attribute '_BaseOperator__init_kwargs'. Did you mean: '_BaseOperator__instantiated'?
Please send help.
I am facing the exact same error. Please post a reply if you were able to figure out the fix. I'll do the same if I find a solution.
Ok, so I think I was able to find the thread related to this issue. It's still open as of 8/18/2024 11pm PT: github.com/astronomer/astronomer-cosmos/issues/1161
Thanks @jayzern. This tutorial is awesome. I will be recommending it to folks who struggle with connecting dbt with any database engine.
Hi guys, kindly help me out. Are Snowflake and dbt alone enough, or do I have to learn Hadoop, Spark, etc.? I have been working as a data analyst for the last year and am planning to switch to DE.
"Death by a thousand microservices" comes to mind lol
I'm starting my learning journey from here. It's a completely new world for me, both because I'm not a native English speaker and because I'm totally new to all this technology. I've made up my mind: I want to become a Data Engineer, and I'm going to work really hard to achieve that! Thank you so much for the guidance; I’m going to put it into practice now.
AN ABSOLUTE GOLDMINE OF INFORMATION WHICH NO UDEMY OR YOUTUBE TUTOR HAS PROVIDED YET!
Just for the information of all the learners: this is not how things are done in the tech industry. You need to understand Terraform scripts along with Jenkins, which deploys AWS services. You will not get access to go on the management console and play around and do stuff.
Hi Jay, I'm from Latin America. I'm going to start a career related to web development and design this month. I speak English really well and have experience as a field engineer in many places. How can I apply for a job in the US/EU from South America? Any tip will be considered, ty!