What are some common data pipeline design patterns? What is a DAG ? | ETL vs ELT vs CDC (2022)

  • Published: 11 Jan 2022
  • What are some common data pipeline design patterns? What is a DAG ? | ETL vs ELT vs CDC (2022)
    #datapipeline #designpattern #etl #elt #cdc
    1:01 - Data pipeline components
    4:10 - ETL design pattern (Extract, Transform & Load)
    7:15 - ELT design pattern (Extract, Load & Transform)
    10:37 - CDC design pattern (Change Data Capture)
    14:22 - EtLT design pattern (Extract, transform, Load & Transform)
    Hi Friends, I am Anshul Tiwari, and welcome to your YouTube channel "IT k Funde", where we make I.T. interesting for everyone (Tech or No-Tech).
    **Do check out our popular playlists**
    1) Networking and Infra Concepts - • Networking & Infra Con...
    2) Data Analytics & Insights - • Learn - Data Engineeri...
    3) Google Cloud Platform Beginner Series -
    • Google Cloud Platform ...
    4) Latest technology tutorial (2021) -
    • What is a Data Vault ?...
    More about this video -
    Thanks for all your love on my Data Pipeline basics video - • What is Data Pipeline ...
    This video is a follow-up that covers some basic data pipeline design patterns used in data warehousing and data lake solutions. We will learn what a DAG (Directed Acyclic Graph) is and what its core components are.
    Then we will move on to our 3 primary design patterns and an additional bonus sub-pattern. Below are the topics we will cover in this video.
    1 - Data pipeline components
    2 - ETL design pattern (Extract, Transform & Load)
    3 - ELT design pattern (Extract, Load & Transform)
    4 - CDC design pattern (Change Data Capture)
    5 - EtLT design pattern (Extract, transform, Load & Transform)
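To make the DAG idea concrete, here is a minimal sketch, not taken from the video: tasks are nodes, dependencies are directed edges, and a valid run order exists only if there are no cycles. The task names and the `run_order` helper are hypothetical; `graphlib` is part of the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks that must finish before it can start.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

def run_order(dag):
    """Return one valid execution order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())

order = run_order(pipeline)
print(order)

# A graph with a cycle is not a DAG, so no execution order exists.
try:
    run_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
print(cycle_detected)
```

Orchestrators apply the same principle at a larger scale: because the graph has no cycles, every task can be scheduled after all of its upstream tasks.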
    PLEASE WATCH OTHER VIDEOS FROM THE POPULAR PLAYLISTS GIVEN BELOW. EVERY SINGLE LIKE, COMMENT AND SHARE MEANS THE WORLD TO ME!
    #itkfunde #keeplearning #keepsharing #keephustling
    Credits & Resources -
    images - pixabay.com
    Research - wikipedia
    **Social Channels**
    YouTube - / itkfunde
    Facebook - / itkfunde
    Linkedin - / ansh9685
    Twitter - / ansh9685
    Instagram - / itkfunde
    🚀🚀LAUNCHING MY 1st EVER ONLINE COURSE 🚀🚀
    "Cloud 101: AWS for Dummies - Your 1st Date with Cloud !"
    ✨Your first steps towards a Digital Cloud Career☁️✨
    🔥🔥Enroll Now via link below - www.itkfunde.net/courses/Clou...
    Highlights:
    ✅ No Pre-requisites, No Coding needed
    ✅ Course Starts on 11th September 2023
    ✅ Career Boost: Unlock new career opportunities by mastering cloud basics and AWS fundamentals.
    ✅ Step-by-Step Demos: Follow along with our easy-to-follow demos that walk you through key concepts.
    ✅ Live QnA session and career guidance session with me
    ✅ 2 years Access to the course
    ✅ 7-Day Money-Back Guarantee: I know you'll love the course. If for any reason you're not satisfied within the first 7 days from course launch, we offer a full refund.
    ✅ AWS Cloud Practitioner Certification Guidance
    ✅ Bonus: Invitation to join my special Telegram community for Lifetime
    ✅ Bonus: After the course, stay personally connected to me for career guidance
    With a 7-day money-back guarantee, you can enroll with confidence. Don't miss out on this opportunity to learn and grow in the cloud industry.
    Enroll Now and take YOUR First Steps towards a Digital Cloud Career !!
    Hurry!!
    **About This Channel**
    Friends, the ITkFUNDE channel wants to bring I.T.-related knowledge, information, career advice, and much more to every individual, regardless of whether he or she belongs to I.T. or not. This channel is for everyone interested in learning something new!

Comments • 124

  • @MasQred
    @MasQred 1 month ago +1

    Won't extracting data from an operational database affect that database's performance? How can the data be extracted without affecting it?

    • @ITkFunde
      @ITkFunde  1 month ago

      Good question. That's why, in the old days, ETL pipelines used to run overnight, when the operational systems were not under heavy use. Today there are seamless replication tools like Attunity that can replicate the data from the source by reading the logs.

  • @patmclaughlin107
    @patmclaughlin107 10 months ago +11

    Love this, man! Our engineers made it so hard to understand what DAG was. I thought I was not smart enough, but now I know they were either deliberately making it hard, or maybe they didn’t understand it themselves.

  • @andygarnet7191
    @andygarnet7191 1 year ago +2

    Thanks man! Your explanations are so clear and straightforward. For years I spoke to so many engineers who would over-complicate pattern concepts or straight lowball the documentation to cover themselves when the pipelines blow up and impact the business. Keep up with these great videos!

  • @AamirAzizYouTube
    @AamirAzizYouTube 2 years ago +5

    Thanks so much, sir! This topic was a nightmare for me, you made it so simple to grasp. Keep up the good work!

  • @daves4026
    @daves4026 2 years ago +2

    Perfect. Full respect to your kindness and sharing of your knowledge

  • @metaocloudstudio2221
    @metaocloudstudio2221 2 years ago +3

    Good point. Also, an advantage of ELT over ETL is that you can create normalized tables and real-time materialized views.

  • @deeptihazari3233
    @deeptihazari3233 1 year ago +3

    How was this amazing channel hidden till now? This is what quality content delivery looks like 👍

  • @wendypark3848
    @wendypark3848 2 years ago

    I learned a lot of network and data pipeline knowledge from you. It's really hard to learn these from a book. Thanks a lot!

  • @bajicdusko
    @bajicdusko 1 year ago +17

    It always amazes me how we can have knowledge like this one click away! Fantastic content, keep up with good work.

    • @sumit12072007
      @sumit12072007 1 year ago +1

      I was thinking the same while going through this video.

  • @mohit.srivastava
    @mohit.srivastava 1 year ago

    both this and the previous connected video explained the concept really well. thanks!!

  • @raghurajsawant24
    @raghurajsawant24 2 years ago +1

    You are doing a fantastic job. Love your videos.

  • @SeanRomberg
    @SeanRomberg 1 year ago

    Thanks for the share - you have helped me better understand the pipeline automation software that delivers orchestration, ingestion, transformation, and activation all in one. This makes sense now.

  • @francksgenlecroyant
    @francksgenlecroyant 2 years ago

    perfect video about Data Pipelines 👌, thanks!

  • @user-fm7wc9hy3j
    @user-fm7wc9hy3j 1 year ago

    I studied Spark and read about DAGs many times, but I only understand them now that I'm watching your tutorial. Thanks.

  • @federicogonzalez7673
    @federicogonzalez7673 2 years ago

    I'm glad that I found your video in my feed, nice one

  • @vaidyanathashankar7441
    @vaidyanathashankar7441 1 year ago

    Fantastic explanation, thanks for the wonderful session.

  • @mayurarun
    @mayurarun 1 year ago

    This is such a gem video. This would help me so much. Great work.

  • @ASHighlights668
    @ASHighlights668 2 years ago +1

    Very helpful, sir. Your videos convert my nervousness into confidence!!

  • @altamashjawad6691
    @altamashjawad6691 2 years ago

    Thank you so much, very nice and comprehensive video!

  • @sultanqureshi2766
    @sultanqureshi2766 2 years ago +8

    Though it's not exactly related to my current profile, it makes me happy to learn more about the whole software industry from the core, and you are the best at making this understandable by keeping it simple. I understood the 4 data pipeline patterns (ETL, ELT, EtLT, CDC) at once.
    The video was not long at all.
    Thanks

    • @ITkFunde
      @ITkFunde  2 years ago +1

      Thanks Diwakar for your support as always 🙏☺️

  • @DJEYkanjaria
    @DJEYkanjaria 2 years ago

    Great Video, Simple and detailed explanation

  • @guidodichiara2243
    @guidodichiara2243 1 year ago

    Great job. Keep going on!

  • @Poornima_life
    @Poornima_life 1 year ago

    Absolutely…I liked the video ,content and your valuable efforts….thanks

  • @UTUBDZ
    @UTUBDZ 2 years ago

    Great content, thank you very much sir !

  • @almamun8291
    @almamun8291 1 year ago

    Thank you very much, got clear concept about data pipelines

  • @AnandhabalanRadhakrishnan
    @AnandhabalanRadhakrishnan 1 year ago

    Well explained, keep sharing valuable information like this.

  • @Karma_Exists
    @Karma_Exists 13 days ago

    This instructor has very good theory skills. In the case of CDC, though, source systems never have a history of changes. Source systems are transactional (inserts/updates/deletes); they never store history.
    An ETL design handles CDC with a lookup: compare the data transformed and stored in staging against the source, find the change, then act according to the history required. But he has good basics.

  • @AmanGupta-yf1hj
    @AmanGupta-yf1hj 9 months ago

    Wonderful content

  • @greenshadowooo
    @greenshadowooo 10 months ago

    Thanks for your sharing ! 😀😀😀

  • @rohithsai5265
    @rohithsai5265 2 years ago +1

    Great content 💯

  • @kristhomas8295
    @kristhomas8295 2 years ago

    Thank you so much for this!

  • @Liubov_110
    @Liubov_110 1 year ago

    Thank you so much for this detailed video 👍

  • @TheAfroKingPlay
    @TheAfroKingPlay 2 years ago

    Very nice video man. Thanks I need this class. Take my like.

  • @mangesh4231
    @mangesh4231 6 months ago

    Very detailed explanation, helpful. thanks a lot for all work and efforts.

  • @rahuldey1182
    @rahuldey1182 1 year ago +3

    In my project, we are using CDC + EtLT design pattern for our data pipeline. All the design patterns of data pipelines are covered here. Very well presented, good job, keep going.

  • @JJ-ki2mw
    @JJ-ki2mw 1 year ago

    Thank you so much the way you described it is so easy to understand

  • @javierruizdiaz8656
    @javierruizdiaz8656 2 years ago

    Thank you, excellent Video.

  • @Lebrao09
    @Lebrao09 2 years ago

    great video!

  • @davidcamiloespitiamanrique9
    @davidcamiloespitiamanrique9 2 years ago

    Good one! Maybe you could talk about AWS DMS and AWS Glue.

  • @ashokrajur09
    @ashokrajur09 2 years ago

    nice one, very informative

  • @emmanuelaolaiya
    @emmanuelaolaiya 2 months ago

    Great job and thanks

  • @jishuenkam6213
    @jishuenkam6213 2 years ago +2

    Not exactly a backend developer or data engineer, but this video is very informational on the various data pipeline designs!

  • @StaceyJ1908
    @StaceyJ1908 1 year ago

    First, your videos are amazing....I have learned so much! I am looking at our current GCP implementation and trying to identify key risks across each step in the pipeline to determine if we have the correct controls in place or gaps...what are key risks to address at each stage of the data pipeline?

  • @the.abhisheksinha
    @the.abhisheksinha 1 year ago +1

    nicely explained !

  • @mohammadateef3339
    @mohammadateef3339 1 year ago

    Your entry is awesome, sir.

  • @hsiaoshuang
    @hsiaoshuang 1 year ago

    Very informative!

  • @aditiaditi3302
    @aditiaditi3302 29 days ago

    Thanks for sharing this video :)

  • @PiyushSharma-jq8rr
    @PiyushSharma-jq8rr 1 year ago

    This was really good :-)

  • @mailsuresh9
    @mailsuresh9 2 years ago +2

    Sir, you are great. Thank you for making IT interesting.

    • @ITkFunde
      @ITkFunde  2 years ago

      Thanks Suresh ☺️☺️🙏

  • @itneka
    @itneka 1 year ago

    Thanks for the information

  • @ashisharora9649
    @ashisharora9649 11 months ago

    AMAZING

  • @lwhieldon1
    @lwhieldon1 2 years ago +1

    DAG concept is talked about a lot in data science. Can you talk about how this concept in data science correlates with the DAG design?

  • @mzeeshan
    @mzeeshan 2 years ago +1

    Loved the details, mate!

    • @ITkFunde
      @ITkFunde  2 years ago

      Thanks Zeeshan☺️

  • @vigneshbaskaran7931
    @vigneshbaskaran7931 11 months ago +1

    Love this content, Thank you so much for all the efforts.

  • @GernPudman
    @GernPudman 7 months ago

    Thanks!

  • @ganeshsrinivasannv4296
    @ganeshsrinivasannv4296 1 year ago

    Thanks, it's great work. Can you share some content on data received as XML messages and advise on how to store it?

  • @chinuamareashwar8146
    @chinuamareashwar8146 2 years ago

    nice explanation brother

  • @sunnyj1967
    @sunnyj1967 1 year ago +1

    It's a terrific presentation.

  • @Vikas.007
    @Vikas.007 2 years ago +1

    Awesome content 👍👍
    Please share the data mart video link in the description 🙏

  • @connect_vikas
    @connect_vikas 1 year ago +1

    Love you, brother, for explaining this so beautifully.

  • @swaragupta7932
    @swaragupta7932 1 year ago +1

    Easy Explanation, Detailed video

  • @jayanth1376
    @jayanth1376 2 years ago +1

    👌👌👌

  • @GabrielJambert
    @GabrielJambert 6 months ago

    Thank you

  • @victoraf4274
    @victoraf4274 10 months ago

    Such an amazing video! Not bored at all (I'm not joking) hehe

  • @prashantprashant1291
    @prashantprashant1291 2 years ago

    Your videos are full of knowledge. Are you a Data Solution Architect?

  • @harishb8790
    @harishb8790 1 year ago +1

    Amazing explanation. 👏

  • @lastboomer6164
    @lastboomer6164 1 year ago

    Hello, I very much appreciate the training. Would you consider a whiteboard exercise where the ETL jobs and transformations use a metadata-driven ETL? I learned that this is a good practice, but one downside is that this design cannot feed data catalog "lineage".

  • @ajaykiranchundi9979
    @ajaykiranchundi9979 2 years ago +2

    Thank you so much! BTW it was certainly not at all a long video.

    • @ITkFunde
      @ITkFunde  2 years ago

      Thanks Ajay ☺️❤️

  • @subhradeepshah472
    @subhradeepshah472 2 years ago +4

    ❤️extreme top right hand corner of the whiteboard. ❤️

    • @sabrinafung3155
      @sabrinafung3155 2 years ago +1

      As a foreigner, I am curious what the text in the top right-hand corner of the whiteboard means. Is it a motivational quote?

    • @ITkFunde
      @ITkFunde  2 years ago

      Thanks Subhradeep🙏☺️

    • @ITkFunde
      @ITkFunde  2 years ago +5

      🙏|| ॐ गं गणपतये नमः || || Om Gan Ganpataye Namah|| 🙏
      Hi Sabrina, Lord Ganesha in Hinduism is called the god of wisdom and knowledge, and it's believed that any good work should start by taking his name first. Hence this mantra in Sanskrit is a prayer to him, seeking his blessings for all of us before we start our journey towards knowledge and wisdom. According to individual faiths he could be Jesus, Allah, Waheguru
      For us...
      He is Ganesha 🙏

  • @jagss3472
    @jagss3472 1 year ago

    Lovely explanation and very insightful details.

    • @ITkFunde
      @ITkFunde  1 year ago

      Glad it was helpful Jaga!

  • @masterh6868
    @masterh6868 2 years ago +2

    Hey, your video is, as usual, full of information and crystal-clear concepts. Thanks for posting such useful videos on industry trends.
    Can you make a video on a data pipeline that does not follow the DAG pattern, like an ML pipeline maybe (not sure)?

    • @ITkFunde
      @ITkFunde  2 years ago

      Thanks buddy for your feedback and suggestion ☺️

  • @augugninfin1034
    @augugninfin1034 1 year ago +1

    Thank You!

  • @big_pants0493
    @big_pants0493 2 years ago +6

    This is amazing. IT k Funde, Can you please suggest any Books that will explain the following topics further and also provide some training?

    • @bluzane
      @bluzane 2 years ago +3

      You don't need any....this Genius Guy is the book. He is my Guru for life

    • @ITkFunde
      @ITkFunde  2 years ago

      thanks dear 🙏🙏❤❤

  • @arundhutinayak8221
    @arundhutinayak8221 2 years ago

    Now I can put technical terms to my current task. Can you do something on APIs?

  • @arond.g1120
    @arond.g1120 1 year ago +1

    Feel like I am learning in my own language. ❤❤❤

    • @ITkFunde
      @ITkFunde  1 year ago

      Thanks Aron ♥️♥️🙏

  • @brookster7772
    @brookster7772 9 months ago

    Great video! Can you tell where a vector database fits into this model? Isn't it the case that at some point all data must be converted to embeddings/vectors and stored in a massive vector store to be used for AI similarity searches?

  • @arijitsinha2955
    @arijitsinha2955 1 year ago

    Can you please make a video on Databricks along with an example?

  • @VlasTrunov
    @VlasTrunov 2 years ago +1

    It's good that you focus on DAGs. But for those new to the subject it might be too abstract, I guess. What I would do is show how things flow in Airflow, for example, for those who perceive information visually. This way you would spread the information in your video uniformly (like butter on bread), helping people grasp it in one pass, if you know what I mean. It's just a suggestion. But to me personally, the level of detail you give is perfect.

    • @ITkFunde
      @ITkFunde  2 years ago +1

      Thanks Vlas, such useful feedback helps me better my content. I will definitely take your thoughts on board and do something better next time. 😊

  • @upendrakumar-ok3tr
    @upendrakumar-ok3tr 1 year ago

    Can you please make a video on bare metal and hypervisors?

  • @deepsy4786
    @deepsy4786 2 years ago

    I would like to discuss whether CDC should be considered a data pipeline design pattern. My understanding is that CDC is more of a data modelling concept. You would have to build an ELT or ETL pipeline anyway. CDC relates more to a load or transformation technique than to an individual pipeline.
    However, all the insights shared were helpful and helped me relate my work to some of these concepts.

    • @ambarishhazarnis9531
      @ambarishhazarnis9531 2 years ago

      Here CDC refers to storing the delta in a separate table. This way we don't need to read the source table again to extract the change.

  • @TheyCalledMeT
    @TheyCalledMeT 1 year ago

    Would you put data cleansing / preparation in the small t of the EtLT pattern, or in the capital T?

  • @debrajpradhan5500
    @debrajpradhan5500 2 years ago

    Sir, I am interested in AWS analytics. Can you please tell me which AWS data services to study first and second?

  • @anilmantri2139
    @anilmantri2139 5 months ago

    Hi, how do we pull the source data into the EL DAG in the CDC pattern? I mean, what tech stack should be used?

  • @SamS-oi5pz
    @SamS-oi5pz 2 years ago

    Hi, how do we identify changed data from the source?

  • @lcsxwtian
    @lcsxwtian 2 years ago +1

    At CDC, you had said that max() would get the latest snapshot of the data. I am assuming max() would get the maximum count of the data - correct? If that were the case, what if the last change was to DELETE some data, then I don't think max() would be right?

    • @AnishBhola
      @AnishBhola 8 months ago

      Yes, you're right! Timestamp-based CDC is generally not a good option for processing deletes. There are other types of CDC, such as log-based (most optimal), which you can use for such situations. This video primarily talks about implementing difference-based CDC (where 2 snapshots of target systems are compared).
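The difference-based CDC described in the reply above can be sketched as a comparison of two snapshots keyed by primary key. This is a minimal illustration under stated assumptions, not the video's implementation: the `diff_snapshots` helper, the snapshot layout (a dict from key to row), and the sample rows are all hypothetical. It also shows why deletes are visible here, while a timestamp filter alone would never see a row that no longer exists.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots (dicts mapping primary key -> row) and
    return the inserted, updated and deleted rows."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    return inserts, updates, deletes

# Hypothetical snapshots taken at load n and load n+1.
snapshot_n = {1: {"name": "Asha", "city": "Pune"},
              2: {"name": "Ravi", "city": "Delhi"}}
snapshot_n1 = {1: {"name": "Asha", "city": "Mumbai"},   # updated
               3: {"name": "Meera", "city": "Chennai"}}  # inserted; key 2 was deleted

inserts, updates, deletes = diff_snapshots(snapshot_n, snapshot_n1)
print(inserts, updates, deletes)
```

Comparing full snapshots costs a complete read of both sides, which is one reason log-based CDC is usually preferred for large tables.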

  • @adamjapal7370
    @adamjapal7370 2 years ago

    Do you have a reference or a PDF book on the data pipeline concept? If you do, could you point me to the link? Thank you.

  • @imnischaygowda
    @imnischaygowda 8 months ago

    What is the purpose of the sink? Can't we store data directly in the data warehouse?

  • @pm4306
    @pm4306 1 year ago

    Please give some concrete business examples instead of 'n' and 'n+1'; an example will help to clarify and walk through the concept in a better way. I think you should give concrete real-life business examples for all the cases that you discuss. Actual business examples are missing from your videos.

  • @veeek8
    @veeek8 2 years ago

    'Hope you're not bored', never 😁

  • @antonfernando8409
    @antonfernando8409 2 years ago +1

    Never heard of most of the terms mentioned (ETL, ELT, CDC). I guess these are specific to cloud computing. Still, in terms of data pipelines, it's useful to learn, I think. Thanks

    • @egor.okhterov
      @egor.okhterov 2 years ago

      No, it’s not about cloud computing. It’s about data analytics in general.
      When you want to build web dashboards that draw graphs of some business processes or want to analyse customer behavior, you build this data pipeline.
      TLDR: you cannot run SQL on your logs. You need to push your logs into MySQL in order to be able to query your logs.

    • @ITkFunde
      @ITkFunde  2 years ago

      Hi Anton, these terms are quite old but have become more prominent with new-age data management. Maybe you are not from a Data or Business Intelligence background, but it's good to learn these.

  • @ernesto8738
    @ernesto8738 2 years ago

    and here I am with a cyclic graph problem {{{(>_

  • @dylanmccullough2679
    @dylanmccullough2679 2 years ago +1

    Question regarding the ELT pattern.
    You said that you should use SQL at the (T) transformation part.
    Could you use Spark instead of SQL at this point? For example, Data Factory Data Flows, instead of putting compute pressure on the EDW with SQL queries?

    • @chandrakanthotkar7262
      @chandrakanthotkar7262 2 years ago

      Whenever we say ELT, we basically mean transforming the data after it has landed in the DWH or database, like BigQuery (GCP). The Spark engine is basically used for transformation during the flow.
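The ELT ordering discussed in this thread (land the raw data first, then transform it with SQL inside the warehouse) can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the warehouse, and the `raw_orders` / `orders` tables and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw extract: everything arrives as untyped, uncleaned text.
raw_rows = [("1", " alice ", "10.5"), ("2", "BOB", "20")]

# SQLite stands in for the warehouse (e.g. a cloud DWH) in this sketch.
conn = sqlite3.connect(":memory:")

# E + L: load the data as-is into a raw landing table.
conn.execute("CREATE TABLE raw_orders (id TEXT, customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# T: transform inside the warehouse, using its own SQL engine.
conn.execute("""
    CREATE TABLE orders AS
    SELECT CAST(id AS INTEGER)   AS id,
           LOWER(TRIM(customer)) AS customer,
           CAST(amount AS REAL)  AS amount
    FROM raw_orders
""")

result = conn.execute(
    "SELECT id, customer, amount FROM orders ORDER BY id"
).fetchall()
print(result)
```

In an ETL or EtLT design, some or all of that cleaning would instead run in an external engine before the load step, which is the trade-off the question above is asking about.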

  • @nikhilgurram6569
    @nikhilgurram6569 1 year ago +1

    Thanks!