What is Data Pipeline? | Why Is It So Popular?

  • Published: Nov 20, 2024

Comments • 100

  • @ryanbent9368
    @ryanbent9368 1 month ago +6

    I work as the PM in data enablement, this video was amazing for understanding each component in a data pipeline.

  • @petenjs3500
    @petenjs3500 5 months ago +78

    3:13 typo *AWS Glue.
    Love these vids, thanks!

    • @Nonenone-rj9yp
      @Nonenone-rj9yp 5 months ago +3

      bruh had me googling what's AWS glow

  • @livestriming289
    @livestriming289 1 month ago +2

    00:01 Data pipelines automate data collection, transformation, and delivery.
    00:38 Data pipeline involves stages like collect, ingest, store, compute, and consume.
    01:18 Data pipeline captures live data feeds for real-time tracking.
    01:56 Data pipeline involves batch and stream processing of ingested data.
    02:40 Data pipeline tools like Apache Flink and Google Cloud are used for real-time processing of data streams.
    03:23 Data is transformed for analysis in the storage phase.
    04:05 Data pipelines enable various end users to leverage data for predictive modeling and business intelligence tools.
    04:47 Data pipeline enables continuous learning and improvement using machine learning models.
    Crafted by Merlin AI.
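
The five stages named in the summary above (collect, ingest, store, compute, consume) can be sketched as a chain of plain Python functions. This is a toy illustration under stated assumptions: the event strings, field names, and the in-memory list standing in for a lake or warehouse are all hypothetical, not the tooling shown in the video.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical raw events; a real pipeline would read these from apps, sensors, logs, etc.
RAW_EVENTS = [
    "user=alice action=click amount=0",
    "user=bob action=purchase amount=30",
    "malformed line",
    "user=alice action=purchase amount=20",
]

def collect() -> Iterator[str]:
    """Collect: pull raw records from a (hypothetical) source."""
    yield from RAW_EVENTS

def ingest(lines: Iterable[str]) -> Iterator[dict]:
    """Ingest: parse raw lines into structured records, dropping malformed ones."""
    for line in lines:
        try:
            record = dict(pair.split("=", 1) for pair in line.split())
        except ValueError:
            continue  # a token without "=" cannot become a key/value pair
        if {"user", "action", "amount"} <= record.keys():
            record["amount"] = int(record["amount"])
            yield record

def store(records: Iterable[dict]) -> list[dict]:
    """Store: persist records; an in-memory list stands in for a lake or warehouse."""
    return list(records)

def compute(stored: list[dict]) -> dict[str, int]:
    """Compute: a batch aggregation, total purchase amount per user."""
    totals: dict[str, int] = defaultdict(int)
    for r in stored:
        if r["action"] == "purchase":
            totals[r["user"]] += r["amount"]
    return dict(totals)

def consume(result: dict[str, int]) -> None:
    """Consume: hand the result to an end user; here it is printed as a tiny report."""
    for user, total in sorted(result.items()):
        print(f"{user}: {total}")

if __name__ == "__main__":
    consume(compute(store(ingest(collect()))))
```

Each stage only depends on the output of the previous one, which is what lets real pipelines swap a stage's tool (for example, a different storage system) without rewriting the others.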

  • @TrusePkay
    @TrusePkay 1 month ago +1

    The best animated introduction to data pipelines in just five minutes.

  • @Iamine1981
    @Iamine1981 1 month ago

    Great video. It showed me the fundamentals of data pipelines and processes, from collection to consumption. There are so many tools and applications used extensively at various stages of data processing that I had never heard of, or had only seen in job descriptions without knowing what they were, since I am not a data specialist! Thanks for putting these short summaries online. Helpful for people like myself!

  • @ChrisPatt
    @ChrisPatt 4 months ago +1

    Loved how simply you explained this complicated concept! Also, what are your thoughts on Irys, the world's only provenance layer ensuring data integrity and accountability?

  • @atomtamadas
    @atomtamadas 5 months ago +4

    Spark is widely used for stream processing too, not only batch; see Spark Structured Streaming.

    • @patrickm.4754
      @patrickm.4754 4 months ago +2

      For stream processing, Apache Flink is better suited, even though both can do stream and batch processing.
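
To make the batch-versus-stream distinction in this thread concrete without depending on Spark or Flink, here is a framework-free Python sketch: the same event count computed once over a complete dataset (batch) and incrementally as each event arrives (stream). It only illustrates the two processing models; it is not how Spark Structured Streaming or Flink are implemented internally, and the event names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator

def batch_count(events: list[str]) -> Counter:
    """Batch: wait until the whole dataset is available, then process it in one pass."""
    return Counter(events)

def stream_count(events: Iterable[str]) -> Iterator[Counter]:
    """Stream: update running state per event and emit the current result each time."""
    state: Counter = Counter()
    for event in events:
        state[event] += 1
        yield Counter(state)  # emit a copy so earlier snapshots are not mutated later

events = ["click", "view", "click", "purchase", "click"]
print(batch_count(events))
for snapshot in stream_count(events):
    print(dict(snapshot))
```

The last streaming snapshot equals the batch result; the difference is that the streaming version produces answers before the input is complete, which is the property real-time use cases need.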

  • @mrseanpaul81
    @mrseanpaul81 5 months ago +4

    I love the short video format, as I can dive deeper on topics and terms I am interested in on my own time :)

  • @prasenjeetrathore
    @prasenjeetrathore 5 months ago +4

    Amazing explanation, so far the easiest-to-digest video about data pipelines.

  • @rohithreddy75
    @rohithreddy75 29 days ago

    Your channel is a blessing.

  • @SudhanvaDixit
    @SudhanvaDixit 5 months ago +51

    0:49 Shouldn't the last one be 'Consume'?

    • @TrusePkay
      @TrusePkay 1 month ago

      Yeah... it's an error, but the video has already been published. They can't go back and edit it from the beginning.

  • @EricaErica-ey2nb
    @EricaErica-ey2nb 8 days ago

    Love the presentation. Can you recommend a resource for making something like it?

  • @lumielz1495
    @lumielz1495 18 days ago

    This video was amazing for understanding, thank you 🤗🤗

  • @uplifting_sounds
    @uplifting_sounds 5 months ago +2

    I like your presentations. What do you use to make them?

    • @chrisalmighty
      @chrisalmighty 4 months ago

      I also want to know what he uses to create the presentation illustrations. They look neat

    • @LUISITOTHETROLL
      @LUISITOTHETROLL 2 months ago

      @@chrisalmighty Adobe Illustrator and After Effects

  • @vlplbl85
    @vlplbl85 5 months ago

    Great video. Small remark: the AWS service for ETL is called AWS Glue, not Glow

  • @aamirsuleman9815
    @aamirsuleman9815 4 months ago

    I think you meant AWS Glue 3:18. Appreciate these informative videos

  • @chrisalmighty
    @chrisalmighty 4 months ago

    The video illustrations look neat. What tool did you use to create the presentation illustrations?

  • @gus473
    @gus473 5 months ago +3

    💯 Looking like your channel is on track for 1 million subscribers by year end! Great stuff! 😎✌️

  • @heangsok862
    @heangsok862 1 month ago +1

    Suppose we have 100 microservices deployed as different AWS Lambda functions. Out of these, more than 30 Lambda functions need to write data to MongoDB Atlas. Each of these 30 functions is triggered simultaneously via SNS (Simple Notification Service), and each function will be invoked 200,000 times to process different data.
    Given this setup, the MongoDB Atlas connection limit will likely be exhausted due to the large number of simultaneous requests.
    What would be the best approach to handle this scenario without running into connection problems with MongoDB Atlas? Could you make a video about this scenario, sir?
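
One commonly recommended mitigation for this question is to create the database client once per Lambda execution environment (at module scope, outside the handler) so that warm invocations reuse it instead of opening a new connection on every call. The stdlib-only sketch below is hypothetical: `FakeClient` stands in for a real driver client such as PyMongo's `MongoClient`, and `handler` only mimics the Lambda handler signature. Reuse alone does not bound the total: each concurrently running environment still holds its own connection, so at this scale you would likely also cap concurrency (for example with Lambda's reserved concurrency setting) or buffer writes through a queue.

```python
import itertools

# Counter used to label each simulated connection that gets opened.
_connection_ids = itertools.count(1)

class FakeClient:
    """Hypothetical stand-in for a real database client; constructing one = opening a connection."""

    def __init__(self) -> None:
        self.connection_id = next(_connection_ids)

    def insert(self, doc: dict) -> int:
        # A real client would write `doc`; here we just report which connection was used.
        return self.connection_id

_client = None  # module scope: survives across warm invocations in one environment

def get_client() -> FakeClient:
    """Lazily create the client once per execution environment, then reuse it."""
    global _client
    if _client is None:
        _client = FakeClient()
    return _client

def handler(event: dict, context: object) -> dict:
    """Lambda-style handler that reuses the shared client instead of connecting per call."""
    conn_id = get_client().insert(event)
    return {"statusCode": 200, "connection_id": conn_id}

if __name__ == "__main__":
    results = [handler({"n": i}, None) for i in range(1000)]
    print({r["connection_id"] for r in results})
```

Within one simulated environment, a thousand invocations share a single connection; the anti-pattern would be calling `FakeClient()` inside `handler`, which opens one connection per invocation.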

  • @chrisc9725
    @chrisc9725 4 months ago

    Fantastic video and graphics, what program do you use to animate your graphics? It's great stuff.

  • @mdhalima5682
    @mdhalima5682 1 month ago

    Thank you very much. Mind-blowing explanation.

  • @bladethirst1
    @bladethirst1 5 months ago

    Maybe some simplified example pipelines for specific applications would make this video even better.

  • @ttehir
    @ttehir 5 months ago +51

    Why do we mostly talk about data pipelines for BI or ML when many times we also need it for functional applications?

    • @personalbranddata
      @personalbranddata 5 months ago +3

      Those functional applications should probably use the same data platform; the only difference is how you're serving the transformed result. What difference, then, do you think should be talked about?

    • @manishshaw1002
      @manishshaw1002 5 months ago +16

      Functional applications most likely consume a very small amount of data, while BI and AI/ML models require far more, likely GBs to TBs of data to work with.
      There's no way you can load 1 GB of data into your web app or a SQL query; it just clogs your app and makes it slow.

    • @JB-ve8sk
      @JB-ve8sk 5 months ago

      Because more and more non-traditionally technical business roles are leveraging data for business intelligence - so the demand for understanding these concepts is greater there (than in complex application architectures where more traditional technical skill accumulates).

    • @deadohiosky1701
      @deadohiosky1701 5 months ago

      Just call it messaging and you’re good to go

    • @JustinLietz
      @JustinLietz 4 months ago

      @@manishshaw1002 this isn’t always true. At the health insurance company I work at, we have functional applications that internal users and providers use to view data about members, and there are vast amounts of data streaming to and from these applications.

  • @thetatso9462
    @thetatso9462 1 month ago

    thanks for the knowledge you share

  • @AdityaTyagi-c1m
    @AdityaTyagi-c1m 2 months ago

    I want to learn system design for data pipelines.
    Could you please suggest how to proceed? What books?

  • @zobaidulkaziex
    @zobaidulkaziex 5 months ago +5

    Very good discussion

  • @sreenivasreddypallerla9941
    @sreenivasreddypallerla9941 5 months ago

    Very informative!! But how do you do all these animations?? What product do you use?!

  • @mwanthidaniel1254
    @mwanthidaniel1254 5 months ago +8

    Which tool do you use to create these animated presentations?

    • @jay51200
      @jay51200 5 months ago

      Trade secret 😂

    • @chrisalmighty
      @chrisalmighty 4 months ago

      I also want to know what he uses to create the presentation illustrations. They look neat

  • @user-data_junkie
    @user-data_junkie 5 months ago +12

    What do you use to create these animations/infographics?

    • @knighthawk095
      @knighthawk095 5 months ago

      I think it could be either Figma or Canva.

    • @user-data_junkie
      @user-data_junkie 5 months ago

      @@Biostatistics is there a video out there that shows how that is done in PowerPoint? I see these data infographics a lot these days.

    • @Biostatistics
      @Biostatistics 5 months ago +3

      @@user-data_junkie it says in the description of this video: he used Adobe Illustrator and After Effects. 😊

    • @user-data_junkie
      @user-data_junkie 5 months ago +1

      @@Biostatistics thanks. I did check at the time and did not see anything. Appreciate the update

  • @Captplanet23
    @Captplanet23 4 months ago +1

    Why is Apache Flink not an option for batch processing? As I understand it, it makes more sense to use the same computation frameworks when doing both, so why not use Flink for both given Flink can support batch jobs?

  • @Paul__xa1r
    @Paul__xa1r 4 months ago

    Important information about refunds: what a joy

  • @kartikmahajan4405
    @kartikmahajan4405 3 months ago

    this was very useful. thanks for sharing.

  • @harsh5402
    @harsh5402 3 months ago

    One-stop-shop video. Loved it ♥

  • @fabiodonascimentopatao8544
    @fabiodonascimentopatao8544 3 months ago

    Very Good Video!! Easy to get!

  • @immanuelt613
    @immanuelt613 5 months ago +1

    Top quality work as always

  • @mePrafull
    @mePrafull 1 month ago

    Thanks!

  • @jaykukreja7125
    @jaykukreja7125 5 months ago

    Love it. This jargon is clear now.

  • @husseineldeeb
    @husseineldeeb 5 months ago +1

    Amazing video. Thanks for your great efforts!

  • @JasonLayton
    @JasonLayton 4 months ago

    Intimidating!

  • @raj_kundalia
    @raj_kundalia 5 months ago

    Thank you for doing this!

  • @markwallstrom9994
    @markwallstrom9994 5 months ago +1

    No mention of Apache Iceberg and such technology?

  •  5 months ago

    Is GA4 considered a data stream? And BigQuery a storage and transform tool?

  • @helikopter1231
    @helikopter1231 4 days ago

    Why would ETL be considered real time here, when ETL is slower because you need to transform every extraction before loading it into a data warehouse?
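
One way to see the distinction this question raises: in ETL, records are transformed before loading; in ELT, raw records are loaded first and transformed inside the warehouse afterwards. The sketch below uses Python's built-in `sqlite3` as a stand-in warehouse; the table names, sample rows, and cleaning rule are hypothetical. Arguably, whether either approach counts as "real time" depends mostly on how often it runs (per event versus on a schedule), not on the order of the T and L steps.

```python
import sqlite3

# Hypothetical raw extract: messy names and one unparseable amount.
RAW = [("  Alice ", "20"), ("BOB", "30"), ("carol", "not-a-number")]

def transform(row):
    """Normalize the name and parse the amount; return None for rows that fail."""
    name, amount = row
    try:
        return name.strip().lower(), int(amount)
    except ValueError:
        return None

def etl(conn):
    """ETL: transform in the pipeline, then load only the clean rows."""
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount INTEGER)")
    clean = [r for r in map(transform, RAW) if r is not None]
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", clean)

def elt(conn):
    """ELT: load raw rows as-is, then transform with SQL inside the warehouse."""
    conn.execute("CREATE TABLE sales_raw (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", RAW)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT lower(trim(name)) AS name, CAST(amount AS INTEGER) AS amount "
        "FROM sales_raw "
        "WHERE amount <> '' AND amount NOT GLOB '*[^0-9]*'"  # keep digit-only amounts
    )

conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
print(conn.execute("SELECT * FROM sales_etl ORDER BY name").fetchall())
print(conn.execute("SELECT * FROM sales_elt ORDER BY name").fetchall())
```

Both paths end with the same clean table; the ELT path also keeps `sales_raw`, so the transformation can be changed and re-run later without extracting again.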

  • @Mr.Andrew.
    @Mr.Andrew. 5 months ago +2

    Your diagram had compute arrows twice when you verbally said compute and consume for the last two phases.

  • @mikedepacina8588
    @mikedepacina8588 5 months ago +1

    AWS Glow or AWS Glue?

  • @rishiraj2548
    @rishiraj2548 5 months ago +2

    Thanks

  • @debajitkataki2085
    @debajitkataki2085 11 days ago

    3:13, what is AWS Glow? Typo??

  • @VikramPatilvp
    @VikramPatilvp 5 months ago +1

    Looks like your examples are only AWS or Google stack. Why not cover examples from MS Azure stack as well?

  • @arvindraj8877
    @arvindraj8877 2 months ago

    I have an interview tomorrow :)

  • @jordanfarr3157
    @jordanfarr3157 5 months ago

    Always so so good

  • @VishnuVijayan7
    @VishnuVijayan7 5 months ago

    No mention of the data lakehouse.

  • @yongguangli3304
    @yongguangli3304 5 months ago

    May I ask how these beautiful diagrams are drawn? They're amazing.

  • @andreslasvegas30
    @andreslasvegas30 5 months ago +1

    I don't know why, but the microphone gain is too high; there's a little background noise and it's a bit noticeable, so keep it in check.
    Great video, as always on this channel.

  • @hoomanmohammadi-c6l
    @hoomanmohammadi-c6l 4 months ago

    You have an error in the diagram: "Compute" appears twice; it should be Compute and Consume.

  • @saratpoluri
    @saratpoluri 5 months ago

    Bravo!

  • @marcgentner1322
    @marcgentner1322 5 months ago

    So I need to build a way to retrieve many emails, categorize them with an ML model, and then save them in the right system. Do I build this with Kafka and PySpark? Or how can this be done easily?

    • @jay51200
      @jay51200 5 months ago

      Kafka dear
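
For the email question, the overall shape is the same whichever tools are chosen: ingest messages into a queue, classify each one, and route it to a destination. The stdlib sketch below uses `queue.Queue` as a stand-in for a broker like Kafka and a keyword rule as a placeholder for a trained ML model; the categories, keywords, and destination names are all hypothetical.

```python
import queue
from dataclasses import dataclass

@dataclass
class Email:
    sender: str
    subject: str
    body: str

# Placeholder "model": keyword rules standing in for a trained classifier.
KEYWORDS = {
    "billing": ("invoice", "refund", "payment"),
    "support": ("error", "bug", "help"),
}

def classify(email: Email) -> str:
    """Return the first category whose keywords appear in the subject or body."""
    text = f"{email.subject} {email.body}".lower()
    for label, words in KEYWORDS.items():
        if any(w in text for w in words):
            return label
    return "other"

# Hypothetical downstream systems, keyed by category.
DESTINATIONS: dict[str, list[Email]] = {"billing": [], "support": [], "other": []}

def run_pipeline(inbox: "queue.Queue[Email]") -> None:
    """Drain the queue, classify each message, and route it to its destination."""
    while True:
        try:
            email = inbox.get_nowait()
        except queue.Empty:
            break
        DESTINATIONS[classify(email)].append(email)
        inbox.task_done()

inbox: "queue.Queue[Email]" = queue.Queue()
for e in [
    Email("a@example.com", "Refund request", "Please refund my order"),
    Email("b@example.com", "App error", "I get an error on login"),
    Email("c@example.com", "Hello", "Just saying hi"),
]:
    inbox.put(e)
run_pipeline(inbox)
print({k: len(v) for k, v in DESTINATIONS.items()})
```

In a production version, the queue and consumer loop are where a broker such as Kafka and a stream processor would plug in, and `classify` is where the real model would be called; the routing logic itself stays small.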

  • @williamchurch711
    @williamchurch711 4 months ago

    So basically a data pipeline is similar to a system flowchart?

  • @internetexplorer1593
    @internetexplorer1593 5 months ago +1

    Leaving out all Azure tools... really a shame

    • @scottedmiston6566
      @scottedmiston6566 5 months ago

      Maybe it's intentional. Many serious data scientists aren't fond of the Azure UI for big data pipelines.

    • @JB-ve8sk
      @JB-ve8sk 5 months ago

      Microsoft training has that covered

  • @eddielim8888
    @eddielim8888 5 months ago

    AWS Glow or Glue?

  • @slayerzerg
    @slayerzerg 1 month ago

    AWS Glue*

  • @padamatimypalyadav
    @padamatimypalyadav 28 days ago +1

    ❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤

  • @dimitrikalinin3301
    @dimitrikalinin3301 5 months ago

    AWS Glue, not Glow

  • @jay51200
    @jay51200 5 months ago

    "Trade Secret" name of the tool used to create the animations ...😂

  • @CurmeLeila-m5w
    @CurmeLeila-m5w 1 month ago

    Lopez Robert Lee Gary Williams Christopher

  • @checkerist
    @checkerist 5 months ago

    apache hive logo is on acid

  • @ONeilPoppy-l1k
    @ONeilPoppy-l1k 1 month ago

    Davis Jose Harris Christopher Jackson Ronald

  • @vickyg1877
    @vickyg1877 5 months ago

    REST API

  • @thesimplicitylifestyle
    @thesimplicitylifestyle 5 months ago

    😎🤖

  • @tetratessera8825
    @tetratessera8825 2 months ago

    I like your content a lot, but there are a lot of mistakes, not only in this video but in the others too.
    Mislabeling, duplicates. It can get very confusing for a beginner. Similarly, if you use acronyms, I would recommend explaining them or at least stating the full name.

  • @albinantony4998
    @albinantony4998 5 месяцев назад

    Looks like you need to change the mic you're currently using. There's some crackling noise when you talk.

  • @johnsmith21123
    @johnsmith21123 5 месяцев назад +3

    Hadoop is dead

    • @praveens2272
      @praveens2272 5 месяцев назад +1

      Why? What's the reason?

    • @JohnS-er7jh
      @JohnS-er7jh 5 месяцев назад +5

      They said that about mainframe computers 30 years ago, but they are still here and in production. Large organizations are not going to adopt the latest solutions for all their data needs (for instance, data that isn't accessed often or specific use cases, or they might have support staff who are more familiar with legacy tools and don't see the need to adopt the latest methods at the moment). So I can guarantee Hadoop is NOT completely dead.

    • @angryktulhu
      @angryktulhu 5 месяцев назад +3

      Lol it’s not dead at all, and its ecosystem tools are still widely used

    • @shilashm5691
      @shilashm5691 5 месяцев назад +1

      😂 Most use HDFS as a data lake. When you say Hadoop is dead, be precise and say MapReduce is dead, because the Hadoop ecosystem is large and still functioning.

    • @personalbranddata
      @personalbranddata 5 months ago

      @@shilashm5691 Most use AWS S3 as storage for their data lake, others Azure Data Lake Storage. MapReduce is dead and HDFS is on the brink of obscurity as well. I pity those who still have to work with some in-house HDFS from the darkest and most painful era of data engineering (the Hadoop era).