Building Your Own Data Pipeline Tool From Scratch - Should You Do It?

  • Published: Jan 23, 2025

Comments • 20

  • @tomastruchly9484
    @tomastruchly9484 2 months ago +9

    Usually the answer is NO. First, it is almost always far more difficult than it initially looks. Second, you need to teach it to new colleagues, since they have had zero chance to work with it before. Third, tech experts by default want to use standard tools, as that is much better for their career prospects if they decide to switch companies.

  • @MarcLamberti
    @MarcLamberti 2 months ago +1

    Don't you dare cross Airflow ever again on a thumbnail 😱
    No I'm kidding. Great content as always. ❤❤

    • @SeattleDataGuy
      @SeattleDataGuy 2 months ago

      It got your attention! Thanks, I try to dig into what I see in the real world.

  • @longdatadevlog
    @longdatadevlog 2 months ago

    Hi Ben, I would say NO to building something if it already has open-source options and a community to support them, and we can contribute to open source as well.
    BUT YES for learning purposes, if we want to dive deep into how it actually works and simplify the process. Long story short, it is good for learning purposes.

  • @A_View_From_The_Shire
    @A_View_From_The_Shire 2 months ago +1

    I’m facing this difficulty at the moment. I work at a medium-sized company, so we don’t have these large enterprise tools, and I’m trying to build something to handle APIs. However, I’m finding it tricky because of the changing structure of the JSON responses. Is there a tool out there I could use to make this easier? My boss doesn’t like us repeating ourselves, which is correct to a degree, but I think there’s a point where you can’t have a one-size-fits-all solution.

    • @SeattleDataGuy
      @SeattleDataGuy 2 months ago

      It can be difficult to build a one-size-fits-all solution for APIs. I actually just did a video about the various considerations when working with APIs: authentication, pagination, data formats (JSON, XML, CSV), etc. I linked the video below. But unless the APIs you're working with are very consistent, it becomes a jumble of if/else statements pretty quickly.
      There are some open-source options, if that's what you're looking for, as well as paid solutions. What tooling are you currently using?
      Link to API Video - ruclips.net/video/YST1sWFPDh4/видео.html

  • @zesky6654
    @zesky6654 2 months ago +2

    My last employer went bankrupt trying to build their own data solution for a use case that most off-the-shelf ETL Orchestrators could have easily done.

    • @zesky6654
      @zesky6654 2 months ago +1

      Most companies that go this way overestimate how complex their data needs are.

    • @SeattleDataGuy
      @SeattleDataGuy 2 months ago

      that's a pretty intense outcome...how did they not spot the giant money pit...

  • @0kazaki
    @0kazaki 2 months ago +1

    no

    • @SeattleDataGuy
      @SeattleDataGuy 2 months ago +1

      and yet I keep seeing people try

    • @0kazaki
      @0kazaki 2 months ago +1

      @@SeattleDataGuy maybe as a learning experience yeah, but it's unmaintainable

  • @pedrolamarao9803
    @pedrolamarao9803 2 months ago +1

    No

    • @SeattleDataGuy
      @SeattleDataGuy 2 months ago

      exactly, short and to the point

    • @pedrolamarao9803
      @pedrolamarao9803 2 months ago +1

      @@SeattleDataGuy True, let me elaborate =). I got your point in the video, but working at a large company, there is no time to even consider building a new tool like Airflow. There is already a pool of tools that cover most use cases, and we focus on solving problems where there are no tools yet.
      Of course there are other professionals that might feel the need for another tool, and have the time to invest.

    • @SeattleDataGuy
      @SeattleDataGuy 2 months ago +1

      @@pedrolamarao9803 Oh, that wasn't a complaint! In all fairness, this video could have been 3 seconds 🤣 but also thank you for sharing your thoughts!!!

  • @billybones7613
    @billybones7613 2 months ago +1

    no

    • @SeattleDataGuy
      @SeattleDataGuy 2 months ago

      short and to the point!

    • @billybones7613
      @billybones7613 2 months ago

      @@SeattleDataGuy Once a client told us to build them Spark, but out of AWS Lambdas. We did. It was bad: slow, expensive, hard to use, and it invoked thousands of Lambdas at a time, all because he had sold it to his boss as an improvement. Same with Airflow: it is the industry standard, it does the job, everyone knows it, and it is suitable for most jobs. So God, please NO, NOOOO