Building Data Pipelines Part 1: Airbnb's Airflow Vs Spotify's Luigi

  • Published: Nov 2, 2024

Comments • 38

  • @singhsandeep
    @singhsandeep 3 years ago +11

    Thanks man, every other video talks about ETL and gives the same old theory again without actually discussing and showing the actual tools and processes.

    • @SeattleDataGuy
      @SeattleDataGuy 3 years ago +3

      Thank you for the comment! My goal is to start doing more tool discussions. First, there are so many options that I think there is a lot of value in helping people understand the differences. Second, I am trying to position my consulting company strongly as a solutions architect + data engineering consulting company.

  • @hahah0811
    @hahah0811 3 years ago +13

    This is amazing! Thank you for explaining things in a clear and concise manner. I'm hooked and look forward to more videos in this series.

    • @SeattleDataGuy
      @SeattleDataGuy 3 years ago +5

      Awesome! I am glad to hear that. I really want to focus on building how-tos for data engineering and data infra.

  • @kiranmudradi26
    @kiranmudradi26 3 years ago +6

    Seattle Data Guy, you are doing great work in the data engineering world. Your videos are clear and are helping me a lot in the DE field. Thank you, I appreciate your work.

    • @SeattleDataGuy
      @SeattleDataGuy 3 years ago +3

      Thank you so much for your kind words. I will keep it up. Good luck on your data engineering journey!

  • @fs_72
    @fs_72 3 years ago +8

    Great video man, I'm only recently trying to gain a foothold in data engineering, but your videos are always a great help. I hope you don't get discouraged by the lower popularity of DE videos compared to DS, and that you keep it up like this! :)

    • @SeattleDataGuy
      @SeattleDataGuy 3 years ago +3

      Thank you! I am going to keep it up. I honestly have a bigger vision. I want to start doing a lot of videos on DE but also on tooling like Databricks, Fivetran, and so on. I think eventually it will start to click or find the right audience.

    • @SeattleDataGuy
      @SeattleDataGuy 3 years ago +2

      Also, feel free to share and help spread the cool factor of DE!

    • @manu-gt9gr
      @manu-gt9gr 2 years ago

      How is it going, bro? Did you get a job?

    • @fs_72
      @fs_72 2 years ago +1

      @@manu-gt9gr I did, yes! I'm currently working as a Data Engineer, though I will try to transition to SWE next year. :)

    • @mufaromatura6959
      @mufaromatura6959 1 year ago

      @@fs_72 any reason why you’re leaving/left DE? And how many of the various DE tools did you have to master before you landed a DE job?

  • @StartDataLate
    @StartDataLate 11 months ago +1

    Thank you so much for explaining. Before this I didn't know Luigi existed...
    My org just moved from crontab to Airflow.

    • @SeattleDataGuy
      @SeattleDataGuy 9 months ago

      At this point I imagine you should focus on Airflow, because the company supporting Luigi has also switched away.

  • @joeeeee8738
    @joeeeee8738 3 years ago +8

    Nice one! Would love to see advantages/disadvantages, os and how to deploy it. Also other tools! (Databricks, Dagster)

    • @SeattleDataGuy
      @SeattleDataGuy 3 years ago +2

      Thanks for the comment! That is the goal. I am going to start working on trying to break down some of these tools. At the very least from an intro level. But hopefully, eventually, more in depth. There are just so many data engineering tools!

  • @joeycarson5510
    @joeycarson5510 9 months ago

    Wondering what your thoughts are on a modular data science framework called Kedro. It seems young and still growing, but I have had some good experience with it as a means of writing modular code, where modules are wired together into a DAG such that they have no knowledge of their predecessors or successors and are essentially just a function pipeline.
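
The idea in the comment above (plain functions that know nothing about their neighbours, wired into a DAG from the outside) can be illustrated with a minimal standard-library sketch. This is not Kedro's API; `NODES`, `run_pipeline`, and the three step functions are hypothetical names made up for illustration, and the wiring format is an assumption.

```python
from graphlib import TopologicalSorter

# Plain functions: none of them knows which step runs before or after it.
def load_raw():
    return [1, 2, 3, 4]

def double(values):
    return [v * 2 for v in values]

def total(values):
    return sum(values)

# The wiring lives outside the functions. Each entry maps the dataset a
# node produces to (function, list of datasets it consumes).
NODES = {
    "raw": (load_raw, []),
    "doubled": (double, ["raw"]),
    "total": (total, ["doubled"]),
}

def run_pipeline(nodes):
    """Run every node in dependency order and return all datasets."""
    # TopologicalSorter expects {node: set of predecessors}.
    graph = {name: set(inputs) for name, (_, inputs) in nodes.items()}
    datasets = {}
    for name in TopologicalSorter(graph).static_order():
        func, inputs = nodes[name]
        datasets[name] = func(*(datasets[i] for i in inputs))
    return datasets

results = run_pipeline(NODES)
```

Because the dependencies are declared only in `NODES`, a step can be swapped or reordered without touching the other functions.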

  • @Chiefnice22
    @Chiefnice22 3 years ago +5

    Awesome video Data Guy! Always learn something new watching your channel. I was wondering about something that might have been mentioned in a previous video, where you talk about a possible reason for data engineers being paid slightly less at the big companies but still having a higher average salary. I think you said one of those reasons was that the tech giants already have their data infrastructure in place, and building that infrastructure might be some of the more valuable work that a data engineer does.
    Would this type of work consist of things like choosing what tools/stack to use for the data engineering of a company? And would building ETLs/pipelines not fall under that? Or what type of work were you thinking of when you said that the large companies often already have their data infrastructure in place, which could be the more valuable work sometimes?

    • @SeattleDataGuy
      @SeattleDataGuy 3 years ago +1

      Great question. It's interesting that Glassdoor and other sites sometimes have higher salaries for data engineers. In my experience, generally speaking, data engineers tend to make less than software engineers (usually because of stock). Often the base is close or similar. As far as DE work being valuable, I think many companies, in particular tech companies, can treat DE work more like plumbing. It has to be done, but they don't see the value.
      However, if you are a SWE working on a product that millions of end-users access it can seem more valuable vs data pipelines that are likely used by only internal users for analytics.
      It's really hard to say. Again it all does depend on the company and how much the DE position is more of an analytical role vs a software role.

    • @Chiefnice22
      @Chiefnice22 3 years ago +2

      @@SeattleDataGuy Hey Data Guy, thank you for your response! Yes that makes sense and it is definitely an interesting concept. And yes I have seen some people classify data engineers into two different types: the more analytical role and the more “subset of software engineering” type role, and those distinctions seem relevant as well.
      I also think it is interesting that there seems to be a phenomenon where globally/nationally, data engineers have a higher average salary than software engineers (it can be quite a deal higher depending on which source you look at), but that at a certain subset of companies they are actually paid a little less than their software engineering counterparts. So maybe there is a phenomenon where at (let’s say for the sake of illustration) 95% of companies data engineers are better compensated, but at 5% of companies, which include some or all FAANG companies, data engineers are compensated less than software engineers.
      www.roberthalf.com/sites/default/files/documents_not_indexed/2020_Salary_Guide_Technology_NA.pdf
      www.talent.com/salary?job=data+engineer
      www.talent.com/salary?job=software+engineer
      insights.stackoverflow.com/survey/2020#technology-what-languages-are-associated-with-the-highest-salaries-worldwide-united-states
      Those are just some sources saying mostly the same thing (I have seen these before and just pulled them back up), that the average compensation for data engineers tends to be a good deal higher than for software engineers. On the first link, you can look at “Big Data Engineer” (aka data engineer) compared to “Software Engineer” (you can Ctrl+F them or something). It has “Software Engineer” at 105 and “Big Data Engineer” at 130 at the 25th percentile of salary, and the difference persists through higher percentiles of salary. And on the Stack Overflow developer survey, Scala in the United States was the highest paid programming language above all other languages, and Scala is almost exclusively a data engineering language in enterprise environments. And also I think the fast-growing demand for data engineers, which I believe you have also covered in the past, seems to be in line with these observations as well.
      As for this phenomenon where some large companies may compensate data engineers less, but the vast majority of companies compensate them more, I am not sure what the cause of that would be but I think it is pretty interesting. Maybe something to do with data engineers at some of these larger tech companies tending to be more towards the “analytical” type of data engineer, as opposed to the “subset of software engineering”/ specialized software engineer type of data engineer? Maybe those latter types of engineers at those companies have more spillover into just being labelled as normal software engineers? But I am just spitballing here I really don’t know.
      But yeah, just to clarify, I don’t really think the monetary differences are that important, but I think this concept is interesting because it can be insightful for where the industry is heading and for how things work as a whole. Also, sorry for the long comment! Anyway, great channel; I am certainly glad it exists and that you put out these videos!

    • @Chiefnice22
      @Chiefnice22 3 years ago +1

      @@SeattleDataGuy Also, this is kind of related but a little different. I was also curious whether you know if those differences in compensation at the larger companies persist through the higher levels? Like at the Staff engineer level etc., would that comparison between data engineers and software engineers remain roughly the same? Thanks!!

  • @metaller_alex
    @metaller_alex 3 years ago +7

    Airflow looks much cleaner than Luigi.

    • @SeattleDataGuy
      @SeattleDataGuy 3 years ago +5

      Personally that is how I feel. I feel like Airflow was built from a more data engineering perspective. I think setting up configuration style code vs. classes just makes sense.

  • @SumitKumar-pg4fs
    @SumitKumar-pg4fs 3 years ago +1

    I really liked this video. I am in the first year of my data engineering job. I would request that you make more videos like this.

    • @SeattleDataGuy
      @SeattleDataGuy 3 years ago

      Some days I want to just quit my job just to make these videos. I really want to keep making this series.

  • @melvink.2681
    @melvink.2681 3 years ago

    Thank you, amazingly good explanation : ) Can you please let me know if it would be possible to have a "Profile Picture" in the left navigation using the LUIGI framework? Or is it strictly limited to navigation elements? (The vision is to have a picture, then the navigation list below.)

  • @midhunrajaramanatha5311
    @midhunrajaramanatha5311 3 years ago +1

    Superb Video

  • @Avico78
    @Avico78 3 years ago +2

    Great tutorial, is there part 2?

    • @SeattleDataGuy
      @SeattleDataGuy 3 years ago

      I did create a video about Luigi as a part 2...but it didn't do so well ruclips.net/video/vtZba4pnGuQ/видео.html

  • @0yustas0
    @0yustas0 2 years ago

    Ok... I'm an old-style guy. When I hear DAG, my first question is "can this tool make loops between DAG nodes?" :) Also, ETL/ELT... Are you sure that it has real capabilities there? It's just a good scheduling tool that very often gives you the opportunity to overengineer a problem.
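
On the question of loops: a DAG is by definition a directed acyclic graph, so a loop between nodes leaves no valid run order. The sketch below shows this with Python's standard library only; it is not Airflow or Luigi code, and `execution_order` and the task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

def execution_order(dependencies):
    """Return a valid run order, or None if the tasks form a loop.

    `dependencies` maps each task to the set of tasks it depends on.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        # A cycle means no task in the loop can ever be scheduled first.
        return None

# extract -> transform -> load: a valid, acyclic pipeline.
acyclic = {"transform": {"extract"}, "load": {"transform"}}

# Same pipeline, but extract now also waits on load: a loop.
looped = {"transform": {"extract"}, "load": {"transform"}, "extract": {"load"}}

order = execution_order(acyclic)
loop_order = execution_order(looped)
```

Repetition is usually handled outside the graph, for example by re-running the whole acyclic graph on a schedule rather than by adding a back edge.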

  • @naheliegend5222
    @naheliegend5222 2 years ago

    What do you think about SSIS?

  • @EV4UTube
    @EV4UTube 1 year ago

    Question. I see this phenomenon all the time and I'm always so surprised by it. If you go to timestamp 12:16, you'll see what I'm talking about. Coders on YouTube walk through code, but the font size of the code is tiny (like ~2pt). Surely the code is one of the most important parts, right? Like, why would you not choose to choke up on that a little and zoom in so we could see it easily? Maybe just show sections of code at a time; at least it would be legible. I mean, you've dedicated time to talk about it, so it must be important. If it is important, why not make it visible / legible? Someone will have to explain this to me one day. I think it is some sort of secret professional convention among YouTube coders to only show code in minuscule font sizes. Hey, I wonder... If I start making YouTube videos with the code too small to see, would that certify me as a professional coder? Asking for a friend.

  • @lucaguarro6116
    @lucaguarro6116 2 years ago +1

    awesome video and explanation, thank you
    Also appreciated that you didn't tell us to like comment and subscribe (which I did anyways btw)

    • @SeattleDataGuy
      @SeattleDataGuy 2 years ago

      I am glad you enjoyed the content! I do that sometimes. It does actually impact how many people subscribe. It's like making click-bait-y titles. In some ways you hate it...but if you want to stand out you have to. The struggle of being a creator.

  • @maybenew7293
    @maybenew7293 2 years ago +1

    are you in a hurry?

  • @5thbatman
    @5thbatman 1 year ago

    Please check out Apache Beam as well, which I found much more helpful than these tools for data pipelines. It would be great if you could do some videos on it...