Extracting Data From APIs As Data Engineers - The Basics And Challenges You'll Run Into

  • Published: Jan 7, 2025
  • If you've had to build any data pipelines for analytics, then you're likely very familiar with the extract phase of an ELT or ETL.
    As the name suggests, the extract phase is when you connect to a data source and "extract" data from it. The most common data sources you'll interact with are databases, APIs, and file servers (via FTP or SFTP).
    With my recent focus on going back to the basics, it occurred to me that I have never written about APIs and how we interact with them as data engineers.
    Now, there are plenty of APIs that have caused me a lot of heartburn in my career and there are others that have been a piece of cake to handle.
    But it all comes down to how the API is set up and the design choices made when it was built.
    If you're looking for an out-of-the-box solution to handle your API data extraction, you can check out the two below:
    Portable For APIs - portable.io/
    Estuary For Real Time Data Extraction - bit.ly/4eQC3oQ
    Disclosure - I have a financial stake in both
    Also, if you'd like to dive deeper into data strategy and infrastructure and you'd like to support me, you can consider becoming a paid member of my Substack. I have over 100 articles that cover everything from data engineering 101 to leading data teams. Sign up with the link below and get 30% off. - seattledataguy...
    If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.
    seattledataguy...
    Or check out my blog
    www.theseattle...
    And if you want to support the channel, then you can become a paid member of my newsletter
    seattledataguy...
    Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio
    _____________________________________________________________
    Subscribe: / @seattledataguy
    _____________________________________________________________
    About me:
    I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.
    *I do participate in affiliate programs, if a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.

Comments •

  • @SeattleDataGuy
    @SeattleDataGuy  3 months ago +1

    If you guys want to learn more about data engineering, then sign up for my newsletter here seattledataguy.substack.com/ or join the discord here discord.gg/2yRJq7Eg3k

  • @nicky_rads
    @nicky_rads 3 months ago +7

    Solid overview. Working with APIs is a good skill set to have for data engineering. Turning JSON formatted data into tabular data for humans to understand is very important!

  • @lafcadiothelion
    @lafcadiothelion 3 months ago +7

    Enjoying these back to the basics videos! Perfect timing for me too

    • @SeattleDataGuy
      @SeattleDataGuy  3 months ago

      Yeah, it's been going back through things I take for granted now.

  • @rguez2332
    @rguez2332 1 month ago +1

    You said there'll be an Extract from Database video. I'm still waiting to learn that part 😊

  • @facundoDP11
    @facundoDP11 1 month ago

    incredible. i am starting to work with that kind of api you name at the end and it makes me laugh how you describe all the problems i am facing right now. "but we need the data so that's what we did" 😅 excellent. subscribing

  • @SegueGreene
    @SegueGreene 3 months ago

    Love this, perfect level of detail for where i'm at

  • @nikunj204
    @nikunj204 17 days ago +1

    Hello. Love your content. Can you please upload these in higher quality, like 1080p or more? When seen on bigger screens the picture gets blown up. Thanks

    • @SeattleDataGuy
      @SeattleDataGuy  6 hours ago

      I didn't realize that for some reason during edits these weren't staying at 1080p.

  • @peterndiforchu2622
    @peterndiforchu2622 3 months ago

    Great overview! Thanks for sharing!

  • @marshallyale3902
    @marshallyale3902 3 months ago +1

    You briefly discussed it, but could you talk a little bit more about the config file you discuss for parsing? I just know my code starts to become a little verbose when I have 30 different functions to parse different API calls, especially if there's additional checks that an http status code can't tell you
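One way to picture the config-driven parsing this comment asks about is sketched below. It is not the approach from the video; the endpoint names, the `records_key` and `fields` layout, and the `error` key in the response body are all hypothetical and only there for illustration. The idea is that a single generic function reads a per-endpoint mapping instead of each endpoint getting its own parser.

```python
from typing import Any, Dict, List

# Hypothetical config: one entry per endpoint, mapping output column -> dotted path.
PARSE_CONFIG: Dict[str, Dict[str, Any]] = {
    "orders": {
        "records_key": "data",
        "fields": {
            "order_id": "id",
            "customer_email": "customer.email",
            "total": "totals.grand",
        },
    },
    "users": {
        "records_key": "results",
        "fields": {"user_id": "id", "name": "profile.name"},
    },
}


def get_path(obj: Any, dotted_path: str, default: Any = None) -> Any:
    """Walk a dotted path through nested dicts; return default if a key is missing."""
    current = obj
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def parse_response(
    endpoint: str,
    payload: Dict[str, Any],
    config: Dict[str, Dict[str, Any]] = PARSE_CONFIG,
) -> List[Dict[str, Any]]:
    """Turn one decoded JSON response into flat rows using the endpoint's config."""
    spec = config[endpoint]
    # Assumption: this API signals failures in the body (even on HTTP 200)
    # through a top-level "error" key. Real APIs differ.
    if payload.get("error"):
        raise ValueError(f"{endpoint}: API reported an error: {payload['error']}")
    records = payload.get(spec["records_key"], [])
    return [
        {column: get_path(record, path) for column, path in spec["fields"].items()}
        for record in records
    ]
```

Adding an endpoint then means adding a config entry rather than a new function, and body-level checks that a status code can't express live in one place.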

  • @rickr937
    @rickr937 3 months ago

    can you make a video about how to deal with schemas of APIs and how they change over time :) dates and temporal data in particular

  • @A_View_From_The_Shire
    @A_View_From_The_Shire 2 months ago

    At my company, we aim to build infrastructure that's reusable. Why do something 10 times when you can do it once, right? For example, we have an agnostic flat file loader that's quite robust. I'm currently trying to build something similar for APIs, but it's proving quite tricky due to the nature of JSON and semi-structured data. So far, I've set up a recursive function that turns the nested JSON into one wide table, but it's difficult to then get that into a SQL environment if the table is too wide, or to explode out some of the columns in Python.
    I'm now attempting a different method. I have a function that analyses the depth of the nested JSON: how many levels there are, the deepest path, etc. I then want to use that metadata to parse it using json_normalize, but that's also tricky due to the nature of JSON.
    Am I wasting my time trying to make something too dynamic for API extraction?
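The two steps described in this comment, measuring nesting depth and recursively flattening, can be sketched with the standard library alone. This is a minimal illustration, not the commenter's code: the function names and the dotted-column convention are assumptions, and lists are deliberately left intact so they can be exploded into a child table later instead of widening the parent.

```python
from typing import Any, Dict


def max_depth(obj: Any) -> int:
    """Nesting depth: scalars count as 0, each dict or list level adds 1."""
    if isinstance(obj, dict):
        return 1 + max((max_depth(v) for v in obj.values()), default=0)
    if isinstance(obj, list):
        return 1 + max((max_depth(v) for v in obj), default=0)
    return 0


def flatten(record: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into one level of dotted column names.

    Lists are kept as values rather than expanded, so a caller can route
    them to a separate child table.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty dicts, carrying the path as the prefix.
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


record = {
    "id": 1,
    "customer": {"name": "Ann", "address": {"city": "Seattle"}},
    "items": [{"sku": "a"}, {"sku": "b"}],
}
print(max_depth(record))  # 3
print(flatten(record))
```

Keeping lists out of the wide table is one possible answer to the "too wide for SQL" problem: the parent row stays narrow, and each list becomes its own table keyed back to the parent `id`.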

  • @shreyaslahoti7542
    @shreyaslahoti7542 2 months ago +1

    great explanation.

  • @Matias-eh2pn
    @Matias-eh2pn 17 days ago

    Would you say that it's important for a DE to know how to build APIs, or is it enough to know how to use them?

  • @darkzero2221
    @darkzero2221 3 months ago +2

    Thanks!!! For this video!

  • @ringostarkiller7097
    @ringostarkiller7097 3 months ago +1

    nice overview! thx u

  • @artyomashigov
    @artyomashigov 3 months ago +2

    Thanks for the video. The quality though is 720p.

    • @777E-m1g
      @777E-m1g 3 months ago +3

      I agree with u. a higher resolution is needed

    • @SeattleDataGuy
      @SeattleDataGuy  3 months ago +1

      Thanks for the callout. Something happened during editing. I was actually confused by this because I know I film in HD, but I didn't realize the resolution was impacted.

  • @bwb9479
    @bwb9479 3 months ago +2

    yeah but it's really boring man
    everyone knows that, so superficial
    when making this with chatgpt at least think about how you can be useful to people rather than doing spam-like videos, at least add paginated calls, airflow http operators, things like that man. i am really bored.
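For readers wondering what the paginated calls mentioned in this comment look like, here is a minimal cursor-pagination sketch. It is not taken from the video. The response shape (`items`, `next_cursor`) and the `fetch_page` callable are hypothetical; in a real pipeline `fetch_page` would wrap an HTTP request, and many APIs use page numbers or offsets instead of cursors.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_records(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
    max_pages: int = 1000,
) -> Iterator[Any]:
    """Yield every record from a cursor-paginated API.

    fetch_page(cursor) is a hypothetical callable that returns a dict like
    {"items": [...], "next_cursor": "abc" or None}. Pass None for the first page.
    """
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        if not cursor:
            return  # no further pages
    # Guard against an API that keeps handing back a cursor forever.
    raise RuntimeError(f"stopped after {max_pages} pages without reaching the end")


# Usage with an in-memory stand-in for the API:
fake_pages = {
    None: {"items": [1, 2], "next_cursor": "a"},
    "a": {"items": [3], "next_cursor": "b"},
    "b": {"items": [4, 5], "next_cursor": None},
}
print(list(iter_records(lambda cursor: fake_pages[cursor])))  # [1, 2, 3, 4, 5]
```

Passing the fetch function in keeps the paging loop testable without network access, and the `max_pages` guard is a cheap defence against a misbehaving endpoint.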