Modern Data Engineering Workflows, Explained

  • Published: Jul 27, 2024
  • Modern data engineering isn't all about tools & technologies.
    One area that's often overlooked is the concept of "workflows".
    In particular, data team workflows for continuously building projects.
    This includes environments, naming conventions, automation, and more.
    In this video, you will:
    - Learn high level design of common team workflows
    - See an example implementation
    - Be able to identify whether or not you're following this yourself
    Thank you for watching!
    ►► The Starter Guide for The Modern Data Stack (Free PDF)
    Simplify the “modern” data stack + better understand common tools & components → bit.ly/starter-mds
    Timestamps:
    0:00 Intro
    0:21 Why It's Important
    1:33 Design & Process Review
    4:39 Database Example (Snowflake)
    Title & Tags:
    Modern Data Engineering Workflows, Explained
    #kahandatasolutions #dataengineering #datapipeline

Comments • 12

  • @KahanDataSolutions
    @KahanDataSolutions  8 months ago +2

    ►► The Starter Guide for Modern Data → bit.ly/starter-mds
    Simplify “modern” architectures + better understand common tools & components

  • @jacobukokobili6457
    @jacobukokobili6457 8 months ago +1

    Thanks for this, Kahan. Please make a video implementing this workflow, like you've done with CI/CD. Thanks again.

  • @dataruncoach
    @dataruncoach 8 months ago +1

    Very clear and concise, thank you

  • @marcosoliveira8731
    @marcosoliveira8731 8 months ago

    A lot of good ideas from your videos have inspired me to improve my development flow.

  • @goosetaculous
    @goosetaculous 8 months ago

    I love it. Already doing this, but it's a good reminder.

  • @felipecondore4173
    @felipecondore4173 8 months ago

    It's a very clear explanation.

  • @vishal_uk
    @vishal_uk 1 month ago

    Hi Mike! Could you please clarify the following:
    After a developer makes changes to a model and raises a PR, the changes are reviewed/auto-tested in the QA/CI DB/schema and later merged to the main branch. Is the QA/CI environment a replica of the prod DB (warehouse and marts) that reads data from staging and validates the changes before they're merged to main? Thanks in advance!

  • @NicoWright-ly6en
    @NicoWright-ly6en 3 months ago +1

    Hi Kahan, a question I have after watching many of your videos: what about a client's situation makes you think one tool would fit better than another? For example, Snowflake vs BigQuery.

  • @MrUbbers
    @MrUbbers 8 months ago

    In our setup we have multiple environments (DEV, QA, PROD), all fully separate, including the raw sources and the ETL. This at least doubles our costs. The setup you showed eliminates the extra processing and storage costs by using one environment, right? How do you deal with upgrades and changes in the raw data source layer? For example, a source system whose database schema changes significantly after an upgrade. Would you just add another schema in the raw database?

  • @EMBrown801
    @EMBrown801 8 months ago

    Would you need separate dev schemas for the staging and marts? Let's say I want to develop a new mart. Would I put all of those models in the same dev schema before going to production?

    • @KahanDataSolutions
      @KahanDataSolutions  8 months ago +1

      I typically will do that. I like to keep all tables/views in a single Dev schema (ex. all Staging, Warehouse, Marts) to avoid excessive objects and keep it simple. The way I see it, nobody else is really looking at that schema so perfect separation & organization isn't as important. What's more important is that you can confirm models deploy, check the data, etc. Then once you move to "production", separate things out by specific schemas. Hope that helps!
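
      The single-dev-schema approach described in this reply could be sketched in Snowflake DDL roughly like this (all database and schema names here are hypothetical, not taken from the video):

      ```sql
      -- Development: every layer's models land in one personal dev schema,
      -- e.g. ANALYTICS.DBT_DEV.STG_ORDERS alongside ANALYTICS.DBT_DEV.FCT_ORDERS
      CREATE SCHEMA IF NOT EXISTS ANALYTICS.DBT_DEV;

      -- Production: separate each layer into its own schema
      CREATE SCHEMA IF NOT EXISTS ANALYTICS.STAGING;
      CREATE SCHEMA IF NOT EXISTS ANALYTICS.WAREHOUSE;
      CREATE SCHEMA IF NOT EXISTS ANALYTICS.MARTS;
      ```

      The trade-off: the dev schema stays simple and disposable (only the developer looks at it), while production schemas give downstream consumers a clean, organized namespace per layer.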