Microsoft Fabric: How to Build a Lakehouse with Medallion Architecture

  • Published: Oct 8, 2024
  • Microsoft Fabric is a new cloud computing solution that enables you to build a lakehouse, a unified data platform that combines the best of data lakes and data warehouses.
    But how do you design and implement a lakehouse that meets your data analytics needs?
    One of the popular approaches is the medallion architecture, where you have different layers of data quality and refinement: bronze, silver, gold, and diamond.
    I have been working with Fabric and the medallion architecture for months and will share my insights and tips on how to create a lakehouse with medallion architecture using Fabric’s services and features.
    We will also answer your questions and comments live, so don’t miss this chance to learn more about Microsoft Fabric and the medallion architecture.
    This topic is for everyone.

Comments • 33

  • @moeeljawad5361 7 months ago +2

    Great video. I was hoping you would continue the demo by pushing data between the Bronze, Silver and Gold folders, and maybe show how to set access to one or more folders by security group. You might have addressed these points in other videos, so I will keep looking. Thanks

  • @kel78v2 2 months ago

    Interesting demo. I was interested to learn how the data sources were being referenced and used between the Bronze, Silver and Gold layers.

  • @PontyclunBosomPals 1 year ago +1

    I have been using the medallion architecture, but with actual tables as opposed to folders. I never considered doing it with folders. I suppose it makes sense if the data isn't in a format that can be loaded directly into a table. You still have to find a way to update the tables other than the right-click "load to" option. I just used notebooks to carry out all of the processing. I like the idea of the diamond layer representing the data model piece.

    • @ChrisWagnerDatagod 1 year ago +1

      Thanks! I have also been thinking about this separation: Gold - modeled / Diamond - semantic / ????? - flattened report or API. Gotta come up with a name for that.
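The notebook-driven processing described in this thread can be sketched in plain Python. This is only an illustration of the bronze, silver, gold idea, not the method shown in the video: the sample rows, the column names, and the `to_silver` / `to_gold` helpers are all hypothetical, and in-memory lists stand in for lakehouse tables.

```python
from collections import defaultdict

# Hypothetical raw rows as they might land in bronze: every value is a string,
# with stray whitespace, one duplicate order, and one unparseable amount.
bronze = [
    {"order_id": "1", "region": " east ", "amount": "10.50"},
    {"order_id": "2", "region": "West", "amount": "4.00"},
    {"order_id": "2", "region": "West", "amount": "4.00"},
    {"order_id": "3", "region": "east", "amount": "n/a"},
]


def to_silver(rows):
    """Clean and type raw rows; skip duplicates and unparseable amounts."""
    seen = set()
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # assumption: a real pipeline would quarantine these rows
        order_id = int(row["order_id"])
        if order_id in seen:
            continue
        seen.add(order_id)
        cleaned.append(
            {
                "order_id": order_id,
                "region": row["region"].strip().lower(),
                "amount": amount,
            }
        )
    return cleaned


def to_gold(rows):
    """Aggregate cleaned rows into a reporting-friendly total per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each function only reads the layer before it, which keeps the raw bronze data untouched and lets silver or gold be rebuilt at any time.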

  • @cfosund 1 year ago +2

    Great session Christopher. 👍🏻 Would it be possible to enable the ‘Live Chat’ functionality in the final recording? Reid (Havens Consulting) does that, and it’s valuable for those of us who could not attend the livestream to be able to follow the discussion and questions from your community. 😊 Thanks, keep it up.

    • @ChrisWagnerDatagod 1 year ago

      Hello @cfosund - that should be on... I can see it on this video. I know that I missed the setting on previous livestreams, but I think that this has been sorted out.

    • @cfosund 1 year ago +1

      Awesome, it's available now. Thanks!

  • @CodyT-i1y 1 year ago +1

    Great content Chris. I saw this live and have now come back and rewatched it as I start to use Fabric more.
    A question, though, regarding the medallion architecture. It was set up in this demo with clear labels within your OneLake. However, once you moved into the SQL Warehouse / Endpoint, the medallion naming convention wasn't used any longer. I know there was a time crunch to fit everything into this demo, but I wanted your thoughts on persisting the medallion structure on the SQL Warehouse side. Let me know if it was there and I missed it.

    • @ChrisWagnerDatagod 1 year ago

      Good call out. In the Lakehouse or DW I would separate them out by schemas. Schemas were not working at the time of the recording. They work great now. Do that. :)

  • @aspirethrissur4016 1 year ago

    Great video. Thank you. In a real-world scenario, the Bronze folder structure will typically follow a Year/Month/Day format.
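A minimal sketch of that Year/Month/Day layout, assuming a hypothetical `Files/bronze` root folder and a hypothetical `bronze_folder` helper (neither comes from the video):

```python
from datetime import date
from pathlib import PurePosixPath


def bronze_folder(source: str, load_date: date, root: str = "Files/bronze") -> str:
    """Build the Year/Month/Day landing folder for one source and one load date."""
    return str(
        PurePosixPath(root)
        / source
        / f"{load_date:%Y}"
        / f"{load_date:%m}"
        / f"{load_date:%d}"
    )


print(bronze_folder("sales", date(2024, 10, 8)))
# prints Files/bronze/sales/2024/10/08
```

Zero-padded month and day keep the folders sorted chronologically when listed by name.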

  • @igorkocic112 7 months ago +1

    Thanks for the video Chris!
    I'm wondering: should each medallion layer have its own prod, dev and quality stage? Or just two of them?

    • @ChrisWagnerDatagod 7 months ago

      Great question. Keep your environment as simple as you need. If you need to have these additional layers, then do so. Bigger companies 100% have at least this many layers. Smaller companies may be fine without them.

  • @bryanrock4836 7 months ago +1

    Great video! One question. Is there a way to use the medallion architecture in situations where data needs to be available in near real time? I know Direct Lake eliminates latency between the Delta tables and the semantic model, but everything upstream of that (dataflows, pipelines, etc.) requires scheduled refreshes. It seems like breaking those out into layers would only increase latency. Or am I missing something?

    • @ChrisWagnerDatagod 7 months ago +1

      Great question. For real-time data, I set up the stream to write to Gold, then script it back to Bronze and Silver on a time-based cadence.

  • @mwaltercpa 1 year ago +1

    Thanks Chris, this is all great info for the growing citizen dev :)

  • @danhthanhnguyen-o1q 1 year ago

    Great video! Could you explain more about the Diamond layer? Thanks in advance.

    • @ChrisWagnerDatagod 1 year ago

      The Diamond layer is the semantic layer, usually an SSAS / AAS / Power BI dataset layer, where measures are created, RLS is applied, and users have access to consistent joins and data.

  • @notoriousft 1 year ago +1

    A dollar each time you say 'it depends', very GIAC-like 😊

    • @ChrisWagnerDatagod 1 year ago +1

      Bingo. Charity should be a priority to everyone. :) Happy to promote a kinder world.

  • @ChrisDowns88 1 year ago

    Hey, thank you for the video, it was really informative.
    Quick question on 'load to table' within the lakehouse. I inserted new data (a row) into the .csv and wanted it to show up in the table automatically, which didn't work. How would I get this to work automatically without having to re-create the table each time? Is that even possible? Thank you!

    • @ChrisWagnerDatagod 1 year ago

      Great question. The CSV becomes either a base for a table that is rewritten each time, or you build processes on top of that that load it into the Delta file in the lakehouse table. You can build processes that update the Delta file that was created by the CSV, but by just updating the CSV you have to reload (or choose your management technique).
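One way to picture the "build processes on top" option is an upsert keyed on an ID column, so that re-reading the whole CSV only changes new or modified rows. The sketch below is plain Python, with a dictionary standing in for the Delta table; the `merge_csv_into_table` helper, the `id` key, and the sample data are hypothetical, and a Fabric notebook would express the same idea against the actual Delta table instead.

```python
import csv
import io


def merge_csv_into_table(csv_text: str, table: dict, key: str = "id") -> int:
    """Upsert each CSV row into `table` by `key`; return how many rows changed."""
    changed = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if table.get(row[key]) != row:
            table[row[key]] = row  # insert a new row or overwrite a modified one
            changed += 1
    return changed


table = {}  # stand-in for the lakehouse table
first_load = "id,name\n1,Ada\n2,Grace\n"
second_load = "id,name\n1,Ada\n2,Grace\n3,Edsger\n"  # same file with one new row

merge_csv_into_table(first_load, table)
added = merge_csv_into_table(second_load, table)
print(added, len(table))
```

Because unchanged rows are skipped, the process can be rerun on a schedule against the full file without re-creating the table.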

  • @justair07 11 months ago +1

    I'm curious how you can use a dataflow, as mentioned, to land data from an on-prem file server in Fabric. Won't you need Data Factory for that?

    • @ChrisWagnerDatagod 11 months ago

      You do not.
      For now, you have to create a Gen1 dataflow to the service,
      then a Gen2 dataflow from the Gen1 dataflow.

  • @stinaintx 1 year ago

    Sorry I missed the live version - was on w/ Microsoft Support regarding a Dataflow Gen2 error!

  • @Phoenixspin 1 year ago +2

    Medallion Architecture - yet another term dropped on me after Fabric has already smothered me with a dizzying array of confusing terms already. When will it all end? When can I have peace like Dax? Honestly, it's getting to be too damn much. I'm not sure any of this will help the little guy in the small organization. My frustration is real.

    • @ChrisWagnerDatagod 1 year ago

      I am SO sorry... this is something that has been around for 6 or 7 years (?) and was originated by Databricks (I believe). The ONLY reason I bring it up and use it is to hopefully connect with other documentation you may see in other technology.
      Only use the tools that you need. Keep it simple. If you run into a problem, then think about switching.

    • @EmmanuelAguilar 1 year ago

      @ChrisWagnerDatagod, the problem is not the medallion architecture; it is how Fabric works with it!! I have a bunch of questions about Fabric!!

    • @alt-enter237 1 year ago +1

      Hey Chris! A question: how fixed are the boundaries between the layers? Meaning, are the definitions of what the layers contain hard and fast?

    • @ChrisWagnerDatagod 1 year ago +1

      @alt-enter237 Good question. While the definitions are not hard and fast, exceptions that did not cause problems almost right away have been very rare.

  • @nicobotes9660 1 year ago

    Is it Cake or is it Cake 2?🍰