Databricks + Power BI: Design Best Practices - 2023.07.12

  • Published: 8 Jul 2024
  • Optimizing Databricks for your Power BI reporting: A how-to guide for the Gold Zone.
    Speaker: Hobbs / iamhobbs
    Slides: drive.google.com/file/d/1QNTQ...
    #databricks #powerbi
  • Science

Comments • 11

  • @ShrikanthP-je4ki • 1 month ago

    This video is enough to put together a solid design with Power BI and Databricks! Thanks a lot! I appreciate the detail and the crystal-clear explanations.

  • @ghyootceg • 5 months ago +1

    Fantastic video! Makes me wonder what should and shouldn't be built using Databricks SQL. As I understand it, the video suggests striking a balance between the gold layer and Power BI.

  • @krishnakoirala2088 • 4 months ago

    Since DLT displays counts on each box, is it usually slower than a regular workflow? With the enhanced features that came/are coming in Unity Catalog, specifically lineage and the like, we can easily see which tables and views are connected where. Is it worth using DLT in a workflow if someone doesn't want to pay the extra cost associated with it, considering that I will run OPTIMIZE and Z-ordering on my own at some frequency?

    • @stephanieamrivera • 3 months ago

      The performance of DLT compared to a regular workflow depends on various factors, including the specific use case, data volume, query patterns, and optimization techniques used. While DLT introduces some overhead due to its real-time change data capture capabilities, the benefits it provides may still make it worth considering, even without utilizing all of its enhanced features.
      1. Performance: DLT may have some additional processing overhead compared to regular workflows due to change tracking and maintaining transactional integrity. However, DLT's optimizations, such as indexing, caching, and predicate pushdown, can help mitigate this impact. If you leverage these optimizations effectively, the performance difference might be minimal.
      2. Enhanced features: Unity Catalog, with lineage and related capabilities, provides valuable insight into how tables and views are connected. These features can improve data understanding, governance, and debugging.
      3. Optimize and Z-ordering: Delta Lake provides optimization techniques such as compacting file layout and improving data locality with Z-ordering. If you can incorporate these optimizations into your regular workflow without DLT (see the sketch below), you can still get performance benefits without the additional cost associated with DLT.
      In summary, using DLT depends on the specific requirements and constraints of your use case.
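      As a minimal sketch, that kind of scheduled maintenance could run as a plain Databricks job rather than through DLT. The table name gold.sales_daily and the Z-order columns below are placeholders, and spark is the SparkSession that Databricks provides in a notebook or job:

          # Hypothetical maintenance job for a gold table, run outside DLT.
          # Compact small files and cluster data on the columns BI queries filter on.
          spark.sql("""
              OPTIMIZE gold.sales_daily
              ZORDER BY (order_date, region)
          """)

          # Optionally reclaim storage from files no longer referenced by the table.
          # Default retention is 7 days; shorten it only if you understand the trade-offs.
          spark.sql("VACUUM gold.sales_daily")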

  • @HiYurd • 11 months ago +1

    Where do you sign up for these Databricks Skill Builder webinars?

    • @stephanieamrivera • 11 months ago

      Please let your solutions architect know that you'd like to be invited. They can add you to the invite list. For non-customers, only the recordings are available. Thank you for watching!

  • @Pixelements • 5 months ago

    Great video, we are going to migrate from a typical data warehouse to a Lakehouse. The only thing you did not mention (or I did not understand) is how to serve the data to Power BI datasets (aka semantic models). In the Azure data warehouse world, we have a technical user that refreshes the dataset hourly or daily. But how do you refresh a dataset that is based on a Lakehouse? Do you use the Databricks connector in PBI?

    • @stephanieamrivera • 5 months ago

      I have asked Hobbs to reply :)

    • @BricksBI • 5 months ago

      Hi @Pixelements. If you're using an Import approach, you will set a refresh schedule in the Power BI Service and your model will then refresh itself as often as the schedule dictates. If you're using DirectQuery, each time a given report is opened it re-runs the query it's based on and retrieves the results, so there's no need to set a refresh schedule. You can also turn on a report setting in DirectQuery reports that says "once the report is open, re-run the query every X minutes."
      In either case, your Power BI semantic model (previously known as a Power BI dataset) uses whatever connector you chose when you built it to reach from the Power BI Service to Databricks and retrieve new data.
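      For something closer to the hourly "technical user" refresh mentioned above, an Import-mode semantic model can also be refreshed programmatically through the Power BI REST API ("Refresh Dataset In Group"), instead of or in addition to the Service-side schedule. A minimal sketch in Python; the workspace and dataset IDs are placeholders, and in practice the access token would typically come from a service principal via Azure AD:

          import requests

          WORKSPACE_ID = "<workspace-guid>"    # placeholder
          DATASET_ID = "<dataset-guid>"        # placeholder
          ACCESS_TOKEN = "<aad-access-token>"  # placeholder: Power BI-scoped Azure AD token

          # Documented Power BI REST endpoint:
          # POST .../groups/{workspaceId}/datasets/{datasetId}/refreshes
          url = (
              f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
              f"/datasets/{DATASET_ID}/refreshes"
          )

          response = requests.post(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
          response.raise_for_status()  # 202 Accepted means the refresh was queued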

  • @vonmoraes • 8 months ago

    Is there a video on connecting and using this in a dataflow? I mean a hands-on video haha