Advancing Spark - Delta Sharing

  • Published: 28 Jul 2024
  • We heard about this new idea of Delta Sharing back in 2021 and we've seen the open source protocol available via the delta.io site ever since. The promise we've had from the beginning is that Unity Catalog can act as a Delta Sharing Server, without all of the setup hassle - and with the GA of both Unity Catalog and Delta Sharing coming soon, it's about time we looked at just how easy it is!
    In this video Simon walks through a Delta Sharing example - configuring the metastore for sharing, creating a new share, onboarding a recipient and querying data!
    If you're getting started with Delta Sharing, you can see the full github of samples here delta.io/sharing/ - but we suggest starting with the Databricks Docs if you're using DBX - docs.databricks.com/data-shar...
    As always, give Advancing Analytics a call if we can help you on your Lakehouse Journey
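
For the final "querying data" step, a recipient typically needs only the profile file they were sent and the name of the shared table. The snippet below is a minimal sketch under assumptions, not the notebook from the video: the file name `config.share` and the share, schema and table names are hypothetical placeholders. It relies on the open source `delta-sharing` Python connector and its `load_as_pandas` function, which takes an address of the form `<profile-file>#<share>.<schema>.<table>`.

```python
import json
from pathlib import Path

# Keys expected in a bearer-token Delta Sharing profile file.
REQUIRED_KEYS = {"shareCredentialsVersion", "endpoint", "bearerToken"}


def load_profile(path):
    """Read a Delta Sharing profile file (JSON) and check its required keys."""
    profile = json.loads(Path(path).read_text())
    missing = REQUIRED_KEYS - profile.keys()
    if missing:
        raise ValueError(f"profile is missing keys: {sorted(missing)}")
    return profile


def table_url(profile_path, share, schema, table):
    """Build the '<profile-file>#<share>.<schema>.<table>' address."""
    return f"{profile_path}#{share}.{schema}.{table}"


def read_shared_table(profile_path, share, schema, table):
    """Load a shared table into pandas; needs `pip install delta-sharing`."""
    # Imported lazily so the two helpers above work without the connector.
    import delta_sharing

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table)
    )


if __name__ == "__main__":
    # Hypothetical names: replace with the values your provider gave you.
    load_profile("config.share")
    df = read_shared_table("config.share", "my_share", "my_schema", "my_table")
    print(df.head())
```

The reading happens on the recipient's machine; the sharing server only needs the bearer token from the profile file to authorise the request.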

Comments • 30

  • @evogelpohl 2 years ago +1

    Nice work as always, Sir. It's clear that the bones of the sharing-ecosystem, Delta based, are here. Excited to see UI/UX's over top ala new layered products.

  • @kuldipjoshi1406 2 years ago +4

    Hi, if you could make a detailed video about table access control, how its hierarchy works in Databricks, and best practices, that would be great. Awesome video btw.

  • @chittillavenkataviswanath1389 1 year ago

    You are truly amazing! Best learning experiences to start the new year.

  • @aqlanable 2 years ago

    Since we are talking about Delta Sharing, it's worth having a look at alert destinations and alerts in the SQL persona.

  • @gabrielcohensabban4968 1 year ago

    Could you please include a link to the notebook used in this video? Thanks, amazing video!!

  • @drummerboi4eva 2 years ago

    Amazing! Thanks for making these detailed videos, Simon! Do you know if dynamic data masking for GDPR is possible with Delta Sharing?

    • @aqlanable 2 years ago +1

      It's possible with Unity Catalog: you can mask at row level, column level and data level, and it will be masked in Power BI.

    • @aqlanable 2 years ago +1

      Unfortunately, you will have to create views, and Delta Sharing doesn't support dynamic views at the time of writing, so mostly you need to go with Unity Catalog, create a dynamic view, and provide a SQL endpoint to Power BI.

  • @danhorus 2 years ago +1

    23:05 I have the exact same question. If ADLS is in a VNET with no public internet access, I don't suppose Delta Sharing would work because the recipient must be able to query the data directly from ADLS, right? This can be quite a deal breaker for building secure meshes

    • @ArcaLuiNeo 2 years ago

      I assume for such scenarios one has to start looking at a self hosted delta sharing server...

  • @aqlanable 2 years ago

    Delta Sharing is still not mature enough for enterprise use; however, I'm waiting for the post-GA work on Delta Sharing and the data marketplace from Databricks.

  • @AprenderDados 1 year ago

    And who processes the data? Is Power BI reading Delta?
    Do I need to provide a cluster or any computing resource?

    • @AdvancingAnalytics 1 year ago

      Delta Sharing essentially just returns a payload of keys to access the underlying cloud files - so your client still does the reading/processing etc! The server part of Delta Sharing doesn't currently require any kind of cluster/compute etc

  • @seyma4479 1 year ago

    It would be great if you made a video on how to build a Delta Sharing server on localhost, serving data from S3 🙂🙂

  • @rickrofe4382 2 years ago

    Thanks for the preview. Do you know if the same integration with Power BI still works in AWS?

    • @AdvancingAnalytics 2 years ago +2

      Yep! From the recipient's point of view, the Delta Sharing Server could be in Azure Databricks, AWS, a local web server, anywhere! That's the beauty of it being an open protocol!

    • @rickrofe4382 2 years ago

      @AdvancingAnalytics Super cool!

    • @vinodhkumarganesan6778 1 year ago

      @AdvancingAnalytics Hi, did you see a performance improvement with Power BI running on Delta Sharing rather than on a SQL warehouse?

  • @nayan001ujjain 1 year ago

    Hi, thanks for sharing the knowledge about Delta Sharing. Can you please explain how costing works in Delta Sharing and how many requests a user can make? Is there any limit? Does Databricks charge on the basis of IOPS?

    • @AdvancingAnalytics 1 year ago

      Good question - at the moment I've not seen any costs associated! There will be the underlying cost of storage access, data egress etc, but I've not seen a cost model from Databricks yet!

    • @nayan001ujjain 1 year ago

      @AdvancingAnalytics Thank you 😊

  • @ddarkings 1 year ago

    Is there an advantage to setting up Delta Sharing for PBI as opposed to linking PBI directly to a SQL endpoint in Databricks, as shown in the Partner Connect demos? I guess it's a way of limiting which tables can be seen in PBI. Are there other benefits, given there is more to set up with the Delta Sharing approach?

    • @AdvancingAnalytics 1 year ago +1

      Couple of reasons: 1) Delta share doesn't use Databricks compute (aka, it's cheaper) albeit with some limitations, 2) It's primarily focused on users outside of your AD Tenant, who would not be able to connect to your DBX endpoint

  • @rostislawkrassow7385 2 years ago

    Thanks for sharing the review. Could a view also be part of a share?

    • @danhorus 2 years ago +1

      The documentation on GitHub mentions support for views. I hope Simon can test it and let us know if there are limitations for views with joins, etc.
      I would also be a little worried about the security aspect of these views, because perhaps the recipient is able to retrieve the underlying SAS Key and access the unmodified table(s) in ADLS instead of a filtered view with row-level security

    • @rostislawkrassow7385 2 years ago

      That's exactly the point. A view with row-level security or a join inside requires new physical files to be created so they can be shared at file level with SAS tokens.
      Only in the case of materialized views (a newly announced feature) would this work on an already persisted set of files.

    • @aqlanable 2 years ago

      Dynamic views/views are still post-GA; currently only tables are supported.

    • @rostislawkrassow7385 2 years ago

      @aqlanable thank you for sharing the insight! Curious to see how that will work

  • @akhilannan 2 years ago

    Can you add a view to the share? Or does it have to be a table?

    • @aqlanable 2 years ago

      Currently only tables are supported; they are working on views post-GA, so you'll have to wait a couple of months.