Advancing Spark - Data + AI Summit 2024 Key Announcements

  • Published: 28 Nov 2024

Comments

  • @shawndeggans
    @shawndeggans 5 months ago +1

    Excellent summary, Simon! I'm looking forward to LakeFlow. 😀

  • @omgitsbenhayes
    @omgitsbenhayes 5 months ago

    Nice recap of the key announcements! ABAC demo was 🎉

  • @mkrichey1
    @mkrichey1 5 months ago +1

    I think most of those changes are going to have a big effect on the way we manage data. Databricks is set up to be the single tool right up to the point you visualize the end result. Wonder how MS feel about the fact they might end up serving instances of the very platform that makes Fabric a bit redundant :P especially if the pricing is clear and competitive :)

  • @drummerboi4eva
    @drummerboi4eva 5 months ago

    Amazing, Simon, thanks for this update

  • @rommelbojorgee.8902
    @rommelbojorgee.8902 5 months ago

    Nice summary, thanks Simon

  • @kaurivneet1
    @kaurivneet1 4 months ago

    Was eagerly waiting for your video, Simon! I think if Lakeflow turns out to be as good as another replication tool, that would be the biggest disruptor to how we currently do lakehouse. Data acquisition has always been the sore point in the data platform.
    The other one, as you mentioned: tag-based access control is something I have been waiting on for almost 2 years!

  • @MichaelEwins1967
    @MichaelEwins1967 5 months ago

    I agree that the Tabular acquisition will lead to improved interoperability for Delta Lake & Iceberg users. For me this signals more the trend of reducing data movement and ETL so that people can use data where it is, with all access control managed by Unity Catalog.

  • @alexischicoine2072
    @alexischicoine2072 5 months ago

    Lakeflow seems great if you can source control and deploy it. Hopefully it’s also somewhat testable.

  • @ErikParmann
    @ErikParmann 5 months ago

    What do you know about the real-time mode? Do you think it's just a rename of the experimental Spark continuous mode?

    • @AdvancingAnalytics
      @AdvancingAnalytics 5 months ago +1

      I need to dig into what's been announced publicly so I don't break NDAs - but I can say that what I've seen has come a fair way from the old continuous mode; there's more than just the Spark engine change driving the performance increase.

  • @alexischicoine2072
    @alexischicoine2072 5 months ago

    I liked your point about Serverless and what we do. Hopefully by the time the transition is done I’ll have retired into leadership 😂

  • @danhorus
    @danhorus 5 months ago +1

    And I'm here still waiting for that for_each task xD

    • @AdvancingAnalytics
      @AdvancingAnalytics 5 months ago

      It's on the roadmap, it was on one of the keynote slides and everything! 😅

  • @alexischicoine2072
    @alexischicoine2072 5 months ago

    Hoping we get branches in Delta for write-audit-publish, as that’s a pretty useful feature in Iceberg.

  • @Mim_BI
    @Mim_BI 5 months ago +1

    Come on, not a single word about DuckDB, it was everywhere in the keynote :)

    • @AdvancingAnalytics
      @AdvancingAnalytics 5 months ago +1

      Haha, it's true - there was the segment from Hannes himself. But the update is largely that DuckDB can now natively read Delta, right? Nothing I saw is directly Databricks functionality. That said, I'm waaaay overdue a separate video spinning up DuckDB on a single node and showing how fast it is!

  • @norbertczulewicz1695
    @norbertczulewicz1695 5 months ago

    Currently only SQL Warehouse can be serverless, and it supports SQL only. Does that mean Python is not recommended for new projects?

    • @AdvancingAnalytics
      @AdvancingAnalytics 5 months ago

      That's what the announcements were all about - they're rolling out Serverless for Workflows/Notebooks, which means full serverless Python support. Python is thoroughly recommended for any engineering/automation workloads (with embedded SQL for transformations as necessary).

    • @norbertczulewicz1695
      @norbertczulewicz1695 5 months ago

      @AdvancingAnalytics The biggest problem is the price. The serverless option is the most expensive workload in Databricks. For many companies it can be a blocker, especially when cheaper options exist. I've heard of situations where companies ask developers not to use serverless SQL warehouses because of that.

    • @ErikParmann
      @ErikParmann 5 months ago

      @AdvancingAnalytics Hopefully this means serverless gets supported in more regions!

  • @gags220988
    @gags220988 5 months ago

    Isn't Genie just hitting the OpenAI endpoint?

    • @AdvancingAnalytics
      @AdvancingAnalytics 5 months ago

      Nope - the original Databricks Assistant was using OpenAI; this new iteration is a flavour of DBRX, with the context of your own data (Unity Catalog, recent activity/queries, etc.). It should have far, far more context than just hitting an open endpoint.

  • @TomPerry83
    @TomPerry83 5 months ago +1

    I agree with the points about serverless making things easier and doing it better than a person would. However, I would still want to know what it is doing, so I could replicate it elsewhere (self-hosted, another future vendor, etc.). Otherwise this is another type of vendor lock-in.
    I.e., if I'm too reliant on the platform optimising stuff for me, then I'm effectively locked in.

  • @brucelanskiy8110
    @brucelanskiy8110 4 months ago

    I