Azure Data Factory, Azure Databricks, or Azure Synapse Analytics? When to use what.

  • Published: 28 May 2024
  • Have you ever found yourself at the start of an Azure data engineering project, unsure which tool to choose? Speak no more! In this session we will discuss three commonly used data engineering tools on Azure:
    - Azure Data Factory
    - Azure Databricks
    - Azure Synapse Analytics
    Speaker: Lisa Hoving SQLbits.com/speakers/Lisa_Hoving
    SQLbits.com/Sessions/Azure_Dat...
    Tags: Azure,Synapse Analytics,Spark,Data Lake,Managing,Successful Delivery,Data Bricks,Architecture & Infrastructure,Big Data & Data Engineering

Comments • 41

  • @JSinghCode
    @JSinghCode 6 days ago +1

    Very helpful for our choice!

  • @kirole7381
    @kirole7381 10 days ago +1

    Thank you for the work, Lisa!

  • @saivenkateshtummala5576
    @saivenkateshtummala5576 2 months ago +4

    This is really helpful for someone starting new, thank you!

  • @shanthababu
    @shanthababu 8 months ago +5

    Excellent! Thanks, Lisa Hoving.

  • @MuhammadUsamaAwan
    @MuhammadUsamaAwan 7 months ago +3

    It was an excellent session regarding all these tools. It helps you a lot to understand when to use what.

  • @premanandasahoo290
    @premanandasahoo290 6 months ago +2

    Thanks a lot @lisa. I got a whole lot of clarity. Was always confused about which service to use and why.

  • @marchelomoratti1
    @marchelomoratti1 3 months ago +3

    Thank you so much for the presentation! It was very informative, it gave me a great picture of those tools!

  • @psvarada
    @psvarada 6 months ago +2

    very nicely explained. great job!

  • @valliguduru4963
    @valliguduru4963 2 months ago +2

    Thank you for the video. Excellent analysis and presentation!!! Can you please do a comparison video for Azure Fabric vs Azure Databricks?

  • @cloudbaud7794
    @cloudbaud7794 1 month ago +1

    Nice info and fun to watch 😊

  • @nikjojo
    @nikjojo 9 months ago +1

    Great presentation, thank you.

    • @SQLBits
      @SQLBits 9 months ago

      Our pleasure!

  • @pauloroncarati
    @pauloroncarati 3 months ago +1

    Great presentation!

    • @SQLBits
      @SQLBits 3 months ago

      Thank you kindly!

  • @williamnguyen5771
    @williamnguyen5771 2 months ago +2

    HAHAHAHA 20:12 man she’s so hilarious for keeping it real. ADHD here too

  • @davidlion4482
    @davidlion4482 11 months ago +12

    Azure Data Factory is similar to SSIS and doesn't have a data store to persist the data, but Azure Databricks and Azure Synapse have a database engine to support the storage of data.
    Azure Data Factory is only an ETL/ELT tool. But the other two offer both ETL/ELT and a database.
    In this case, Azure Data Factory shouldn't be compared to a database.

    • @devarshsanghvi9315
      @devarshsanghvi9315 11 months ago +2

      It's a separate tool, that's true, but since many people do ETL with Data Factory, they do wonder: should I use Azure Synapse / Azure Databricks for my ETL, or should I continue using Azure Data Factory? Note that those who don't know how to code can leverage the UI at a little extra cost, and those who do know how to code can save a little too.

    • @LisaHoving
      @LisaHoving 11 months ago +5

      Migrating to Databricks can offer you a bit more flexibility, but you would have to migrate all the pipelines to code. Alternatively, you could use both tools, and make your new flows in Databricks. Notebooks and packaged code in Databricks can easily be kicked off by ADF, making it a cool orchestrator!
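
To illustrate the orchestration pattern described in the reply above, here is a minimal sketch of what an ADF pipeline definition with a Databricks Notebook activity can look like, built as a plain Python dictionary. The activity type `DatabricksNotebook` and the `notebookPath` / `baseParameters` properties follow the ADF pipeline JSON format; the pipeline name, notebook path, linked service name, and parameter are hypothetical placeholders, and the helper functions are illustrative only, not part of any SDK.

```python
import json


def databricks_notebook_activity(name, notebook_path, linked_service, parameters=None):
    """Build the JSON body of an ADF 'DatabricksNotebook' activity as a dict."""
    return {
        "name": name,
        "type": "DatabricksNotebook",
        # Reference to an Azure Databricks linked service defined in the factory.
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            # Values passed to the notebook as widget parameters.
            "baseParameters": parameters or {},
        },
    }


def pipeline(name, activities):
    """Wrap a list of activities in a minimal pipeline definition."""
    return {"name": name, "properties": {"activities": activities}}


# All names below are hypothetical placeholders.
activity = databricks_notebook_activity(
    name="RunNewFlow",
    notebook_path="/Shared/new_flow",
    linked_service="AzureDatabricksLinkedService",
    parameters={"run_date": "2024-05-28"},
)
pipeline_definition = pipeline("OrchestrateDatabricks", [activity])
print(json.dumps(pipeline_definition, indent=2))
```

In this setup the transformation logic lives in the notebook, while ADF only handles scheduling, parameters, and dependencies between steps.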

    • @grahamthomas7821
      @grahamthomas7821 8 months ago

      Agreed that ADF seems like an odd comparison here, but the Databricks vs Synapse comparison was really helpful

    • @rajeshshetty4685
      @rajeshshetty4685 5 months ago

      Why then is the speaker saying that there is no data storage (24:36) in all three?

  • @datadataeverywhere6954
    @datadataeverywhere6954 15 days ago

    Eye opening

  • @MauriceBierhuizen
    @MauriceBierhuizen 3 months ago +2

    Very clear. And hilarious when she misspoke "SQLBits" and blamed her ADHD 🤣

  • @peterpan-yj4rn
    @peterpan-yj4rn 2 months ago +1

    Why can’t ADF be used for Power BI if the target data model is SQL Server?!

    • @LisaHoving
      @LisaHoving 2 months ago +2

      If SQL Server is the target, you can indeed just connect Power BI to SQL Server and do your aggregations/data loading with ADF, no problem! My point was more about connecting ADF to Power BI. In Synapse and Databricks you can create tables and use these definitions directly in Power BI by connecting these tools. ADF has no such thing.

  • @waldchiller4695
    @waldchiller4695 4 months ago +1

    Here still just having on prem projects with SSIS LOL.

  • @sbudama242
    @sbudama242 10 months ago

    I am a bit confused: why can't we store data in Databricks? Doesn't Databricks have the Lakehouse to do so?

    • @grahamthomas7821
      @grahamthomas7821 8 months ago +1

      I guess it's because it's just Azure data lake storage under the hood? So technically the data isn't actually stored in Databricks

    • @michaszalast6094
      @michaszalast6094 8 months ago

      Lakehouse is just the architectural approach. To my knowledge, every analytical, cloud-based solution is built on top of some kind of cloud data storage (ADLS, Blob Storage, AWS S3, etc.), and this is only a data storage layer.

    • @himondas18
      @himondas18 7 months ago

      As per my understanding, Databricks and Synapse store data in Azure Blob Storage and give you a database/data-warehouse-like model on top of that, so that you can do analytics and other things more easily. Some projects even create data integration pipelines in ADF to trigger Databricks jobs/notebooks, while Synapse does analytics and connects BI tools over the Delta Lake in Databricks.

  • @ivanp9222
    @ivanp9222 8 months ago +1

    What about the Java u highlighted earlier? Or did I miss it 😂

    • @UNNIE2363
      @UNNIE2363 5 months ago

      Yes, you kinda missed it. She mentions going with Databricks if your speciality is Java, as the Java language is supported.

    • @DiscobiscuitUK1
      @DiscobiscuitUK1 2 months ago

      ruclips.net/video/_QtA_492l4k/видео.htmlsi=NAXqM24LibEQz4tI&t=1171

  • @steelmilkjug
    @steelmilkjug 6 months ago

    What can Databricks do that Synapse cannot do better?

    • @danhorus
      @danhorus 6 months ago +6

      Here's a few off the top of my head:
      1. Databricks clusters are more flexible. You can choose the cheaper Compute Optimized VMs for append-only incremental processing, or Storage Optimized VMs to enable caching on the local SSDs, among other VM types. In Synapse, you can only use Memory Optimized and GPU Optimized VMs;
      2. Databricks clusters allow you to use Spot VMs for the workers, which are significantly cheaper as well. Synapse does not support Spot VMs;
      3. Databricks allows for better cluster sharing, as the same cluster can have multiple Spark sessions active at once. Synapse reserves slots for each Spark session, and those slots will sit idle when the developer is not running any code -- they can't be used by other developers while they are reserved;
      4. The notebook file format in Databricks lends itself better to git diffs in Pull Requests, as they are regular code files (e.g., Python code) with some comments for special cells. Synapse notebooks, on the other hand, are saved as JSON files which are much harder to review in a git diff interface;
      5. Databricks has exclusive features such as Auto Loader and identity columns, which are really helpful for data engineering and framework development;
      6. Databricks is the flagship product of the company founded by the creators of Apache Spark, and as such it will always have an edge in supporting new Spark versions and features. Meanwhile, Synapse is a PaaS offering from Microsoft, and Microsoft is now clearly focusing a lot more on their SaaS offering: Microsoft Fabric. If I had to build a data platform on Azure today, I would use Databricks as my transformation engine. Hope this helps! :)
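
As a rough illustration of points 1 and 2 in the list above, the sketch below builds a cluster specification in the shape accepted by the Databricks Clusters API, requesting Spot VMs for the workers while keeping the first node on-demand. The `azure_attributes` fields (`availability`, `first_on_demand`, `spot_bid_max_price`) come from that API; the cluster name, VM size, and runtime version are placeholder assumptions and should be checked against what a given workspace offers. The helper function is hypothetical, not part of any SDK.

```python
import json


def spot_cluster_spec(cluster_name, node_type_id, spark_version, num_workers):
    """Build a Databricks cluster spec that uses Azure Spot VMs for workers."""
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "azure_attributes": {
            # Keep the first node (the driver) on a regular on-demand VM.
            "first_on_demand": 1,
            # Use Spot VMs, falling back to on-demand if none are available.
            "availability": "SPOT_WITH_FALLBACK_AZURE",
            # -1 means the VM is not evicted on the basis of price.
            "spot_bid_max_price": -1,
        },
    }


# Placeholder values: a compute-optimized VM size and a runtime version string.
spec = spot_cluster_spec(
    cluster_name="incremental-append-job",
    node_type_id="Standard_F8s_v2",
    spark_version="13.3.x-scala2.12",
    num_workers=4,
)
print(json.dumps(spec, indent=2))
```

Whether Spot capacity is actually cheaper or available depends on the region and VM family, so the fallback setting is a cautious default rather than a recommendation.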

  • @tinasheyamaone5435
    @tinasheyamaone5435 5 months ago +1

    WHAT DOES MORE MATURE EVEN MEAN???!!!!

    • @SQLBits
      @SQLBits 5 months ago

      Hi Tinashelyemaone5435, you can get in touch with the speakers directly through LinkedIn and X! They are normally more than happy to help.

    • @kimstuart7989
      @kimstuart7989 4 months ago +1

      The amount of work the developer community has put into it. Think of it as beta vs stable. Databricks is way more stable and has already been developed through iterations to catch bugs and implement fixes. Synapse Analytics is comparatively newer and is going through that iterative process now, so in time its reliability will catch up to that of Databricks.

    • @bms4654
      @bms4654 1 month ago

      I would say maturity is the level of knowledge and skills an organization has to support these tools. You are not going to give a graphing calculator to a 6 yr old child. You are not going to give databricks to a company that has everything in spreadsheets.

  • @tinasheyamaone5435
    @tinasheyamaone5435 5 months ago +4

    You Said absolutely Nothing!!!