Advancing Fabric - The Data Engineering Experience
- Published: 20 Jul 2023
- The next Microsoft Fabric experience we're diving into is Data Engineering - this is where we can use the power of Spark to achieve massive performance and automation gains. We can create notebooks, quickly spin up a session and start querying both files and tables in our Lakehouse objects.
In this video, Simon & Craig look at building a sample notebook, querying some parquet files and writing it back down to Delta tables within our Lakehouse, including a quick hack to automate writing many tables at once.
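For anyone following along, here's a rough sketch of the kind of notebook code walked through in the video - reading parquet from the Lakehouse Files area, writing it back as Delta tables, and a simple loop over folders to automate many tables at once. The folder path and table names below are illustrative assumptions, not taken from the video.
```python
# Minimal sketch, assuming a Fabric notebook attached to a Lakehouse where
# parquet files have already landed under Files/raw/, one folder per table.
# The "Files/raw" path and table names are hypothetical examples.

from notebookutils import mssparkutils  # Fabric notebook filesystem utilities

# Read a single parquet folder from the Lakehouse Files area
# ("spark" is the session provided by the Fabric notebook)
df = spark.read.parquet("Files/raw/customers")

# Write it back down as a managed Delta table in the Lakehouse Tables area
df.write.format("delta").mode("overwrite").saveAsTable("customers")

# "Quick hack" style loop: write every folder under Files/raw as its own table
for item in mssparkutils.fs.ls("Files/raw"):
    if item.isDir:
        (spark.read.parquet(item.path)
              .write.format("delta")
              .mode("overwrite")
              .saveAsTable(item.name))
```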
If you're just getting started with the Data Engineering experience, check out the docs here: learn.microsoft.com/en-us/fab...
And if you're thinking about starting on your Microsoft Fabric journey, Advancing Analytics can help you get there faster, and help you to get it right first time!
A bit concerning that notebooks are currently unsupported for Git integration & deployment pipelines; hopefully we'll get some support either in Fabric or via an API for DevOps prior to GA.
Learning PySpark in Fabric, love the tips!
I'd love to learn more about custom pools
Would have been great if you'd shown the initial ingestion step of how to get these parquet files into the Lakehouse :) All in all, great video! Keep them coming guys #fabricators
Yep, we'll do a "getting data into Fabric" episode soon, so we didn't cover it here!
Does the VS Code extension allow you to run Spark commands remotely, similarly to how it works for AzureML? If so, that would be fantastic and a major advantage over the mediocre Databricks VS Code extension...
Yes, it does support this scenario.
Is it possible to set up version control for notebooks using DevOps?
From a Databricks perspective, a lakehouse is a logical place inclusive of all 3 zones - bronze | silver | gold - even though on the physical plane these can be in separate storage accounts or containers. The terminology in Fabric of using a separate Lakehouse for each of the 3 layers is confusing.
Very nice video! A quick question: when you created the tables from the files, does the data get duplicated in Fabric, i.e., more OneLake usage?
How long does a custom Spark cluster generally take to start?