Very useful video for learning about managed and external tables, and how to create them and store data back to Azure Data Lake using an external table. Thanks Piotr for this great content!!
Glad you liked it
Do DBrx MANAGED tables have any advantages over EXTERNAL tables, like better performance on read/write?
I'm not aware of any performance differences between those two (or at least I didn't notice any).
An important difference between them is that if you drop a managed table, it will also drop your data. In the case of an external table, only the metadata in the metastore is removed, but the actual data stays untouched (as it is stored externally).
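A minimal sketch of that difference, assuming a Databricks notebook (where `spark` is predefined); the table names and ADLS path are placeholders:

```python
# Managed table: the metastore owns both the metadata and the data files.
spark.sql("CREATE TABLE managed_demo (id INT) USING DELTA")

# External table: only the metadata is registered; data lives at the LOCATION.
spark.sql("""
    CREATE TABLE external_demo (id INT)
    USING DELTA
    LOCATION 'abfss://container@account.dfs.core.windows.net/demo/external_demo'
""")

# DROP behaves differently for the two:
spark.sql("DROP TABLE managed_demo")   # removes metadata AND deletes the data files
spark.sql("DROP TABLE external_demo")  # removes metadata only; files stay in ADLS
```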
What is the difference between save and saveAsTable?
Both will save the data but the latter will also register it as a table in the catalog.
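A small illustration of that difference (the path and table name are hypothetical; `spark` is the session a Databricks notebook provides):

```python
df = spark.range(10)

# save(): writes Delta files to the path only; nothing is registered in the
# catalog, so you have to read the data back by path.
df.write.format("delta").mode("overwrite").save("/tmp/demo/events")
by_path = spark.read.format("delta").load("/tmp/demo/events")

# saveAsTable(): writes the data AND registers a table in the catalog,
# so you can query it by name from SQL or the DataFrame API.
df.write.format("delta").mode("overwrite").saveAsTable("events")
by_name = spark.sql("SELECT COUNT(*) AS n FROM events")
```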
Is there any way to get access to the Databricks notebook you use?
Yes, check my GitHub - link is in the video description.
When I tried to create an external table, I was forced to create a storage credential and an external location.
What was the code you wrote?
I configured access at the cluster level.
When I saved files directly to the data lake using spark.write.format("delta").save("path"), everything was fine.
But when I tried to create the external table (using the same CREATE TABLE query as you), I got an error about a missing external location. Once I created the storage credential and external location and re-executed the cell, it worked.
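A sketch of the two commands being contrasted here (placeholder path and table name, not the exact code from the video):

```python
df = spark.range(10)

# Writing by path only needs the cluster-level credentials, so this succeeds:
df.write.format("delta").save(
    "abfss://container@account.dfs.core.windows.net/raw/events"
)

# Registering an external table at that path goes through Unity Catalog,
# which requires an EXTERNAL LOCATION covering the path -- without one,
# this statement fails with the "external location" error described above:
spark.sql("""
    CREATE TABLE demo.events
    USING DELTA
    LOCATION 'abfss://container@account.dfs.core.windows.net/raw/events'
""")
```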
Same here. I had to:
1- Create an Access Connector for Azure Databricks in the Azure portal.
2- Using the Access Connector's Managed Identity, create a new Databricks Storage Credential in the Catalog.
3- Create an External Location pointing to the ADLS path in the storage account, using that Storage Credential (sketched below).
Then I was able to run the command.
BTW this managed identity needs Storage Blob Data Reader or Storage Blob Data Contributor permissions on the storage account to work.
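For reference, step 3 can also be done from a notebook. A hedged sketch, assuming the Storage Credential from steps 1-2 is named my_mi_credential and that the container/account names are placeholders:

```python
# Register an external location covering the ADLS path, backed by the
# storage credential created from the Access Connector's managed identity.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS adls_raw
    URL 'abfss://container@account.dfs.core.windows.net/raw'
    WITH (STORAGE CREDENTIAL my_mi_credential)
""")
```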