By far the best tutorial I've seen. Thank you for putting this out.
It's the worst, very unclear.
Thanks for posting!! much needed stuff.
This is so helpful! Thank you for posting.
Wonderful demo. I have a question: where did you link the Unity Catalog metastore to the catalog in the Data Explorer? And how is the S3 bucket attached to the table created in the schema of the dev catalog? Please clarify.
Thanks! Metastores are assigned to workspaces at the account level, and any catalogs you create in a workspace are automatically associated with that metastore; you can only have one metastore assigned to a workspace. When you create a metastore, you must configure a default S3 bucket for it, so your schemas, tables, etc. will be stored in that bucket by default. However, you can also set up additional buckets as "External Locations" in UC and then use those as the default root storage location for specific catalogs or schemas you create. Hope this helps!
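To make the "External Locations" part of that reply concrete, here is a rough Databricks SQL sketch. All names, the bucket URL, and the storage credential are placeholders, not taken from the video, and this assumes a storage credential has already been created:

```sql
-- Register an extra S3 bucket as an external location
-- (assumes a storage credential named my_credential already exists).
CREATE EXTERNAL LOCATION my_ext_loc
  URL 's3://my-other-bucket/uc-root'
  WITH (STORAGE CREDENTIAL my_credential);

-- Use that bucket as the default root storage for a new catalog,
-- instead of the metastore's default bucket.
CREATE CATALOG dev
  MANAGED LOCATION 's3://my-other-bucket/uc-root';
```

Schemas and tables created under `dev` would then be stored under the managed location rather than the metastore's default bucket.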
Where can I find the JSON template for the custom trust policy?
Is it possible to create volumes on top of this external storage container?
Another suggestion: if you could make a tutorial-type video, that would be great.
One covering backup and restoration of Databricks: what we save in our S3 bucket, and what the parallel methods are. Also restoration policies, specifically if we use a geo-redundant setup with a large number of users.
awesome explanation
Thank you!
Thank you for the video.
I have a large (~15 GB) CSV file in S3. How can I process that data in Databricks? I don't want to mount the S3 bucket. Is there any way I can process this file in Databricks other than mounting it?
Yes, there's no need to mount your bucket; you can read it from a PySpark or Scala notebook in Databricks with spark.read.csv("s3://path/to/data").
15 GB for a single file is quite large, though. I would recommend splitting it into multiple smaller files if possible, so that you can get maximum parallelism from your Spark cluster. Ideally you can even convert it to Delta Lake format. If you don't split or convert it, you may need a cluster with more memory available.
And what if we want to create a volume?
I am stuck on the Databricks configuration with AWS, using the trial version of Premium.
The problem I keep hitting is a default metastore error that occurs every time I try to create a volume.
Hi, I recommend submitting a question to Stack Overflow using the [databricks] tag. I and several others are very active in that forum and would be happy to help, given more details about your use case. Thank you for watching!