Databricks Unity Catalog: Setup and Demo on AWS

  • Published: 21 Oct 2024

Comments • 15

  • @ft_angel91 • 9 months ago +2

    By far the best tutorial I've seen. Thank you for putting this out.

    • @kunnunhs1 • 5 months ago

      It's the worst, unclear.

  • @chaitanyamuvva • 4 months ago

    Thanks for posting!! Much needed stuff.

  • @aaronwong8533 • a year ago

    This is so helpful! Thank you for posting.

  • @rajanzb • 8 months ago

    Wonderful demo. I have a question: where did you link up the Unity Catalog created in the metastore to the catalog in Data Explorer? And how is the S3 bucket attached to the table created in the schema of the dev catalog? Please clarify.

    • @MakeWithData • 8 months ago

      Thanks! Metastores are assigned to workspaces at the account level, and any catalogs you create in a workspace are automatically associated with that metastore; a workspace can have only one metastore assigned. When you create a metastore, you must configure a default S3 bucket for it, so your schemas, tables, etc. are stored in that bucket by default. However, you can also set up additional buckets as "External Locations" in UC and then use those as the default root storage location for specific catalogs or schemas you create, as sketched below. Hope this helps!
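
      A minimal sketch of that second path from a notebook, assuming a storage credential already exists in UC (the bucket, credential, location, and catalog names below are placeholders, not the ones from the video):

          # Register an additional S3 bucket as an External Location, backed by an existing storage credential
          spark.sql("""
              CREATE EXTERNAL LOCATION IF NOT EXISTS my_ext_location
              URL 's3://my-ext-bucket/uc-root'
              WITH (STORAGE CREDENTIAL my_storage_cred)
          """)

          # Use that bucket (instead of the metastore's default bucket) as the root storage for a new catalog
          spark.sql("CREATE CATALOG IF NOT EXISTS dev MANAGED LOCATION 's3://my-ext-bucket/uc-root'")

          # Schemas and managed tables created under dev will now default to that location
          spark.sql("CREATE SCHEMA IF NOT EXISTS dev.bronze")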

  • @AthenaMao • 5 months ago +1

    Where can I find the JSON template of the custom trust policy?

  • @lostfrequency89 • 3 months ago

    Is it possible to create volumes on top of this external storage container?

  • @hassanumair6967 • a year ago +1

    Another suggestion: if you could make that kind of tutorial video, it would be great.
    The video could cover backup and restoration of Databricks, e.g. what we save in our S3 and what the parallel methods are, and restoration policies, specifically if we use a geo-redundant structure with a large number of users.

  • @SaurabhKumar-ic7nt • a year ago

    awesome explanation

  • @NdKe-j3k • a year ago

    Thank you for the video.
    I have a large (~15 GB) CSV file in S3. How can I process that data in Databricks? I don't want to mount the S3 bucket. Is there any way I can process this file in Databricks other than mounting it?

    • @MakeWithData • 11 months ago

      Yes, there's no need to mount your bucket; you can read it from a PySpark or Scala notebook in Databricks with spark.read.csv("s3://path/to/data"), as sketched below.
      15 GB for a single file is quite large, though. I would recommend splitting it into multiple smaller files if possible, so that you get maximum parallelism from your Spark cluster. Ideally you can even convert it to Delta Lake format. If you don't split it up or convert it, you may need a cluster with more memory available.
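
      For example, a rough sketch of reading the file directly and converting it to Delta (the bucket path and table name below are placeholders):

          # Read the CSV straight from S3 -- no mount required
          df = (
              spark.read
                  .option("header", "true")
                  .csv("s3://your-bucket/path/to/large_file.csv")
          )

          # Write it out once as a Delta table; Spark splits the output into many smaller files,
          # so later reads get full parallelism
          df.write.format("delta").mode("overwrite").saveAsTable("dev.bronze.large_table")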

  • @hassanumair6967 • a year ago

    And what if we want to create a volume?
    I am stuck while configuring Databricks with AWS, using the trial version of Premium.
    The problem I am stuck on is the default metastore, which comes up every time I try to create a volume.

    • @MakeWithData • a year ago +2

      Hi, I recommend submitting a question to Stack Overflow using the [databricks] tag. Several others and I are very active in that forum and would be happy to help, given more details about your use case! Thank you for watching!