- Videos: 85
- Views: 66,913
Stephanie Rivera
United States
Joined Oct 14, 2013
Videos
Unlock the Power of Unity Catalog Mini-Series -P3: Storage Catalogs, Schemas, and managing your data
117 views · 21 days ago
In this episode, we’re diving deep into data management and exploring what it means at the catalog, schema, and table levels. You'll gain hands-on insights into effectively managing data access, structure, and organization, setting a solid foundation for secure, compliant, and easily discoverable data. This blog is a good resource: dgomez04.github.io/2024/10/15/mastering-data-design/ #unitycata...
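As a taste of what managing access at the catalog, schema, and table levels looks like, here is a minimal, hedged sketch using Unity Catalog GRANT statements. The catalog, schema, table, and group names are hypothetical, and `spark` is the SparkSession that Databricks notebooks provide.

```python
# Minimal sketch: granting access at the catalog, schema, and table levels
# in Unity Catalog. Catalog, schema, table, and group names are hypothetical.
spark.sql("GRANT USE CATALOG ON CATALOG sales_catalog TO `data-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA sales_catalog.finance TO `data-analysts`")
spark.sql("GRANT SELECT ON TABLE sales_catalog.finance.orders TO `data-analysts`")

# Review what has been granted on the table.
spark.sql("SHOW GRANTS ON TABLE sales_catalog.finance.orders").show()
```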
Table Optimization with Liquid Clustering
152 views · 21 days ago
See how liquid clustering, Z-ordering, and partitioning affect table performance, specifically query speed and file size. Also a quick tip on using liquid clustering with Spark Structured Streaming. #databricks #dataengineering #dataengineeringessentials
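For reference, a hedged sketch of what the three layout strategies compared in the video look like in code. Table names and clustering columns are illustrative; `spark` is the notebook-provided SparkSession.

```python
# Liquid clustering: declare clustering columns at creation time, then let
# OPTIMIZE incrementally recluster newly written data.
spark.sql("""
  CREATE TABLE IF NOT EXISTS demo.events_liquid (
    event_date DATE, user_id BIGINT, payload STRING
  ) CLUSTER BY (event_date, user_id)
""")
spark.sql("OPTIMIZE demo.events_liquid")
# For Structured Streaming, one common pattern is to pre-create the target
# with CLUSTER BY like this and then stream into it.

# Z-ordering: create a plain Delta table, then co-locate data after the fact.
spark.sql("""
  CREATE TABLE IF NOT EXISTS demo.events_zorder (
    event_date DATE, user_id BIGINT, payload STRING
  )
""")
spark.sql("OPTIMIZE demo.events_zorder ZORDER BY (event_date, user_id)")

# Hive-style partitioning: a fixed directory layout chosen at creation time.
spark.sql("""
  CREATE TABLE IF NOT EXISTS demo.events_partitioned (
    event_date DATE, user_id BIGINT, payload STRING
  ) PARTITIONED BY (event_date)
""")
```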
Desbloquea el Poder de Unity Catalog: Parte 2: Storage Credentials y External Locations
49 views · a month ago
A Step-by-Step Migration Mini-Series: Part 2: Storage Credentials and External Locations. Get ready to revolutionize your data governance! Join us on a journey where we guide you through the entire process of implementing Unity Catalog, from start to finish. Whether you're a data governance beginner or a seasoned professional, this comprehensive mini-series will give you...
Mosaic AI Vector Search
144 views · a month ago
Introduction to Mosaic AI vector search, which is a vector database that is built into the Databricks Data Intelligence Platform and integrated with its governance and productivity tools #databricks
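As a rough, hedged sketch of how the vector search client is typically used: endpoint, index, table, column, and model names below are placeholders, and the exact client arguments may differ between releases, so treat this as illustrative only.

```python
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

# Build a Delta Sync index over an existing Delta table with a text column.
# All names below are hypothetical placeholders.
client.create_delta_sync_index(
    endpoint_name="vs_endpoint",
    index_name="main.docs.articles_index",
    source_table_name="main.docs.articles",
    pipeline_type="TRIGGERED",
    primary_key="article_id",
    embedding_source_column="body",
    embedding_model_endpoint_name="databricks-bge-large-en",
)

# Query the index with natural-language text once it has finished syncing.
index = client.get_index(
    endpoint_name="vs_endpoint",
    index_name="main.docs.articles_index",
)
results = index.similarity_search(
    query_text="How do I configure external locations?",
    columns=["article_id", "title"],
    num_results=5,
)
print(results)
```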
GenAI Framework for Accelerating Migrations to Databricks
262 views · a month ago
Project Legion is a solution accelerator that provides a GenAI framework for accelerating migrations to Databricks. It is a Databricks Labs Sandbox project that presents users with an easy-to-use interface for fine-tuning AI agents to explain and translate code. The tool can operate in interactive mode, where users copy and paste their code into the tool and output a Databricks Notebook, or it ...
Desbloquea el Poder de Unity Catalog: Una Mini Serie de Migración Paso a Paso - Parte 1
68 views · 2 months ago
Get ready to revolutionize your data governance! Join us on a journey where we guide you through the entire process of implementing Unity Catalog, from start to finish. Whether you're a data governance beginner or a seasoned professional, this comprehensive mini-series will give you the expertise and confidence you need to unlock the full potential of Unity Catalo...
Unlock the Power of Unity Catalog Mini-Series - Part 2: Storage Credentials and External Locations
346 views · 2 months ago
A Step-by-Step Mini-Series to a Secure, Intelligent, and Connected Data Ecosystem: Get ready to revolutionize your data governance! Join us on an epic journey as we take you through the entire process of implementing Unity Catalog from start to finish. Whether you're a data governance newbie or a seasoned pro, this comprehensive miniseries will empower you with the expertise and confidence to u...
Unlock the Power of Data in Energy: Databricks Data Intelligence Platform Walkthrough
380 views · 3 months ago
Join us for a tour of the Databricks Data Intelligence Platform, specifically designed for the energy sector. This video will show how different teams and stakeholders can collaborate seamlessly within the platform to drive business value. Whether you're a data engineer, analyst, or business leader, discover how Databricks can help you: * Unify your data and analytics efforts * Drive innovation...
Unlock the Power of Unity Catalog: A Step-by-Step Migration Mini-Series - Part 1
298 views · 3 months ago
Get Ready to Revolutionize Your Data Governance! Join us on an epic journey as we take you through the entire process of implementing Unity Catalog from start to finish. Whether you're a data governance newbie or a seasoned pro, this comprehensive miniseries will empower you with the expertise and confidence to unlock the full potential of Unity Catalog in your organization! Part 1 - What is UC...
Genie Spaces More Than Text to SQL 08.01.2024
268 views · 3 months ago
Generating marketing engagement with Amazon book reviewers #genai #databricks #dataintelligence
Kickstart Your AI Journey on Databricks with AI-Cookbook.io!
418 views · 3 months ago
Ready to revolutionize your business with AI? Join Arthur as he reveals the secret to building your own AI system on Databricks! Discover the ultimate roadmap to success with ai-cookbook.io, the recommended approach to accelerating #GenAI development on Databricks. In this game-changing session, Arthur will show you how to seamlessly integrate your data into pre-built AI quickstarts, empowering...
Demystify Serverless Networking - Azure Databricks Networking Part 2
562 views · 3 months ago
Join Arthur in Part 2 of our Azure Databricks Network Security series as he dives into the world of Serverless Networking! Discover the secrets to setting up Serverless SQL. Configure the #networking connection from #Serverless compute to your data. In this #tutorial, Arthur will walk you through the step-by-step process of building out your Databricks workspace and account controls to establis...
Build Your Own Genie??? FAST GenAI Text to SQL in 20 Mins with YOUR data! 2024.07.10
506 views · 4 months ago
CHECK OUT and STAR Robert's repo on GitHub! github.com/rmosleydb/text-to-sql #databricks #genai #genie #text2sql
Revolutionize Decision-Making with Genie Spaces! 2024.05.01
224 views · 5 months ago
Imagine having the power to unlock instant insights and answers right at your fingertips. In this video, discover how to create a Genie Space in Databricks that empowers decision-makers to ask questions in plain English and get rapid, actionable responses. Say goodbye to tedious data analysis and hello to data-driven decision-making! ► Speaker - Hobbs www.linkedin.com/in/iamhobbs #databricks #g...
Unlock the Power of Conversational Data Analysis! 2024.04.30
191 views · 5 months ago
Unlock the Power of Conversational Data Analysis! 2024.04.30
How to enable firewall support for your Azure workspace storage account 2024.05.30
651 views · 5 months ago
How to enable firewall support for your Azure workspace storage account 2024.05.30
Mastering the SparkUI on Databricks 2024.04.30
548 views · 6 months ago
Mastering the SparkUI on Databricks 2024.04.30
Unlock Databricks + AWS Network Configuration Secrets! 2024.05.09
352 views · 6 months ago
Unlock Databricks AWS Network Configuration Secrets! 2024.05.09
GenAI Showdown in 10 Minutes! - Step by Step guide to Evaluating LLMs with MLflow! - 2024.04.29
1K views · 6 months ago
GenAI Showdown in 10 Minutes! - Step by Step guide to Evaluating LLMs with MLflow! - 2024.04.29
Create a DBRX-based Gen AI Agent in 20 minutes! 2024.04.04
1.7K views · 7 months ago
Create a DBRX-based Gen AI Agent in 20 minutes! 2024.04.04
Use Agent Studio to build a GenAI Agent in minutes!! 2024.03.11
1.1K views · 8 months ago
Use Agent Studio to build a GenAI Agent in minutes!! 2024.03.11
Introduction to Databricks Data Intelligence Platform in 2024! - 2024.03.05
1.3K views · 8 months ago
Introduction to Databricks Data Intelligence Platform in 2024! - 2024.03.05
Azure Databricks Networking Security (Part 1) - 2024.02.02
1.8K views · 9 months ago
Azure Databricks Networking Security (Part 1) - 2024.02.02
State Schema Evolution in PySpark using applyInPandasWithState - 2024.01.25
592 views · 10 months ago
State Schema Evolution in PySpark using applyInPandasWithState - 2024.01.25
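For context, a bare-bones, hedged sketch of the applyInPandasWithState API this video builds on: a per-key event counter. It does not show the schema-evolution technique itself, and the column names, schemas, and the `events` streaming DataFrame are illustrative assumptions.

```python
from typing import Iterator, Tuple

import pandas as pd
from pyspark.sql.streaming.state import GroupState, GroupStateTimeout

def count_events(
    key: Tuple[str], pdfs: Iterator[pd.DataFrame], state: GroupState
) -> Iterator[pd.DataFrame]:
    # Read the previous count for this key, if any, then add the new rows.
    (count,) = state.get if state.exists else (0,)
    count += sum(len(pdf) for pdf in pdfs)
    state.update((count,))
    yield pd.DataFrame({"user_id": [key[0]], "event_count": [count]})

# `events` is assumed to be a streaming DataFrame with a user_id column.
counts = (
    events.groupBy("user_id")
    .applyInPandasWithState(
        count_events,
        outputStructType="user_id STRING, event_count LONG",
        stateStructType="event_count LONG",
        outputMode="update",
        timeoutConf=GroupStateTimeout.NoTimeout,
    )
)
```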
Deploying Scalable Databricks Infrastructure with Terraform - 2024.01.24
766 views · 10 months ago
Deploying Scalable Databricks Infrastructure with Terraform - 2024.01.24
Excel to Databricks - Getting to robust data insights in 15 minutes 2024.01.04
409 views · 10 months ago
Excel to Databricks - Getting to robust data insights in 15 minutes 2024.01.04
Managed Tables vs External Tables in Unity Catalog - 2023.11.03
1.3K views · a year ago
Managed Tables vs External Tables in Unity Catalog - 2023.11.03
Lakehouse Federation - Querying data in other warehouses 2023.11.02
335 views · a year ago
Lakehouse Federation - Querying data in other warehouses 2023.11.02
I'm not clear on how to input the token into the databricks_default conn. Please be more specific. Thanks
Thanks for sharing
Thanks 😮
Thanks for repeating it in Spanish!!
With pleasure!
Great video. Any updates on the decision tree mentioned at 19:25?
Not that I have heard
Hello Stephanie, thank you for sharing your knowledge. I have some questions about the VPC endpoints for Kinesis, S3, and STS, which were not addressed by JD Braun. These VPC endpoints are mentioned in the Databricks PrivateLink documentation. Are they necessary for seamless integration between AWS and Databricks? I appreciate your help.
I think you will need it for Kinesis, but not S3
Can we get this notebook used in video?
I don't have the notebook, but you can get the SQL statements used directly from docs -> for managed tables, check out: docs.databricks.com/en/tables/managed.html -> for external tables, check out (sample notebook at the bottom): docs.databricks.com/en/tables/external.html
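For reference, a minimal, hedged sketch of the two table types. Catalog, schema, table names, and the cloud path are hypothetical, and the external path must be covered by an external location you already have access to.

```python
# Managed table: Unity Catalog owns both the metadata and the underlying files.
spark.sql("""
  CREATE TABLE IF NOT EXISTS main.sales.orders_managed (
    order_id BIGINT, amount DOUBLE
  )
""")

# External table: data lives at a path registered as an external location,
# and dropping the table leaves the files in place.
spark.sql("""
  CREATE TABLE IF NOT EXISTS main.sales.orders_external (
    order_id BIGINT, amount DOUBLE
  )
  LOCATION 's3://my-bucket/sales/orders_external'
""")
```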
Thank you for the detailed video. Is there any documentation on doing the same from Terraform? We know private endpoints need to be created, but how do we enable the firewall for an existing workspace that was created through Terraform?
This might be helpful ruclips.net/video/OgxQop9fB70/видео.htmlsi=xWlJapdaIpEEUulu
finally part 2 🙂
Nice explanation, thanks
Glad you liked it
Is there a session, as you mentioned, with a running demo that uses Terraform to create the complete environment? And one quick question: using this code, I want to create only the AWS infrastructure and the workspace, without creating Unity Catalog. Can I do that?
Why wouldn't you want Unity Catalog?
@@stephanieamrivera Because we already have an existing Unity Catalog metastore, I just want to connect to the existing one rather than create a new one again.
Is there a part 2 yet?
Yes! ruclips.net/video/FHYNpWRu_yc/видео.htmlsi=k_Bie8_LjJAfVevp
Can't wait for Part 2 now that serverless is GA
Here ruclips.net/video/FHYNpWRu_yc/видео.htmlsi=k_Bie8_LjJAfVevp :)
This video is sufficient to make a solid design with Power BI and Databricks!!! Thanks a lot!!! I appreciate your details and crystal-clear explanations
Glad it was helpful!
This is a great video, thanks for sharing! Did part 2 ever get recorded?
Yes, ruclips.net/video/FHYNpWRu_yc/видео.htmlsi=k_Bie8_LjJAfVevp
Is there a part 2? This is really helpful
Is there a Part 2? Great video 🤘
ruclips.net/video/FHYNpWRu_yc/видео.htmlsi=k_Bie8_LjJAfVevp glad it was helpful!
ruclips.net/video/FHYNpWRu_yc/видео.htmlsi=k_Bie8_LjJAfVevp here you go!
Can we get the slides used in this video?
Sorry we don't have the slides to share
Where can I get the slides used in this video?
Sorry we don't have the slides to share
Saw this on Reddit the other day. Thanks for sharing the video. It would be lovely to see a Spark query being tuned live with all of these features. Performance tuning can be a bit of a black art in Spark.
:)
Hi Stephanie, thanks for the video. I am currently using DLT with APPLY CHANGES and writing the output to the Hive metastore, which has AWS Glue connected to it. The output is a streaming table; however, it is actually a view built from the __apply_changes_storage_xxx table. Any idea how this could be migrated from Hive to UC? Also, when I change the same DLT pipeline target to a UC schema, it seems AWS Glue is not able to get the table metadata. Is there any documentation I can follow for migrating DLT-built tables from Hive to UC? Thanks
Nice demo! In the video it is not clear how to access the UI in Databricks to create the agent. Can I get some help on this, please?
Thanks for the wonderful session. One more on checkpoints with respect to cloudFiles (i.e., Auto Loader) is much needed.
Can you be more specific? I am happy to see if one of our experts has time to address. What exactly are you looking for?
Thanks for the Databricks skill builder series.
Love the video! Definitely gonna try this out. Could you help me understand how you start the UI builder shown during 26:20?
Simply excellent. More of this!
More to come!
Hi, this is a nicely created demo. Where can I get the notebooks, please?
I added the GitHub link to the description
Hey Stephanie, I migrated everything from hive_metastore to Unity just now, but when I execute my pipelines they throw class and library errors. I have the same libraries installed that were on the old clusters. In fact, I edited the old cluster and changed the access mode to "shared" in order to make it Unity Catalog-enabled. The same libraries work fine on the old cluster. Do you happen to know what I'm missing here?
Can you reach out to your account team? I don't know what's going on.
@@stephanieamrivera Thanks for replying. The issue was actually resolved. Shared mode does not support some of the APIs and the Spark context, according to the documentation. So we used a single-user, multi-node cluster and it's all working fine. Thanks.
If we scan the data via an SPN, should we define the SPN as an admin in DBX to get an access token for it?
To configure Service Principal Names (SPNs) for accessing resources in Azure Databricks, you typically need to define the SPN as an admin in Databricks, but that isn't required to obtain an access token. Here's the typical flow for using an SPN to access data in Databricks:
1. Create an SPN: First, create an SPN in Azure Active Directory (AAD) and provide the necessary permissions for accessing the resources you require.
2. Assign permissions: Assign the appropriate permissions to the SPN, such as the necessary roles or access policies to access the Databricks workspace and other resources.
3. Configure Databricks: As an admin in Databricks, configure the SPN by creating a secret scope and storing the SPN credentials securely. This can be done using the Databricks CLI or the Databricks UI.
4. Access tokens: To obtain an access token for the SPN, use an Azure Active Directory authentication flow, such as OAuth2 client credentials, to authenticate and generate the access token. This token is then used to authenticate the SPN when accessing Databricks resources.
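For step 4, a minimal, hedged sketch of the OAuth2 client-credentials flow against the Azure AD token endpoint might look like this. The tenant and SPN values are placeholders (in practice, read them from a secret scope), and the resource ID shown is the one commonly documented for Azure Databricks, so verify it against current docs.

```python
import requests

# Hypothetical tenant and SPN credentials; do not hard-code real secrets.
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<spn-application-id>"
CLIENT_SECRET = "<spn-secret>"

# "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d" is the resource ID commonly
# documented for Azure Databricks; confirm it against current documentation.
resp = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "scope": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default",
    },
)
resp.raise_for_status()
aad_token = resp.json()["access_token"]

# The token can then be sent as a bearer token to the workspace REST API.
headers = {"Authorization": f"Bearer {aad_token}"}
```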
Do you have a link to the ETL pipeline step by step process?
Is this what you were looking for? databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/8599738367597028/2070341989008551/3601578643761083/latest.html.
What does the Customer VPC NACL look like?
The Customer VPC NACL is a security feature in AWS that functions as a virtual firewall for controlling inbound and outbound traffic at the subnet level. It is essentially a set of rules that determine what traffic is allowed or denied in a VPC. Here are some key aspects and characteristics of the Customer VPC NACL:
1. Associations: A VPC NACL is associated with one or more subnets within a VPC. By default, each subnet in a VPC is associated with the default VPC NACL, but you can associate a custom NACL with your subnets.
2. Numbering: Each VPC NACL rule is assigned a rule number that determines the order in which rules are evaluated.
3. Inbound and outbound rules: VPC NACLs have separate sets of rules for inbound and outbound traffic. Inbound rules control incoming traffic to the subnet, while outbound rules control outgoing traffic from the subnet.
4. Allow and deny rules: VPC NACLs can have rules that either allow or deny traffic. The rules are evaluated in order, and the first matching rule determines whether the traffic is allowed or denied.
5. Stateless: VPC NACLs are stateless, which means that responses to allowed inbound traffic are not automatically allowed outbound. Separate rules must be created for inbound and outbound traffic.
6. Default rules: By default, a VPC NACL allows all inbound and outbound traffic. You can modify the default rules to tighten security or create custom rules to fit your specific requirements.
7. Logging: NACLs themselves do not produce logs; VPC Flow Logs can be used to capture information about accepted and denied traffic, which helps in monitoring and analyzing network traffic patterns.
It's important to note that the VPC NACL operates at the subnet level and provides a basic level of security. For more granular control, it is recommended to use security groups in conjunction with VPC NACLs. Reference: AWS documentation on VPC NACLs - docs.aws.amazon.com/vpc/latest/userguide/vpc-network-acls.html. Hope this helps!
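If it helps to see the shape of those rules in code, here is a hedged boto3 sketch. The NACL ID is a placeholder, and the rule creates a matching inbound/outbound pair, reflecting the stateless behavior described above.

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical NACL ID; rule numbers determine evaluation order.
NACL_ID = "acl-0123456789abcdef0"

# Allow inbound HTTPS into the subnet.
ec2.create_network_acl_entry(
    NetworkAclId=NACL_ID,
    RuleNumber=100,
    Protocol="6",            # TCP
    RuleAction="allow",
    Egress=False,            # inbound rule
    CidrBlock="0.0.0.0/0",
    PortRange={"From": 443, "To": 443},
)

# Because NACLs are stateless, the return traffic needs its own outbound rule
# covering ephemeral ports.
ec2.create_network_acl_entry(
    NetworkAclId=NACL_ID,
    RuleNumber=100,
    Protocol="6",
    RuleAction="allow",
    Egress=True,             # outbound rule
    CidrBlock="0.0.0.0/0",
    PortRange={"From": 1024, "To": 65535},
)
```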
Thank you, Arthur, great video! Could you tell me if it is possible to download your architecture diagrams somewhere? Thank you
Unfortunately, RUclips doesn't let me upload files or images. It might be easier for you to take screenshots from the video. Sorry about that!
Thanks for this session, it has been very useful. Keep it going.
Since DLT displays counts on each box, is it usually slower than a regular workflow? With the enhanced features of Unity Catalog that have come (or are coming), specifically lineage and such, we can easily see which tables and views are connected where. Is it worth using DLT in the workflow if someone does not want to pay the extra cost associated with it, considering that I will do the OPTIMIZE and Z-ordering on my own at some frequency?
Whether DLT is slower than a regular workflow depends on the specific use case, data volume, query patterns, and optimization techniques used. DLT does introduce some overhead for tracking pipeline state and data quality metrics, but the benefits it provides may still make it worth considering, even without using all of its features.
1. Performance: DLT may add some processing overhead compared to a regular workflow, but for most pipelines the difference is small relative to the transformation work itself.
2. Enhanced features: Unity Catalog, with lineage and related capabilities, provides valuable insight into how tables and views are connected. These features improve data understanding, data governance, and debugging.
3. Optimize and Z-ordering: Delta Lake provides optimization techniques such as OPTIMIZE and Z-ordering. If you can incorporate these effectively into your regular workflow without using DLT, you can still achieve the performance benefits without incurring the additional cost associated with DLT.
In summary, using DLT depends on the specific requirements and constraints of your use case.
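If you do take on the maintenance yourself, a hand-rolled job might look roughly like this sketch. Table names and the clustering column are hypothetical, and `spark` is the notebook-provided SparkSession; you would schedule something like this as a regular workflow.

```python
# Hypothetical maintenance job scheduled outside of DLT.
tables = ["main.sales.orders", "main.sales.customers"]

for table in tables:
    # Compact small files and co-locate rows on a frequently filtered column.
    spark.sql(f"OPTIMIZE {table} ZORDER BY (customer_id)")
    # Remove files no longer referenced by the table (default retention window).
    spark.sql(f"VACUUM {table}")
```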
It always says that the token isn't correct or doesn't have the right permissions. However, my PAT has admin permissions on that workspace. Have you had this issue?
If you are experiencing issues with the personal access token (PAT) not being recognized or not having the right permissions, there are a few troubleshooting steps you can try:
1. Verify token permissions: Confirm that the PAT has the necessary permissions assigned within the Databricks workspace. Although you mentioned that the PAT has admin permissions, make sure it has the required permissions specifically for the actions you are trying to perform. For example, if you are accessing Delta tables, ensure that the PAT has the necessary permissions for table operations.
2. Check workspace configuration: Verify that token-based authentication is enabled in your Databricks workspace and that there are no restrictions or configurations that could prevent the use of tokens. Contact your workspace administrator to confirm the token settings and make sure there are no conflicts or restrictions.
3. Try a new token: If all else fails, you can revoke the existing PAT and generate a new one. Sometimes there can be issues with specific tokens, so generating a fresh token may resolve the problem.
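A quick way to check whether the PAT itself is accepted, before digging into provider or connection configuration, is to call the workspace REST API directly. This is a hedged sketch; the workspace URL and token are placeholders.

```python
import requests

# Hypothetical workspace URL and token.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"

resp = requests.get(
    f"{HOST}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
print(resp.status_code)   # 200 means the token authenticated
print(resp.json())        # 401/403 payloads usually explain what is wrong
```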
Thanks, Arthur
Thanks for this. We need more of this ☺
Excellent resource. I wish there was a longer session on Databricks Terraform with an e2e walkthrough but this is a good overview.
Fantastic video! It makes me wonder what should and shouldn't be built using Databricks SQL. As I understand it, this video suggests striking a balance between the gold layer and Power BI.
Great video! We are going to migrate from a typical data warehouse to a lakehouse. The only thing you did not mention (or I did not understand) is how to serve the data for Power BI datasets (aka semantic models). In the Azure data warehouse world, we have a technical user that refreshes the dataset hourly or daily. But how do you refresh a dataset that is based on a lakehouse? Do you use the Databricks connector in PBI?
I have asked Hobbs to reply :)
Hi @Pixelements. If you're using an Import approach, you set a refresh schedule in the Power BI Service and your model will then refresh itself as often as the schedule dictates. If you're using DirectQuery, each time any given report is opened, it re-runs the query it's based on and retrieves the results, so there's no need to set a refresh schedule there. You can also turn on a report setting in DirectQuery reports that says "once the report is open, go ahead and re-run your query every X minutes." In either case, your PBI semantic model (previously known as a PBI dataset) uses whatever connector you used when you made it to reach from the Power BI Service to Databricks and retrieve new data.
Great work, went through your playlist and its content is awesome. :)
Much appreciated!
Simply awesome. Crystal clear.
Nice demo, you want a job! Keep up the great work.
Thanks! 👍
Using it as a batch and merging it in foreachBatch, should I create the table at the Delta location before processing? I mean for tables arriving every day.
I asked Robert to reply :)
It depends on how much control you want. I have some customers that explicitly create every table before loading into it, but that's not necessary. You can create it ad-hoc at the time it's loaded. Chances are, many columns will be inferred as strings, so you may find that you want to specifically create the table before you begin loading into it.
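As a rough illustration of that point, here is a hedged sketch that pre-creates the target table (so the column types are exactly what you want rather than whatever is inferred on first load) and then upserts each micro-batch with foreachBatch. All table, column, and path names are made up, and `daily_orders` is assumed to be a streaming DataFrame.

```python
from delta.tables import DeltaTable

# Pre-create the target so the schema is explicit. Names are hypothetical.
spark.sql("""
  CREATE TABLE IF NOT EXISTS main.sales.orders (
    order_id BIGINT, amount DOUBLE, updated_at TIMESTAMP
  )
""")

def upsert_batch(batch_df, batch_id):
    # Merge each micro-batch into the pre-created Delta table.
    target = DeltaTable.forName(batch_df.sparkSession, "main.sales.orders")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.order_id = s.order_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

# `daily_orders` is assumed to be a streaming DataFrame (e.g. from Auto Loader).
(daily_orders.writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/Volumes/main/sales/checkpoints/orders")
    .trigger(availableNow=True)
    .start())
```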
Thank you@@robertmosley4577
Thank you@@stephanieamrivera
Hello Stephanie, thank you for the video; it is interesting to see how we can use Airflow with Databricks and manage jobs externally. My question is: is there a benefit to using Airflow to schedule Databricks jobs instead of using Workflows directly in the Databricks UI?
Thanks for the question. Not really. I see customers use Workflows unless their company already uses Airflow.
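For anyone weighing the two options (and for the earlier question about the databricks_default connection), here is a minimal, hedged Airflow sketch using the Databricks provider. The job ID is hypothetical; the workspace URL typically goes in the connection's Host field and the PAT in its Password field, but check the provider docs for your version.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

# Assumes an Airflow connection named "databricks_default" whose Host is the
# workspace URL and whose Password holds a personal access token.
with DAG(
    dag_id="trigger_databricks_job",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_job = DatabricksRunNowOperator(
        task_id="run_nightly_job",
        databricks_conn_id="databricks_default",
        job_id=123456,  # hypothetical Databricks job ID
    )
```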
Awesome, glad to find such useful content before migration.
Happy it's helpful!
Excellent session and material. Thanks a lot!
Happy it's helpful!
Can you provide the full connectivity of the VPC, such as which subnets are connected to which route tables and to which endpoints?
Will get back to you shortly on this!
The connectivity of the VPC will vary from deployment to deployment. In this case, there are private subnets with route tables pointing to an S3 gateway endpoint, the local VPC CIDR, and 0.0.0.0/0 to a NAT gateway. The public subnet then routes all traffic to an internet gateway. The traffic from the EC2 instance to the PrivateLink endpoint for Databricks is covered by the local VPC CIDR route table entry. The traffic finds its way to the endpoint using DNS resolution. Hope this helps!
@@stephanieamrivera It doesn't. The training is vague in some areas. What I would like to see is an explanation of: (1) a sample NACL for the traffic into the subnet, (2) a sample security group that can be attached to the cross-account IAM role and would work for PrivateLink purposes, and (3) configuring access to S3, for example, and to other public services reached via the IGW. I have been battling with this for 4 days now and need a resolution ASAP. Documentation is taking me all over the place. There are members of my team calling for other platforms, but I am adamant that this is the best platform for our purposes. Thanks.
@@OPopoola Please reach out to your Databricks team for more details. NACLs remain standard; they should be unchanged. This is standard AWS networking: an S3 gateway endpoint for in-region buckets, and a NAT gateway to an internet gateway for public services, with a WAF if needed.
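For what it's worth, here is a hedged boto3 sketch of the routing described above. All resource IDs and the region are placeholders, and this is illustrative only, not a drop-in configuration for any particular deployment.

```python
import boto3

ec2 = boto3.client("ec2")

# Private subnet route table: default route to a NAT gateway for public endpoints.
ec2.create_route(
    RouteTableId="rtb-private-0123456789abcdef0",
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId="nat-0123456789abcdef0",
)

# S3 gateway endpoint attached to the private route table for in-region buckets.
ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=["rtb-private-0123456789abcdef0"],
)

# Public subnet route table: default route to the internet gateway.
ec2.create_route(
    RouteTableId="rtb-public-0123456789abcdef0",
    DestinationCidrBlock="0.0.0.0/0",
    GatewayId="igw-0123456789abcdef0",
)
```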
Thanks for sharing. When creating an external connection to Azure SQL, what authentication methods are supported? Can we use a Service Principal or is it limited to SQL Authentication?
There are two different authentication methods that can be used:
1. SQL authentication: This method involves providing a username and password to authenticate against the Azure SQL server. It requires a login and password that are configured on the Azure SQL server.
2. Azure Active Directory (AD) authentication: Azure SQL supports using Azure AD identities to authenticate and authorize database access. This method enables you to use Azure AD accounts or groups to authenticate and manage access to your Azure SQL Database or Azure Synapse Analytics.
Azure SQL currently does not directly support authenticating with service principals. However, you can use Azure AD credentials associated with a service principal to authenticate and access Azure SQL by creating a SQL login mapped to the service principal's Azure AD identity.
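As a hedged sketch of what the SQL-authentication option can look like with Lakehouse Federation: the connection, catalog, server, database, and secret names below are made up, and the option names should be checked against current Databricks documentation.

```python
# Create a connection to the Azure SQL server using SQL authentication,
# with the credentials stored in a Databricks secret scope (names are made up).
spark.sql("""
  CREATE CONNECTION IF NOT EXISTS azure_sql_conn TYPE sqlserver
  OPTIONS (
    host 'myserver.database.windows.net',
    port '1433',
    user secret('sql-scope', 'sql-user'),
    password secret('sql-scope', 'sql-password')
  )
""")

# Expose a database on that server as a foreign catalog, queryable with
# three-level names such as azure_sql_cat.dbo.some_table.
spark.sql("""
  CREATE FOREIGN CATALOG IF NOT EXISTS azure_sql_cat
  USING CONNECTION azure_sql_conn
  OPTIONS (database 'sales_db')
""")
```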
Is there some video of connecting and using this in a dataflow? I mean a hands-on video, haha
You mean connecting Databricks to dataflow?
Thanks for sharing. Can I send the logs from Databricks directly to CloudWatch/CloudTrail without an S3 bucket?
I have asked JD to respond :)
@@stephanieamrivera thanks