- 51 videos
- 208,578 views
Knowledge Sharing
USA
Joined 22 Sep 2019
I am passionate about learning new things every day and also sharing my knowledge with others. Thanks to YouTube for helping me share my knowledge.
DAIS Summary
I had a chance to attend the Databricks Data and AI Summit this year, and this video gives you a very high-level view of the key features released as part of the event.
23 views
Videos
Databricks data solution architecture
270 views • 5 months ago
This video covers a simple data solution architecture built around Databricks services in the Azure cloud.
Unity Catalog Part 5 - End-to-end data engineering process
110 views • 5 months ago
This video covers an end-to-end data engineering process with a simple example on a Databricks cluster with Unity Catalog enabled. It also covers the use of data lineage.
Unity Catalog Part 4 - Demo (Catalog, Schema and Tables)
161 views • 8 months ago
This video provides a demo of creating a Unity Catalog catalog, schema and tables. Please go through the video and leave your valuable comments below.
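For reference, a minimal sketch of the three-level namespace such a demo walks through; the catalog, schema and table names below are placeholders rather than the ones from the video, and creating a catalog requires a Unity Catalog metastore attached to the workspace.

```python
# Placeholder names; run on a Unity Catalog-enabled cluster with privileges
# to create catalogs. `spark` is the notebook's SparkSession.
spark.sql("CREATE CATALOG IF NOT EXISTS demo_catalog")
spark.sql("CREATE SCHEMA IF NOT EXISTS demo_catalog.sales")
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo_catalog.sales.orders (
        order_id INT,
        amount   DOUBLE
    )
""")
```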
Unity Catalog Part 3 - Necessary details to create Unity Catalog
128 views • 9 months ago
This video provides the details required to create Unity Catalog, such as storage, the Databricks workspace, the Databricks connector and the metastore. Cluster configuration - ruclips.net/video/l5xhnkjnZtk/видео.html Databricks clone - ruclips.net/video/2AHaQfBF1Eg/видео.html Unity Catalog Part 1 - ruclips.net/video/27u_fwvVD0w/видео.html Unity Catalog Part 2 - ruclips.net/video/t0fyrdUzdzo/виде...
Databricks - Single sign-on and accessing the Azure Data Lake
169 views • 9 months ago
This video covers the use of credential passthrough in Databricks and integration with Active Directory.
Azure Databricks Unity Catalog Part 2 - Identity management and admin roles
211 views • 9 months ago
This video is a continuation of Part 1 and covers identity management, the admin roles required and the data permissions in Unity Catalog. Please go through the first video before this one for better understanding: ruclips.net/video/27u_fwvVD0w/видео.html
Azure Databricks Unity Catalog Part 1 - Introduction
391 views • 10 months ago
This is the first part of the Unity Catalog series. It provides a high-level idea of Unity Catalog, its benefits and its object model.
Databricks cluster configurations Part 1
892 views • 11 months ago
This video gives basic information about Databricks cluster configurations.
Custom Logging in Databricks
2.6K views • 1 year ago
Logging is one of the important activities in application programming. This video provides the basic details and usage of logging in Databricks. Here is the published version of the notebook - databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2683250055805286/3363163823230601/3892374572023226/latest.html
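As a rough illustration of the idea (not the notebook's exact code), standard Python logging works inside a Databricks notebook; the logger name and format string below are assumptions.

```python
import logging

# Use a named logger so re-running the cell does not attach duplicate handlers.
logger = logging.getLogger("notebook_logger")
logger.setLevel(logging.INFO)

if not logger.handlers:
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s - %(message)s")
    )
    logger.addHandler(handler)

logger.info("Job started")
logger.warning("Row count lower than expected")
```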
Write a PySpark DataFrame into Excel
2.4K views • 1 year ago
This video gives an idea of how to write a PySpark DataFrame into an Excel file. Please go through this video (ruclips.net/video/1RFaQb0Eew8/видео.html) to understand how to read from Excel using PySpark. Program file uploaded here - databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2683250055805286/81200766412462/3892374572023226/latest.html
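One common way to do this (not necessarily the method shown in the notebook) is to collect the Spark DataFrame to pandas and write with to_excel; the path and sheet name below are placeholders and openpyxl must be installed on the cluster.

```python
# `spark` is the SparkSession available in a Databricks notebook.
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])

# toPandas() collects the data on the driver, so this suits small result sets.
df.toPandas().to_excel("/dbfs/tmp/output.xlsx", sheet_name="results", index=False)
```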
Delta Table - Clone
689 views • 1 year ago
This video provides a fair idea of cloning a Delta table, which is very useful in certain scenarios. Here is the link to the notebook I used during the demo: databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2683250055805286/3492241143357701/3892374572023226/latest.html
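For context, Delta supports shallow and deep clones through SQL; a minimal sketch with placeholder table names (the source table must already exist).

```python
# Shallow clone copies only metadata and references the source data files;
# deep clone also copies the data files themselves.
spark.sql("CREATE TABLE IF NOT EXISTS sales_shallow SHALLOW CLONE sales")
spark.sql("CREATE TABLE IF NOT EXISTS sales_backup DEEP CLONE sales")
```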
Delta Table Transaction Log Part 2
438 views • 1 year ago
This is the continuation of Part 1 and provides in-depth details of the transaction log along with a demo. Please go through Part 1 before this video - previous video - ruclips.net/video/wC9Aj-HznPg/видео.html Notebook path - databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2683250055805286/451599583820585/3892374572023226/latest.html
Delta Table Transaction Log Part 1
1.3K views • 1 year ago
The transaction log is key to understanding Delta Lake because it is the underlying infrastructure for many of its most important features such as ACID transactions, scalable metadata handling and time travel. So let's deep dive into the transaction log. This Part 1 video covers the implementation of atomicity in Delta tables.
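A quick way to peek at that log from a notebook (the table name is a placeholder):

```python
# Every commit appends a JSON entry under the table's _delta_log directory;
# DESCRIBE HISTORY surfaces that log as a DataFrame.
history = spark.sql("DESCRIBE HISTORY sales_delta")
history.select("version", "timestamp", "operation").show(truncate=False)
```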
File size calculation using PySpark
1.7K views • 1 year ago
This video gives the details of a program that calculates the size of files in storage.
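A minimal sketch of the idea (not necessarily the program from the video); the path is a placeholder and subfolders are not recursed into.

```python
# dbutils.fs.ls returns FileInfo objects with a `size` attribute in bytes.
files = dbutils.fs.ls("dbfs:/mnt/raw/sales/")
total_bytes = sum(f.size for f in files)
print(f"{len(files)} files, {total_bytes / (1024 ** 2):.2f} MB")
```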
Connect to a Databricks Delta table from Power BI
9K views • 1 year ago
Connect to a Databricks Delta table from Power BI
Pandas DataFrame to Databricks PySpark DataFrame
2.2K views • 1 year ago
Pandas DataFrame to Databricks PySpark DataFrame
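The conversion itself is a one-liner; a small self-contained sketch with made-up column names:

```python
import pandas as pd

# spark.createDataFrame infers the Spark schema from the pandas dtypes.
pdf = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
sdf = spark.createDataFrame(pdf)
sdf.show()
```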
Databricks convert Delta to Parquet and vice versa
3.8K views • 2 years ago
Databricks convert Delta to Parquet and vice versa
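Roughly, the two directions look like this (the paths are placeholders, not from the video):

```python
# Parquet -> Delta: CONVERT TO DELTA adds Delta metadata in place.
spark.sql("CONVERT TO DELTA parquet.`/mnt/raw/events`")

# Delta -> Parquet: read the Delta table and write it back out as plain Parquet.
(spark.read.format("delta").load("/mnt/raw/events")
     .write.format("parquet").mode("overwrite").save("/mnt/raw/events_parquet"))
```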
Read from an Excel file using Databricks
15K views • 2 years ago
Read from an Excel file using Databricks
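One common approach (possibly different from the one in the video) is to read with pandas and convert to a Spark DataFrame; the path is a placeholder and openpyxl must be available on the cluster.

```python
import pandas as pd

# Read the workbook on the driver, then hand it to Spark.
pdf = pd.read_excel("/dbfs/mnt/raw/input.xlsx", sheet_name="Sheet1")
df = spark.createDataFrame(pdf)
df.show()
```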
Null handling by replacing with the column value from another DataFrame
2.2K views • 3 years ago
Null handling by replacing with the column value from another DataFrame
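The usual trick is a join plus coalesce; a small sketch with placeholder column names (not the dataset from the video):

```python
from pyspark.sql import functions as F

df1 = spark.createDataFrame([(1, None), (2, 5000)], ["emp_id", "salary"])
df2 = spark.createDataFrame([(1, 4500), (2, 4800)], ["emp_id", "salary"])

# Fall back to df2's salary whenever df1's salary is null.
filled = (
    df1.alias("a")
       .join(df2.alias("b"), "emp_id")
       .select("emp_id",
               F.coalesce(F.col("a.salary"), F.col("b.salary")).alias("salary"))
)
filled.show()
```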
Databricks DataFrame manipulation: subtract
671 views • 3 years ago
Databricks DataFrame manipulation: subtract
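For reference, subtract returns the rows of the first DataFrame that do not appear in the second (like SQL EXCEPT DISTINCT); the sample data below is made up.

```python
df_all = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "val"])
df_subset = spark.createDataFrame([(2, "b")], ["id", "val"])

df_all.subtract(df_subset).show()  # rows with id 1 and 3
```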
Implementing SCD Type 2 using Delta
19K views • 3 years ago
Implementing SCD Type 2 using Delta
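A hedged sketch of the general SCD Type 2 pattern with Delta MERGE, not the exact notebook from the video; table, column and key names are placeholders, and a production version would also carry a surrogate key and effective dates, as some of the comments below point out.

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F

# Existing dimension; in practice this Delta table would already exist.
spark.createDataFrame(
    [(1, "Old Street", True)], ["customer_id", "address", "is_current"]
).write.format("delta").mode("overwrite").saveAsTable("dim_customer")

updates = spark.createDataFrame(
    [(1, "New Street"), (2, "First Street")], ["customer_id", "address"]
)

dim = DeltaTable.forName(spark, "dim_customer")

# Step 1: expire the current row when the tracked attribute has changed.
(dim.alias("t")
    .merge(updates.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(condition="t.address <> s.address",
                       set={"is_current": "false"})
    .execute())

# Step 2: append the new version of changed rows and brand-new customers.
current = spark.table("dim_customer").where("is_current = true")
new_rows = (updates.join(current, ["customer_id", "address"], "left_anti")
                   .withColumn("is_current", F.lit(True)))
new_rows.write.format("delta").mode("append").saveAsTable("dim_customer")
```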
really helpful, thank you for sharing
So far off! SCD Type 2 requires a unique surrogate key to join with a fact FK!
Is this hue
Just finding your channel today. You are an AWESOME teacher, presenter, and practitioner. Thanks much for sharing your knowledge!
Hey, try to run the merge query again and again. It will keep inserting records into the dim table, because the join key is always considered null: the EmployeeId from the target never matches null, so it keeps on inserting records.
Can we use this method to read an Excel file by placing the files in Gen2 storage and reading them using PySpark? I am not able to do the same from the storage account. Please reply.
Yes, you can do this by uploading into Gen2 storage.
Hi bro, how can I connect Azure Data Studio from Databricks, Databricks to the data lake, and then the data lake to Snowflake? Can you help me?
Can I know the reason for connecting to Azure Data Studio from Databricks? I didn't try this method as I don't have a use case.
Superb explanation 👌 👏 👍
Glad you liked it
How can I import a notebook along with its visualizations? I have created a notebook and visualizations with the results, and now I want to migrate them to prod.
The best approach is to use GitHub.
Cmd 4 did not work. I have the Excel file in Microsoft Azure storage.
I think you should re-title this video as "Databricks credential passthrough". This was specifically what I was looking for, and I almost did not click on it because I did not think it was Databricks focused. ...just a thought. Thanks
Sure. Thanks for the suggestion.
If there is no change in the source data and we run the merge code again as part of a daily run, then the mergeKey-null records will be inserted into the target again as active and we end up with duplicates. How do we solve this?
There should not be null values in the key columns. Please handle nulls before the insertion.
Great explanation. How do we decide which worker and driver type to select, and how many worker instances to use? Is there any set of rules or calculations to decide?
It should be based on the workload. Normally we do not do much work on the driver unless the user runs data science code using pandas. If you add multiple nodes, your parallelism increases. Also note that if high-volume data processing is required from the beginning, you can add more capacity to the nodes. It requires a separate session to explain; let me add a video.
I need your help, is it possible to connect with you over a call?
Is there a way to connect to Databricks from Oracle SQL Developer?
Didn’t try that. Should be there
This method is not working in a Synapse notebook.
Oh ok, I didn't try it in a Synapse one. Will try and let you know.
Hi sir, I am facing a connectivity issue from Power BI to Azure Databricks. This is the error: Details: "ODBC : ERROR [HY000] [ Microsoft][ThriftExtension] (14) Unexpected response from server during a HTTP Connection: SSL_Connect: Certificate verify failed.". Can you please help me with this issue?
How are you connecting? Is this your organization's laptop or a personal one? If it is your office laptop, work with your network team.
Is this issue resolved andi
Can you please write the SCD Type 2 code in a generic way? Currently you have written it for only one column. Please and thank you.
Yes, this is an example. Please let me know your requirement in detail.
It is helpful. Thanks.
Can you share the file you have uploaded here, the CSV file?
databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2683250055805286/1609000298248664/3892374572023226/latest.html
What if another column is updated, apart from address?
Use the columns that you need to consider for SCD Type 2. This is just an example.
How did you get your dashboard to look like that? It's not letting me write code in the dashboard.
Great, but try to mention your LinkedIn and share your notebook link in a repo so that we can get the code.
Hi, can a global view be directly accessed by Power BI?
Hi, I watched this video, it is really helpful. One quick question: once we shut down logging, the file is written to storage (ADLS in my case). After that I am unable to write data to the same file; it throws an error. Can you please help with that?
Can you please share your code?
Is there a reason why you shouldn't do this? I am surprised this isn't encouraged as a best practice, which makes me think I'm missing something.
If the data volume is high, it may affect your program's performance. ADF will be your best choice for such scenarios.
Can we read a pivot Excel connected to Azure Analysis Services using this method?
Hi, can you make a video about Auto Loader and Structured Streaming?
Will create one for Auto Loader. Please watch this video for streaming: ruclips.net/video/WYSa2dUALAc/видео.html
Thanks Jithesh for this video. Really helpful.
Hi sir, what if I want to fill the null values in salary with the average of the preceding and successive values?
And if there are continuous null values, then first populate the first null value with the average, and then with that updated value and the next successive value calculate the average for the second null value.
@@Suriya_MSM I think I am not clear. Can you please paste an example?
Thank you for this content
Hi, excellent video. I have a question: is there a way to schedule an email with the dashboard information? For example, to receive an email every day with a PDF or a link in which I can see the dashboard with its information updated.
I didn't try this option. Will try and let you know.
Hi, how do we troubleshoot Spark driver errors such as "no parent missing" and null pointer exceptions?
You can go to the driver logs and dig deep
import pandas as pd
df = pd.read_excel("filename.xlsx")
Thank you so much for the realtime explanation of shallow vs deep clone - I have been searching for it - It's a great explanation!
this is top notch content!! Excellent!!
Hi, thanks for the content! I am converting a pandas df to a Spark DataFrame in Databricks but getting an error "cannot infer schema". I have used the parameter inferschema=True. The PySpark version is 3.0. Can you please help me with this?
Can you please share your code?
Fantastic video, example and explanation. Thanks!
When I use DirectQuery, query folding doesn't happen, so it tries to import everything, which can't happen because the database is too large... how can I solve this?
How much is the data size?
@@KnowledgeSharingjkb 33 billion rows
I believe it is a Power BI issue, as Power BI has size restrictions.
Excellent topic and well explained
Thank you for the video. How can I create a function based on this example? For example, I have 100 columns in DataFrame1 and 100 columns in DataFrame2, and now I want to replace null values in DataFrame1 with the values from DataFrame2. Note: both DataFrame1 and DataFrame2 have the same column names. Thanks in advance!!
Are you thinking of creating a function that accepts the columns as a parameter and then replaces the values?
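For what it's worth, a hedged sketch of one way to generalize this across all shared columns (the key column name "id" and the join type are assumptions, not code from the video):

```python
from pyspark.sql import functions as F

def fill_nulls_from(df1, df2, key="id"):
    """For every shared column other than the join key, keep df1's value
    and fall back to df2's value when df1's is null."""
    joined = df1.alias("a").join(df2.alias("b"), key, "left")
    filled = [
        F.coalesce(F.col(f"a.{c}"), F.col(f"b.{c}")).alias(c)
        for c in df1.columns if c != key
    ]
    return joined.select(key, *filled)
```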
Nice video, clearly explained. I have a blocker: while running dbutils.fs.mount(), I'm getting the below error: Unsupported Azure Scheme: abfss
Can you please explain how to write data to an Excel sheet from a PySpark DataFrame?
Please see this video: ruclips.net/video/Auvft3B5tlk/видео.html
I have encountered an SSL issue, could you help?
Please let me know your issue
Is it only me or does someone else have the same question: while creating the shareanalysis table, the first line is DROP TABLE IF EXISTS, so how come the data can be there unless we run the insert command?
Can you please elaborate?
Thank you so much... today I learned a new concept; I will add it to my resume.
How can I get the dataset for this example?
It is available on Yahoo Finance; you can download it from there. Let me check if I can add it here.
Sir, is it possible in Databricks Community Edition?
I didn’t try this honestly but should work
Thank you, very well explained. I have an important question: as I saw on the Internet, users have to pay charges for terminated clusters despite them not running. My question is whether there is any way to delete the cluster once the execution is done, so you can save money, because I have to schedule a job to run every day for a year, for example. Thank you.
There is no charge to you if the cluster is inactive. We can also programmatically delete the cluster.
Thank you very much