I have this exact requirement right now, and the video popped up at just the right time.
Thank you Raja.
Data engineering with Databricks wouldn't be easy to learn if you weren't there.
Thank you! Glad it helps
The best explanation for such an important problem!
Please make this kind of video as much as possible.
Thanks for your comment. Sure will create more videos on such scenarios
very good content, concise and clear
Thanks
What about parallel transactions? How can we catch the latest record with history(1) if the table is inserted/deleted/updated from multiple sessions? In that case we cannot ensure that history(1) returns the latest statistics for our own operation...
This is superb... very useful
Thanks Vipin
Where did the path to the delta_merge come from?
The video is awesome. Thanks for sharing! Can you provide the notebook please?
Superb bro👌👍
Hi Raja, this is great, but just a quick question: in your case, history(1) will always give you the latest record irrespective of the operation performed on the delta table, which is not correct, right?
Is there any way we can skip reading the record from history when no operation was performed on the delta table?
Hi Raju, good question.
Even table creation is considered an operation, so we will always have at least one operation for any table.
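For readers hitting this, a minimal PySpark sketch (the table name my_delta_table is an assumption) of reading only the latest history entry and checking which operation produced it, so non-DML entries such as CREATE TABLE can be skipped:

# Minimal sketch, assuming a Delta table named "my_delta_table".
from delta.tables import DeltaTable

# history(1) returns a DataFrame containing only the most recent history entry.
latest = (
    DeltaTable.forName(spark, "my_delta_table")
    .history(1)
    .select("version", "operation", "operationMetrics")
    .collect()[0]
)

# Act only on data-changing operations; CREATE TABLE, OPTIMIZE, VACUUM etc. are ignored.
if latest["operation"] in ("WRITE", "MERGE", "UPDATE", "DELETE"):
    print(latest["version"], latest["operationMetrics"])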
@@rajasdataengineering7585 Thanks Raja for the reply. I was wondering: let's say my notebook keeps performing SCD Type 1 merges and keeps inserting audit data into the audit log table.
For a given run it produces no operation on the delta table (no insert, no delete, no update, nothing). But my audit logic will still read the latest record from history (which was already inserted into the audit log as part of a previous run) and create a duplicate in the audit log table, right?
Hope I am making sense here :)
The only scenario where no insert/update/delete is performed is when the source dataframe is empty, which means there is no input file for your Databricks pipeline.
When there is no input data, what is the need to run the pipeline?
@@rajasdataengineering7585 Understood. Thanks Raja !!
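For anyone who still worries about a run that commits no new version, one possible guard is to compare the latest history version with the last version already recorded in the audit log, and only audit when the table has actually advanced. A sketch only, assuming the audit_log table stores a version column and the Delta table is named my_delta_table:

# Sketch only: assumes audit_log stores the Delta table "version" it last recorded.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

latest_version = (
    DeltaTable.forName(spark, "my_delta_table").history(1).collect()[0]["version"]
)

# Highest version already recorded in the audit log (None when the log is empty).
last_audited = spark.table("audit_log").agg(F.max("version").alias("v")).collect()[0]["v"]

if last_audited is None or latest_version > last_audited:
    # A new version was committed since the previous run, so it is safe to audit it;
    # otherwise skip, and no duplicate audit row is created for a no-op run.
    pass  # perform the audit-log insert here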
Thanks, very helpful. Is there a way to enable auditing on delta table reads (for example, who selected which columns from a delta table)?
Hi Subbu, as far as I know that option is not yet available, but I need to check to confirm. Will check and let you know.
@@rajasdataengineering7585 Thanks Raj!
After the merge operation, are you inserting these metrics into the audit_log table, or are they captured in the audit_log automatically after merge/insert/delete operations? Please share your thoughts.
Same question?
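In general, Delta writes the history entry automatically on every merge/insert/delete, but copying those metrics into a separate audit table is an explicit step. A rough sketch of that step (the table names my_delta_table and audit_log are assumptions; the metric keys shown are the ones a MERGE typically produces):

# Rough sketch: extract operation metrics from the latest history entry and
# append them to an audit table. Table names are assumptions for illustration.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

history_df = DeltaTable.forName(spark, "my_delta_table").history(1)

# operationMetrics is a map<string,string>; pull out the MERGE counters by key.
audit_df = history_df.select(
    "version",
    "timestamp",
    "operation",
    F.col("operationMetrics")["numTargetRowsInserted"].alias("rows_inserted"),
    F.col("operationMetrics")["numTargetRowsUpdated"].alias("rows_updated"),
    F.col("operationMetrics")["numTargetRowsDeleted"].alias("rows_deleted"),
)

# Nothing lands in audit_log automatically; this append is the manual step.
audit_df.write.mode("append").saveAsTable("audit_log")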
Thanks for the video.
Welcome
Hi bro, nice, but one doubt: how did you insert those values into the audit table?
Hii
Where did you create the audit_log table, and how does the table show that data?
Raj, where does this Delta table definition reside?
Hi Krishna, to get the DDL script of a delta table you can use the below command:
%sql
SHOW CREATE TABLE <table_name>
@@rajasdataengineering7585 Will it show us the location of the Delta table?
Yes, it gives the location as well in the CREATE DDL script.
@@rajasdataengineering7585 Does that mean the delta table schema metadata is stored on Databricks, i.e. DBFS?
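On the location question, DESCRIBE DETAIL also returns the table's storage path, and the _delta_log folder under that path is where Delta keeps the transaction log and schema metadata. A minimal sketch, assuming a table named my_delta_table:

# Minimal sketch: DESCRIBE DETAIL returns one row with the table's format, location, size, etc.
detail = spark.sql("DESCRIBE DETAIL my_delta_table").collect()[0]

print(detail["format"])    # e.g. "delta"
print(detail["location"])  # storage path; the _delta_log folder lives under this path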
Can you please start covering real-time scenarios in PySpark?
Sure Sravan. If you are looking for any particular real time scenario, please share that scenario with me and I will create a video. Thank you
@@rajasdataengineering7585 I am currently attending interviews. I don't have experience with ADB, only ADF, and without ADB candidates are not considered. They ask about ADB in great depth, and I answer just from watching your videos. When they ask real-time scenarios, I get confused.
Will help you with real time scenarios.
Yes Raja Sir. Please @@rajasdataengineering7585
I want this notebook. Can you share it with us?
May I have the Delta history explode function video?
Bro, please change the BGM... it's very irritating.
Sure bro