60. Databricks & Pyspark: Delta Lake Audit Log Table with Operation Metrics

  • Published: 31 Dec 2024

Comments • 40

  • @mohitupadhayay1439
    @mohitupadhayay1439 7 months ago +3

    I had this exact requirement right now, and the video popped up at just the right time.
    Thank you, Raja.
    Databricks for DE wouldn't be easy to learn if you weren't around.

  • @ruinmaster5039
    @ruinmaster5039 1 year ago +1

    The best explanation for such an important problem!
    Please make as many videos like this as possible.

  • @blacknwhitenblue
    @blacknwhitenblue 1 year ago +1

    Very good content, concise and clear.

  • @seasql
    @seasql 2 years ago +2

    What about parallel transactions? How can we catch the latest records with history(1) if the table is inserted/deleted/updated from multiple sessions? We cannot ensure that history(1) returns the latest statistics...
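
    A minimal sketch of one way to handle this, assuming the audit pipeline tracks the last audited version instead of relying on history(1); the table names (target_table, audit_log) are illustrative, not from the video:

    from delta.tables import DeltaTable
    from pyspark.sql import functions as F

    dt = DeltaTable.forName(spark, "target_table")

    # Highest version already captured in the audit table (-1 on the first run)
    last_v = spark.table("audit_log").agg(F.max("version").alias("v")).first()["v"]
    last_audited = last_v if last_v is not None else -1

    # Every commit after that version is picked up, whichever session produced it
    new_commits = (dt.history()
                     .filter(F.col("version") > last_audited)
                     .select("version", "timestamp", "operation", "operationMetrics"))

    new_commits.write.mode("append").saveAsTable("audit_log")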

  • @vipinkumarjha5587
    @vipinkumarjha5587 3 years ago +1

    This is superb... very useful

  • @mohitupadhayay1439
    @mohitupadhayay1439 7 months ago

    Where did the path to delta_merge come from?

  • @purnimasharma9734
    @purnimasharma9734 2 years ago

    The video is awesome, thanks for sharing! Can you please provide the notebook?

  • @sravankumar1767
    @sravankumar1767 3 years ago +1

    Superb bro👌👍

  • @rajunaik8803
    @rajunaik8803 1 year ago +1

    Hi Raja, this is great, but just a quick question: in your case, history(1) will always give you the one latest record irrespective of the operation performed on the delta table, which is not correct, right?
    Is there any way we can skip reading the record from history when there has been no operation on the delta table?

    • @rajasdataengineering7585
      @rajasdataengineering7585  1 year ago

      Hi Raju, good question.
      Even table creation is considered an operation, so we will always have at least one operation for any table.

    • @rajunaik8803
      @rajunaik8803 1 year ago +1

      @@rajasdataengineering7585 Thanks Raja for the reply. I was wondering: let's say my notebook keeps performing SCD Type 1 merges and keeps inserting audit data into the audit log table.
      Suppose a given run produces no operation on the delta table (no insert, no delete, no update, nothing). My audit logic will still read the latest record from history (already inserted into the audit log as part of the previous run) and create a duplicate in the audit log table, right?
      Hope I am making sense here :)

    • @rajasdataengineering7585
      @rajasdataengineering7585  1 year ago +1

      The only scenario where no insert/update/delete is performed is when the source dataframe is empty, which means there is no input file for your Databricks pipeline.
      When there is no input data, what is the need to run the pipeline?

    • @rajunaik8803
      @rajunaik8803 1 year ago

      @@rajasdataengineering7585 Understood. Thanks, Raja!!
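
      A minimal sketch of the guard discussed in this thread, assuming the audit_log table records the delta version of each commit; the names (target_table, audit_log) are illustrative:

      from delta.tables import DeltaTable

      dt = DeltaTable.forName(spark, "target_table")
      latest = dt.history(1).select("version", "timestamp", "operation", "operationMetrics")

      # Drop any version audit_log has already recorded, so a run with no new
      # delta operation appends nothing instead of duplicating the previous row
      audited = spark.table("audit_log").select("version")
      fresh = latest.join(audited, on="version", how="left_anti")

      fresh.write.mode("append").saveAsTable("audit_log")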

  • @subbua4331
    @subbua4331 2 years ago +1

    Thanks, very helpful. Is there a way to enable auditing on delta table reads (for example, who selected which columns from a delta table)?

    • @rajasdataengineering7585
      @rajasdataengineering7585  2 years ago +1

      Hi Subbu, as far as I know that option is not yet available, but I need to check to confirm... will check and let you know.

    • @subbua4331
      @subbua4331 2 years ago +1

      @@rajasdataengineering7585 Thanks Raj!
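
      A hedged sketch for the read-audit question above: on newer Databricks workspaces with Unity Catalog system tables enabled, read events can reportedly be queried from system.access.audit; the table and column names below are assumptions to verify against your workspace, not something shown in the video:

      # Recent audit events; action names and columns vary by Databricks release
      reads = spark.sql("""
          SELECT event_time, user_identity.email AS user, action_name, request_params
          FROM system.access.audit
          WHERE service_name = 'unityCatalog'
          ORDER BY event_time DESC
      """)
      reads.show(truncate=False)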

  • @ranjansrivastava9256
    @ranjansrivastava9256 1 year ago +1

    After the merge operation, are you inserting these metrics into the audit_log table, or are they captured in audit_log automatically after merge/insert/delete operations? Please share your thoughts.
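
    A minimal sketch of the distinction being asked about: Delta writes operationMetrics into its own transaction log automatically, but a separate audit_log table is only populated by an explicit read-and-append step; table names here are illustrative:

    from delta.tables import DeltaTable

    dt = DeltaTable.forName(spark, "target_table")

    # Latest commit: version, operation (MERGE/WRITE/DELETE, ...) and its metrics
    latest = dt.history(1).select("version", "timestamp", "operation", "operationMetrics")

    # The explicit step -- nothing lands in audit_log automatically
    latest.write.mode("append").saveAsTable("audit_log")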

  • @krishnamurthy9720
    @krishnamurthy9720 3 years ago +1

    Thanks for the video.

  • @manigandang6921
    @manigandang6921 1 year ago

    Hi bro, nice! But one doubt: how did you insert those values into the audit table?

  • @SurajKumar-hb7oc
    @SurajKumar-hb7oc 1 year ago

    Hi,
    where do you create the audit_log table, and how does the table show that data?
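
    A minimal sketch of one way to create the audit table up front, with a schema mirroring the delta history columns; the name audit_log and the column set are illustrative, not taken from the video's notebook:

    spark.sql("""
        CREATE TABLE IF NOT EXISTS audit_log (
            version BIGINT,
            timestamp TIMESTAMP,
            operation STRING,
            operationMetrics MAP<STRING, STRING>
        ) USING DELTA
    """)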

  • @krishnamurthy9720
    @krishnamurthy9720 3 years ago +1

    Raj, where does this Delta table definition reside?

    • @rajasdataengineering7585
      @rajasdataengineering7585  3 years ago

      Hi Krishna, to get the DDL script of a delta table you can use the command below:
      %sql
      SHOW CREATE TABLE <table_name>

    • @krishnamurthy9720
      @krishnamurthy9720 3 years ago +1

      @@rajasdataengineering7585 Will it show us the location of the Delta table?

    • @rajasdataengineering7585
      @rajasdataengineering7585  3 years ago

      Yes, it gives the location as well in the CREATE DDL script.

    • @Learn2Share786
      @Learn2Share786 3 years ago

      @@rajasdataengineering7585 Does that mean the delta table schema metadata is stored on Databricks, i.e. DBFS?
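
      A minimal sketch relevant to the metadata question: DESCRIBE DETAIL reports where a delta table's data lives, and the schema itself sits in the _delta_log under that location; the table name is illustrative:

      # Location, format and creation time of the table's delta files
      detail = spark.sql("DESCRIBE DETAIL target_table")
      detail.select("format", "location", "createdAt").show(truncate=False)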

  • @sravankumar1767
    @sravankumar1767 3 years ago +1

    Can you please start covering real-time scenarios in PySpark?

    • @rajasdataengineering7585
      @rajasdataengineering7585  3 years ago +1

      Sure, Sravan. If you are looking for any particular real-time scenario, please share it with me and I will create a video. Thank you.

    • @sravankumar1767
      @sravankumar1767 3 years ago +3

      @@rajasdataengineering7585 I am actually attending interviews. I don't have experience with ADB, only ADF, and without ADB I am not being considered. They ask about ADB in great depth, and I can answer just from watching your videos, but when they ask real-time scenarios I get confused.

    • @rajasdataengineering7585
      @rajasdataengineering7585  3 years ago +1

      I will help you with real-time scenarios.

    • @prabhatgupta6415
      @prabhatgupta6415 1 year ago

      @@rajasdataengineering7585 Yes, Raja sir, please!

  • @shwetac2929
    @shwetac2929 1 year ago

    I want this notebook. Can you share it with us?

  • @seshaiahambati1798
    @seshaiahambati1798 8 months ago

    May I have the Delta history explode function video?

  • @niteshsoni2282
    @niteshsoni2282 1 year ago +2

    Bro, please change the BGM... it's very irritating.