[100% Interview Question] Cache and Persist in Spark

  • Published: Dec 12, 2024

Comments • 9

  • @balajichandramohan9707 7 months ago +1

    Hi sir, this is topic related to Oracle.

  • @DS-bo5wu 4 years ago +2

    Hi Ankush, thanks for the video. I have one question. Suppose I am using persist(StorageLevel.DISK_ONLY). How will it improve the Spark application's performance? If the application needs this data again, it will have to read it from disk, so there will be more disk I/O. As we all know, Spark avoids unnecessary disk I/O, and that is the main reason why Spark is faster than MapReduce.

    • @learnomate 4 years ago +2

      Simple example: you may have one relatively large RDD, rdd1, and one smaller RDD, rdd2. You want to store both of them.
      If you persist both with MEMORY_AND_DISK and they do not fit in memory together, parts of both may be spilled to disk, resulting in slower reads.
      But you can take a different approach: store rdd1 with DISK_ONLY. Thanks to this move, rdd2 may then fit entirely in memory with the cache() option, and you will be able to read it faster.

    • @DS-bo5wu 4 years ago

      @learnomate Thanks for the clarification.

  • @mani.kandan4020 4 years ago

    Nice video, bro. I'm from Tamil Nadu.

  • @pardeep657 4 years ago +1

    Hi Ankush, how long will the cached data survive in memory? Does it get removed automatically when the session ends?

    • @Ady_Sr 2 years ago

      Yes, it does, if you don't uncache it manually before that.

  • @rohinidhorje8269 1 year ago

    AWS Step Functions

  • @mani.kandan4020 4 years ago

    Please make an HBase video, bro.