Row Format vs Column Format | Why Parquet is better than Avro | Why Columnar formats are preferred

  • Published: 12 Nov 2022
  • Learn more at www.scholarnest.com/
    Best place to learn Data engineering, Bigdata, Apache Spark, Databricks, Apache Kafka, Confluent Cloud, AWS Cloud Computing, Azure Cloud, Google Cloud - Self-paced, Instructor-led, Certification courses, and practice tests.
  • Science

Comments • 7

  • @MrSravan84 1 year ago +1

    Very nicely explained. But at 8:40 you mentioned that column 2 can go in the same block or a different block, and at 11:29 you mentioned that Spark knows that column 2 is stored in Block-2. These two statements are causing some confusion: if a column of each row can be spread across multiple blocks, how does Spark know which block to search?
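
One way to picture an answer to this question: columnar files such as Parquet keep metadata in a footer that records where each column chunk starts, so a reader can consult the footer first and then jump straight to the chunk it needs. The following is a minimal sketch of that idea using a toy layout. The JSON encoding, the function names `write_toy_columnar` and `read_column`, and the sample table are all assumptions made for illustration; this is not the actual Parquet format or Spark's reader.

```python
import io
import json
import struct

def write_toy_columnar(columns):
    """Write each column as one contiguous chunk, then a footer of offsets."""
    buf = io.BytesIO()
    footer = {}
    for name, values in columns.items():
        chunk = json.dumps(values).encode("utf-8")
        # Record where this column's chunk starts and how long it is.
        footer[name] = {"offset": buf.tell(), "length": len(chunk)}
        buf.write(chunk)
    footer_bytes = json.dumps(footer).encode("utf-8")
    buf.write(footer_bytes)
    # The footer length goes last so a reader can find the footer from the end.
    buf.write(struct.pack("<I", len(footer_bytes)))
    return buf.getvalue()

def read_column(data, name):
    """Read the footer first, then slice out only the requested chunk."""
    (footer_len,) = struct.unpack("<I", data[-4:])
    footer = json.loads(data[-4 - footer_len:-4].decode("utf-8"))
    entry = footer[name]
    chunk = data[entry["offset"]:entry["offset"] + entry["length"]]
    return json.loads(chunk.decode("utf-8"))

# Hypothetical three-column table, for illustration only.
table = {"c1": [1, 2, 3], "c2": ["a", "b", "c"], "c3": [1.5, 2.5, 3.5]}
blob = write_toy_columnar(table)
c2 = read_column(blob, "c2")
```

Here `read_column(blob, "c2")` never decodes the bytes of `c1` or `c3`; the footer alone tells it where `c2` lives, wherever that chunk happens to sit in the file.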

  • @cheluveshab9525 1 year ago +2

    Please do make a video on compression techniques.

  • @nindersingh 1 year ago +1

    In Block 1, R3C3 is wrong 🚫; it should be R2C3, because R3C3 already appears in Block 2 as expected.

  • @sumanthb3280 1 year ago +1

    So, why is Avro used in some projects?

    • @sumitnekar8965 1 year ago +1

      One scenario I can think of: compared with plain JSON, Avro offers benefits like schema evolution, which helps in a setup with multiple producers and consumers. If you are using JSON for Kafka topics in a data pipeline, Avro can be used instead.
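
The schema evolution mentioned here can be sketched in a few lines. In Avro's schema resolution, a field that the reader's schema declares with a default is filled from that default when the writer's data lacks it, and writer fields unknown to the reader are ignored. The sketch below mimics those two rules with plain dictionaries; the `resolve` helper, the field names, and the sample record are hypothetical, and this is not the Avro library's API.

```python
def resolve(record, reader_fields):
    """Project a decoded record onto the reader's fields, applying defaults."""
    out = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            out[name] = record[name]
        elif "default" in field:
            # Field added after the record was written: use the reader's default.
            out[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    # Writer fields the reader does not declare are simply dropped.
    return out

# Hypothetical reader schema (version 2) that added "email" with a default.
reader_v2 = [
    {"name": "id", "type": "long"},
    {"name": "email", "type": "string", "default": ""},
]

# A record produced under the older schema, which still had "legacy_flag".
old_record = {"id": 7, "legacy_flag": True}
resolved = resolve(old_record, reader_v2)
```

Under these assumptions an older producer and a newer consumer can share a topic without coordinating a simultaneous upgrade, which is the multi-producer, multi-consumer benefit the comment describes.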

    • @josephjoestar995 6 months ago

      @sumitnekar8965 Could you explain further, please? I'm doing some investigation work on choosing Avro vs. Parquet vs. Delta tables for Azure Event Hubs output, and your explanation would be appreciated 🙏

  • @user-gh4lv2ub2j 8 months ago

    As a mathematician I must inform you that having a row space vs a column space is an isomorphism. There is no difference; it's in your head.