Reading and Writing Parquet Files using Spark | Compression | Advantages | Predicate Pushdown

  • Published: 2 Feb 2025

Comments • 2

  • @vetiarvind • 5 days ago

    Nice tutorial again. Can you show the contents of the parquet file?

    • @atomicengineering • 4 days ago

      Thank you for watching the video and providing the feedback.
      Parquet files are column-oriented and cannot be opened in a text editor, because they are stored in a binary format (data is laid out by column rather than by row and is compressed by default), not a human-readable text format.
      For example, if you tried to open a Parquet file in a text editor, you might see something like this:
      PAR1����...x�}�...♦☺◄
      To actually read the data, you need specialized tools like:
      Python libraries (pandas, pyarrow)
      Command-line tools (parquet-tools)
      Systems that support Parquet (like Apache Spark)
      That's exactly what we did in this video :)
      That said, I did show the data in the source parquet files at 5:33 in the video.
      I hope this helps. Cheers 🤝
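
      For reference, here is a minimal sketch of how the contents of a Parquet file could be inspected with the tools listed above. The file name "employees.parquet" is only a placeholder, not a file from the video:

      import pandas as pd
      import pyarrow.parquet as pq
      from pyspark.sql import SparkSession

      # pandas: load the whole file into a DataFrame and print the first rows
      df = pd.read_parquet("employees.parquet")
      print(df.head())

      # pyarrow: inspect the schema and file metadata without reading all the data
      parquet_file = pq.ParquetFile("employees.parquet")
      print(parquet_file.schema_arrow)
      print(parquet_file.metadata)

      # PySpark: read the file as a Spark DataFrame and display it
      spark = SparkSession.builder.appName("ReadParquet").getOrCreate()
      spark.read.parquet("employees.parquet").show()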