An introduction to Apache Parquet

  • Published: 13 Oct 2022
  • In this video, we learn all about Apache Parquet, a column-based file format that's popular in the Hadoop/Spark ecosystem. We use pyarrow and parquet-cli to make sense of some Parquet files from the NYC Taxis dataset.
    Resources:
    Apache Parquet - parquet.apache.org/
    The Parquet format specification - github.com/apache/parquet-format
    Apache Arrow - arrow.apache.org/
    Pandas commands for exporting DataFrames - pandas.pydata.org/docs/refere...
    Parquet CLI - pypi.org/project/parquet-cli/
    NYC Taxis Dataset - www1.nyc.gov/site/tlc/about/t...
    GitHub Gist with code - gist.github.com/mneedham/1118...

Comments • 34

  • @HashemAlDhaheri 20 days ago

    Thanks Mark. You're not only explaining the usefulness of using Parquet in about 5m, but giving us extra tips on using some very useful tools as well.

  • @evefreeman2457 1 year ago +43

    Mark, I have to say, you are one of the few YouTube people I don't feel like I need to set 1.5 or 2x speed on to watch. :) Thanks for the videos!

    • @learndatawithmark 1 year ago +4

      I think that's the best compliment I could ever get! Thanks :)

  • @abhishekprakash4793 5 days ago

    Thanks, Mark, for this awesome video. It is very informative.

  • @ingenieroriquelmecagardomo4067 10 days ago

    Good video, Mark.

  • @EmergingStar142 4 months ago +2

    Succinct and to the point. Just what you need to go from "dumb" to "dangerous." Thanks!

  • @justinxia9293 8 months ago +1

    Great video. In 5 minutes it achieved something that numerous cloud providers couldn't achieve with their lengthy documentation.

  • @madelineDaMiddle1 1 year ago +2

    Thank you for edification that doesn't waste time. Well done, Sir!

  • @_truthful_q_ 1 year ago +8

    Wow, succinct, to the point, and above all useful!

  • @snoreking7729 1 year ago

    This was so easy to understand, thanks Mark.

  • @kutra100 7 months ago +1

    Perfectly explained. It didn't need too much verbosity, and you did a great job.

  • @SunggukLim 3 months ago

    Definitely useful. Love it.

  • @jayong2370 1 year ago

    Wow! Big space saving.

  • @SodaPy_dot_com 1 month ago

    awesome 😂

  • @TheSlurton 1 year ago +1

    This was like drinking from a fire hose. I liked it, but I would like to see a much more detailed video where you download the file, describe it, and then go over the tools needed to make use of the file.

    • @learndatawithmark 1 year ago +1

      Hi Scott, thanks for your comments. Have you checked out some of the other videos in the Parquet playlist? I've gone into a bit more detail in some of those.

  • @user-nt9sw8fw7d 10 months ago

    Hi Mark, great video. Could you please also cover the topic of Parquet Modular Encryption?

  • @sidilekhalifa7320 1 year ago

    Hello, I have many .parquet files of the same type and I would like to display these files as a 'select * from ...many_file.parquet', how can I do this with parq please?

  • @pukyalligator 5 months ago +1

    Fast like Ferrari

  • @alx8439 11 months ago

    Every binary format will be more space-efficient than CSV, YAML, JSON or, God forgive me, XML, simply because the latter are very bloated when it comes to size.

  • @coopernik 1 year ago +1

    Good video, but you need to slow down your speech, mate.

    • @justinxia9293 8 months ago

      I found the speed fine, compact enough to not make me slack off. Maybe try pausing when he's showing the code section to read through the logged parts on the terminal?

  • @TheHitessh 9 months ago

    Why are you so fast? It's overwhelming for people who don't know a specific topic.

    • @learndatawithmark 9 months ago

      Sorry! It's not intentionally fast - that's just the pace that I speak at! Is there any bit that didn't make sense that I can try to explain more?

    • @justinxia9293 8 months ago

      You always have the option to pause and playback at 0.75x. Remember you're the person learning so it's more important that you find a way to customize to your learning habits.

  • @williamknox8438 27 days ago

    🫤 this would have been more helpful if you'd had significantly less caffeine.

    • @learndatawithmark 27 days ago +1

      I don't actually drink caffeine, but I know what you mean! I've been told I speak too quickly since forever, but I find it so difficult to slow down!

    • @DrLengen 21 days ago

      Slow the video speed down