An introduction to Apache Parquet
- Published: 13 Oct 2022
- In this video, we learn all about Apache Parquet, a column-based file format that's popular in the Hadoop/Spark ecosystem. We use pyarrow and parquet-cli to make sense of some Parquet files from the NYC Taxis dataset.
Resources:
Apache Parquet - parquet.apache.org/
The Parquet format specification - github.com/apache/parquet-format
Apache Arrow - arrow.apache.org/
Pandas commands for exporting DataFrames - pandas.pydata.org/docs/refere...
Parquet CLI - pypi.org/project/parquet-cli/
NYC Taxis Dataset - www1.nyc.gov/site/tlc/about/t...
GitHub Gist with code - gist.github.com/mneedham/1118...
Thanks Mark. You're not only explaining the usefulness of using Parquet in about 5m, but giving us extra tips on using some very useful tools as well.
Mark, I have to say, you are one of the few YouTube people I don't feel like I need to set 1.5 or 2x speed on to watch. :) Thanks for the videos!
I think that's the best compliment I could ever get! Thanks :)
Thanks, Mark, for this awesome and very informative video.
Good video, Mark.
Succinct and to the point. Just what you need to go from "dumb" to "dangerous." Thanks!
Great video. In 5 minutes it achieved something that numerous cloud providers couldn't achieve with their lengthy documentation.
Thanks - I'm glad you liked it!
Thank you for edification that doesn't waste time. Well done, Sir!
Wow, succinct, to the point, and above all useful!
Thanks for your kind words 🤠
Excellent tutorial, Mark!!
This was so easy to understand, thanks Mark.
Perfectly explained. It didn't need too much verbosity, and you did a great job.
Thanks!
Definitely useful. Love it.
Wow! Big space saving.
Awesome 😂
This was like drinking from a fire hose. I liked it, but I would like to see a much more detailed video where you download the file, describe it, and then go over the tools needed to make use of the file.
Hi Scott, thanks for your comments. Have you checked out some of the other videos in the Parquet playlist? I've gone into a bit more detail in some of those.
Hi Mark, great video. Could you please also cover Parquet Modular Encryption?
Hello, I have many .parquet files of the same type and I would like to query them all at once, as in 'select * from ...many_file.parquet'. How can I do this with parq, please?
I would use DuckDB to do this!
Fast like Ferrari
Every binary format will be more size-efficient than CSV, YAML, JSON or, God forgive me, XML, simply because those text formats are very bloated when it comes to size.
That's a fair point!
Good video, but you need to slow down your speech, mate.
I found the speed fine, compact enough to not make me slack off. Maybe try pausing when he's showing the code section to read through the logged parts on the terminal?
Why are you so fast? It's overwhelming for people who don't know a specific topic.
Sorry! It's not intentionally fast - that's just the pace that I speak at! Is there any bit that didn't make sense that I can try to explain more?
You always have the option to pause and playback at 0.75x. Remember you're the person learning so it's more important that you find a way to customize to your learning habits.
🫤 this would have been more helpful if you'd had significantly less caffeine.
I don't actually drink caffeine, but I know what you mean! I've been told I speak too quickly since forever, but I find it so difficult to slow down!
Slow the video speed down