What is an Apache Parquet file?

  • Published: 28 Dec 2024

Comments • 91

  • @Adam-go5wv
    @Adam-go5wv 3 months ago +1

    I finally understand what the Parquet file format is, thanks to your video. Great job!

  • @patrickbateman7665
    @patrickbateman7665 3 years ago +11

    No one has explained it on YouTube better than you, Riz.
    Thank you for making such a great video.

    • @RizAngD
      @RizAngD  3 years ago +3

      Wow, thanks

  • @ashishdukare1313
    @ashishdukare1313 2 years ago +2

    Thanks from India. Love the way you explain. Very simple and concise information

  • @cidrisonly
    @cidrisonly 2 years ago +9

    Thanks for the amazing explanation of the Parquet file format. Coming from a wood business, parquet as flooring is not new to me; I have done many parquet installation projects. Interesting to see it coming back in Big Data and Data Engineering.

  • @farzadshams3260
    @farzadshams3260 6 months ago +1

    Thank you Riz. Very helpful video to get a high level understanding of the Parquet files!

    • @RizAngD
      @RizAngD  5 months ago +1

      Glad to hear that!

  • @MarkF-ix5mo
    @MarkF-ix5mo 8 months ago +1

    Great video. Loved the fact that you used Physical Graffiti - one of my fave albums of all time.

    • @RizAngD
      @RizAngD  5 months ago +1

      thanks!!

  • @dhavaldalasaniya
    @dhavaldalasaniya 2 years ago +1

    Really well explained & really nice.. keep going Riz!!!

    • @RizAngD
      @RizAngD  2 years ago +1

      Thanks, will do!

  • @roadtrippingwithmihir
    @roadtrippingwithmihir 6 months ago +1

    Excellent and crisp explanation

    • @RizAngD
      @RizAngD  5 months ago +1

      Glad you liked it

  • @owo4202
    @owo4202 4 months ago

    Thanks for the clear explanation! It helps a lot!

  • @IamDanish99
    @IamDanish99 2 years ago +1

    Thank you Riz for the wonderful explanation!

    • @RizAngD
      @RizAngD  2 years ago +1

      My pleasure!

  • @cusematt23
    @cusematt23 4 months ago

    thanks for the explanation. very nicely done.

  • @nicknick-71
    @nicknick-71 2 years ago +1

    Thanks mate. A very good and quick explanation. Really good work.

    • @RizAngD
      @RizAngD  2 years ago +2

      Glad you liked it!

  • @royteicher
    @royteicher 3 years ago +1

    Love this video! Less than 10 minutes and in depth on the topic. Thank you!

    • @RizAngD
      @RizAngD  3 years ago +1

      Glad it was helpful!

  • @LuisRomaUSA
    @LuisRomaUSA 3 years ago +3

    I can def see your channel exploding in a few months. Good quality content on difficult topics, often covered in other videos that last 1 hr with poor sound quality and no logical flow. You are going places my dude.

    • @RizAngD
      @RizAngD  3 years ago +3

      Those are very kind words, Luis!
      I'm still learning to be a better YouTuber myself :)

  • @lcsxwtian
    @lcsxwtian 3 years ago +2

    You make some excellent content my man!

    • @RizAngD
      @RizAngD  3 years ago +2

      Glad you think so!

  • @AmitDileepKulkarni
    @AmitDileepKulkarni 2 years ago +2

    Lovely explanation Riz, and thank you for the video! I would recommend your channel to all my colleagues who do database-related jobs!

    • @RizAngD
      @RizAngD  2 years ago +1

      Thanks for sharing!

  • @higiniofuentes2551
    @higiniofuentes2551 6 months ago +1

    Thank you for this very useful video!

    • @RizAngD
      @RizAngD  5 months ago +1

      Glad it was helpful!

  • @harryocallaghan6393
    @harryocallaghan6393 6 months ago +1

    Really great explanation! thank you so much

    • @RizAngD
      @RizAngD  5 months ago +1

      Glad you enjoyed it!

  • @multitaskprueba1
    @multitaskprueba1 7 months ago +1

    You are a genius! Fantastic video! Thanks!

    • @RizAngD
      @RizAngD  5 months ago +1

      Glad it helped!

  • @paul1113-zw5pn
    @paul1113-zw5pn 11 months ago +1

    Encoding and compression were very well explained... So I have a question: Delta versus Dictionary encoding, how would one decide which to use, given that Dictionary seems so much more efficient? But then I suppose it depends on repetition.
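
A hedged illustration of that trade-off, assuming pyarrow is installed (file names are made up, and the column_encoding option needs a reasonably recent pyarrow release): dictionary encoding wins when values repeat heavily, while delta encoding suits sorted or slowly changing integers.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# A column with heavy repetition: dictionary encoding should do well here.
table = pa.table({"user_id": [1001, 1001, 1002, 1001, 1003] * 200_000})
pq.write_table(table, "dict_encoded.parquet", use_dictionary=True)

# Delta encoding instead: dictionary encoding must be disabled for the column,
# and DELTA_BINARY_PACKED tends to win on sorted / monotonically increasing ints.
pq.write_table(
    table,
    "delta_encoded.parquet",
    use_dictionary=False,
    column_encoding={"user_id": "DELTA_BINARY_PACKED"},
)
```

Comparing the resulting file sizes on your own data is usually the quickest way to decide.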

  • @munibabu5566
    @munibabu5566 2 years ago +1

    Thank you.. Very well explained.. Crystal clear :)

    • @RizAngD
      @RizAngD  2 years ago

      Glad it was helpful!

  • @Village_Crystal_Stone
    @Village_Crystal_Stone 2 years ago +1

    How do I retrieve the latest file into the destination folder? Can you please explain?

  • @kuljotbakshi967
    @kuljotbakshi967 4 months ago

    Great explanation!!!!

  • @Van_Verder
    @Van_Verder 2 years ago +1

    Very helpful, thanks!🙏🏽

  • @ramsvault
    @ramsvault 2 months ago

    thank you. wonderful explanation

  • @ecmiguel
    @ecmiguel 7 months ago +1

    Great!!! Greetings from Peru

    • @RizAngD
      @RizAngD  5 months ago +1

      thanks!

  • @leoxiaoyanqu
    @leoxiaoyanqu 2 years ago +1

    Thanks for the video Riz! I was curious what the practical use case for LZO is (00:05:14), because when comparing it with Snappy, assuming we're dealing with hot data, the only advantage of LZO would be faster decompression. Is there anything I'm missing? Thanks in advance

    • @RizAngD
      @RizAngD  2 years ago +2

      that's also my understanding :)
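
For anyone wanting to test such trade-offs locally: as far as I know pyarrow does not ship an LZO codec, but a quick comparison of the codecs it does support gives a feel for the size/speed balance. A sketch with made-up file names, pyarrow assumed:

```python
import os
import pyarrow as pa
import pyarrow.parquet as pq

# Repetitive sample data so the codecs have something to compress.
table = pa.table({"city": ["London", "Paris", "London"] * 300_000})

for codec in ["snappy", "gzip", "zstd"]:
    path = f"data_{codec}.parquet"
    pq.write_table(table, path, compression=codec)
    print(codec, os.path.getsize(path), "bytes")
```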

  • @kalkanserdar
    @kalkanserdar 1 year ago +3

    Nice summary. Although it would help to explain why querying parquet files is more efficient compared to CSV, especially for select * queries (where a row-store format is usually much more efficient). Is it because of the type definitions and metadata features of parquet? Thanks

  • @masblogger
    @masblogger 3 years ago +2

    I really like this video, very useful.. Can't wait for the next video.. ;)

    • @RizAngD
      @RizAngD  3 years ago +2

      Thank you! 😃

  • @SriRam-yq4id
    @SriRam-yq4id 3 years ago +2

    Thanks for the Parquet video Riz.
    What is the difference between Parquet and Avro?

    • @RizAngD
      @RizAngD  3 years ago +2

      There are a few main differences, Sri Ram. Notably, Parquet is column-based while Avro is row-based (like Excel), so Parquet is better if you're querying the data column by column (e.g. analytics), whereas Avro would be better (compared to Parquet) if you want to scan/query whole rows of data.
      Plus, Avro's schema is written in JSON (more human readable), while Parquet comes with its own format and is not as readable.
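
A small sketch of the column-oriented point above, assuming pandas and pyarrow are installed (file name is hypothetical): Parquet can serve a single column without touching the rest of the file.

```python
import pandas as pd
import pyarrow.parquet as pq

df = pd.DataFrame({
    "user_id": range(1_000_000),
    "country": ["US"] * 1_000_000,
    "amount": [9.99] * 1_000_000,
})
df.to_parquet("sales.parquet")

# Only the 'amount' column chunks are read; 'user_id' and 'country' are skipped.
amounts = pq.read_table("sales.parquet", columns=["amount"])
print(amounts.num_rows, amounts.column_names)
```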

  • @srinivasa1187
    @srinivasa1187 2 years ago +2

    Hi Riz,
    Thanks for this info, one of the best explanations I have seen.
    One doubt:
    1) When you give a table to Parquet, does it
    - first partition by rows --> then convert each partition to columnar and store it inside the parquet file,
    OR
    - directly store the data in columnar form in the parquet file?
    And could you please explain ORC and the differences between Parquet, ORC, and Avro, and when to use which.

    • @RizAngD
      @RizAngD  2 years ago +1

      That's a good Q and I don't know the answer, please do let me know when you do!
      Currently I'm really full with "Life" at the moment, but yeah I already plan to create videos about ORC and Avro. Stay tuned!
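
For what it's worth, the Parquet format itself is a hybrid of both options in the question: rows are first split into row groups, and within each row group every column is stored as a contiguous column chunk. A minimal sketch (pyarrow assumed; the file name and row_group_size are arbitrary):

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"id": range(1_000_000), "value": [1.0] * 1_000_000})
pq.write_table(table, "layout_demo.parquet", row_group_size=100_000)

meta = pq.ParquetFile("layout_demo.parquet").metadata
print(meta.num_row_groups)                         # 10 row groups of 100k rows
print(meta.row_group(0).column(0).path_in_schema)  # first column chunk: 'id'
```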

  • @sreelakshmia6762
    @sreelakshmia6762 2 years ago +1

    The subtitles are covering the content. Please enable the option to switch off captions.

    • @RizAngD
      @RizAngD  2 years ago +1

      thanks for the feedback!

  • @dylanalbertazzi
    @dylanalbertazzi 3 years ago

    Wonderful overview, thank you!

    • @RizAngD
      @RizAngD  3 years ago

      Glad it was helpful!

  • @elarboldeundj4383
    @elarboldeundj4383 2 years ago +2

    great video, it cleared everything up for me

  • @devarapallivamsi7064
    @devarapallivamsi7064 9 months ago +1

    Good and to the point.

    • @RizAngD
      @RizAngD  5 months ago +1

      thanks!

  • @ravitalaviya1576
    @ravitalaviya1576 1 year ago +1

    I am currently capturing live data in CSV format, but for the storage benefit I want the live data to be saved directly in Parquet format. Is that possible or not?
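
One way to do this, sketched under the assumption that pyarrow is available and that get_next_batch() is a hypothetical stand-in for the live feed: keep a single ParquetWriter open and append a row group per batch.

```python
import datetime as dt
import pyarrow as pa
import pyarrow.parquet as pq

def get_next_batch():
    """Hypothetical stand-in for a live feed: yields small batches of records."""
    for i in range(3):
        yield [{"ts": dt.datetime.now(), "value": float(i)}]

schema = pa.schema([("ts", pa.timestamp("us")), ("value", pa.float64())])

# One writer keeps a single Parquet file open; each write adds a row group.
with pq.ParquetWriter("live_data.parquet", schema) as writer:
    for batch in get_next_batch():
        writer.write_table(pa.Table.from_pylist(batch, schema=schema))
```

Note that a Parquet file cannot be appended to after it has been closed, so long-running feeds typically roll to a new file per interval and compact the small files later.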

  • @ConaillSoraghan
    @ConaillSoraghan 3 years ago +6

    Very useful overview Riz. As a total noob to this format, I have a simple question: how do you convert data into the parquet format? Is that possible?

    • @RizAngD
      @RizAngD  3 years ago +5

      Thanks Conaill! You can convert data into parquet with many tools on the market these days; some notable examples in the Azure world are Spark (via Databricks or Synapse) and Data Factory (as part of the integration).
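
For a quick local conversion outside those Azure services, a common option (pandas with the pyarrow engine assumed; file names are hypothetical) looks like this:

```python
import pandas as pd

df = pd.read_csv("input.csv")
df.to_parquet("output.parquet", compression="snappy")

# Roughly the equivalent in Spark (Databricks / Synapse):
# spark.read.csv("input.csv", header=True).write.parquet("output_dir/")
```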

  • @pourmog
    @pourmog 2 years ago +1

    nice overview. thank you.

    • @RizAngD
      @RizAngD  2 years ago +1

      Thanks for watching!

  • @karthikeyanbalasubramaniam598
    @karthikeyanbalasubramaniam598 3 years ago +2

    Riz, the presentation looks good. I use parquet files through Cognos Analytics datasets. Does the parquet file store its data column-based by default?

    • @RizAngD
      @RizAngD  3 years ago +2

      Yes it does Karthikeyan

  • @kennylaikl299
    @kennylaikl299 2 years ago +1

    Hi Riz, can you do a video on the use case for AVRO compared to Parquet?

    • @RizAngD
      @RizAngD  2 years ago +1

      Already in my backlog, I've just been too busy procrastinating!! :P

  • @nachetdelcopet
    @nachetdelcopet 2 years ago

    Nice video🎉

  • @subarnashrestha7009
    @subarnashrestha7009 2 years ago

    great video. How do I combine multiple snappy.parquet files into a single file and load it into Snowflake?
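
One possible approach, assuming pyarrow is installed and the files share a schema (paths are made up): read them all, concatenate, and rewrite a single Snappy-compressed file. Merging may not even be required, since Snowflake's COPY INTO can load every Parquet file in a stage folder in one go.

```python
import glob
import pyarrow as pa
import pyarrow.parquet as pq

tables = [pq.read_table(p) for p in glob.glob("input/*.snappy.parquet")]
combined = pa.concat_tables(tables)   # all files must share the same schema
pq.write_table(combined, "combined.snappy.parquet", compression="snappy")
```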

  • @neelbanerjee7875
    @neelbanerjee7875 2 years ago

    Sir.. thanks for this detailed content.. I have a query below that I couldn't get clarified anywhere...
    People say to use ORC for Hive and Parquet for Spark.. I don't understand the deep logic behind this.. if ORC is more efficient, why can't we use ORC instead of Parquet?

  • @reddyroopesh7
    @reddyroopesh7 2 years ago +1

    Hi Riz, I am doing development going from Parquet to Delta Lake. On the Parquet side we have change data capture, which only reads the data if it has changed since the previous load. How good is it? Do you recommend using it for our SCDs? Do you see value?

  • @praveenravi6014
    @praveenravi6014 2 years ago +1

    Hi brother, I have an issue sending a parquet file to Snowflake. The .parquet file is loaded into the Snowflake table, but the date column shows one day less, i.e. if the date is 12-01-2022 then in Snowflake it shows as 11-01-2022. I'm looking for help. I appreciate your time reading this. Thanks in advance!

  • @michaelshoemaker5635
    @michaelshoemaker5635 2 years ago +1

    What tools are used to query files (csv, parquet) directly? I've never heard of doing this.

    • @RizAngD
      @RizAngD  2 years ago +2

      Assuming you're using the Azure cloud, you can use PolyBase to query CSV and parquet files directly (i.e. by creating external tables) from Azure Blob Storage (or Data Lake) within Azure SQL Database or Azure Synapse SQL :)

    • @michaelshoemaker5635
      @michaelshoemaker5635 2 years ago

      @@RizAngD Ah, thank you! Never used Azure before. Understood.

    • @ularkadutdotnet
      @ularkadutdotnet 1 year ago +1

      DuckDB

    • @michaelshoemaker5635
      @michaelshoemaker5635 1 year ago

      @@ularkadutdotnet Thank You!
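
Following up on the DuckDB suggestion, a minimal example (duckdb Python package assumed; file names hypothetical) of querying CSV and Parquet files in place with SQL:

```python
import duckdb

# Query the files directly; no load step needed.
print(duckdb.sql("SELECT count(*) FROM 'sales.parquet'").fetchall())
print(duckdb.sql("SELECT * FROM read_csv_auto('sales.csv') LIMIT 5").df())
```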

  • @nonstopPKFR
    @nonstopPKFR 2 years ago +1

    Hi! I would like to start a personal project creating a data warehouse in Azure Synapse Analytics. Do you have any suggestions for how I can do so without having to pay hundreds of dollars a month minimum for provisioning a dedicated SQL pool in Azure (as per the pricing I've seen)? Thanks so much! I hope I simply misunderstood Azure's DW pricing.

  • @jeevan999able
    @jeevan999able 3 years ago +1

    hello, so I need a data federation tool which has a Python client. I need to be able to connect to and query data from a wide variety of data storage platforms. As of now I store data in ADLS and SQL Server on Azure. What would you recommend?

    • @RizAngD
      @RizAngD  3 years ago

      Can you explain what you mean by a data federation tool?

    • @jeevan999able
      @jeevan999able 3 years ago

      @@RizAngD so a platform which can connect to many data storage places (S3, ADLS, MySQL, MSSQL etc.) so that regardless of where the data is stored, I have a central platform through which I can access all of it

  • @finedinerest
    @finedinerest 3 years ago +2

    Can you please elaborate more on what repetition levels and definition levels are, with a simpler example? It would really help. Thanks in advance! 😊

    • @RizAngD
      @RizAngD  3 years ago +2

      I suggest referring to this blog, a very comprehensive explanation :)
      www.waitingforcode.com/apache-parquet/nested-data-representation-parquet/read
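
A tiny nested-data example to go with that blog (pyarrow assumed; file name hypothetical): the values of a list column are flattened into one stream, and Parquet stores repetition levels (where each new list starts) and definition levels (how deep each path is actually defined) alongside them, so nulls and empty lists can be reconstructed on read.

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"scores": [[1, 2], [], None, [3]]})
pq.write_table(table, "nested.parquet")

# Reading it back rebuilds the nesting from the stored levels.
print(pq.read_table("nested.parquet").column("scores").to_pylist())
# [[1, 2], [], None, [3]]
```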

  • @reddyroopesh7
    @reddyroopesh7 2 years ago +1

    Thanks boss

  • @arpanmistry3900
    @arpanmistry3900 2 years ago

    hey Riz, I need your help. Can you please provide one sample parquet file with LZO compression? I am stuck; I tried a lot of things in pyspark and pyarrow but was unable to create a parquet file with LZO compression. Please provide one sample file if you can help me.

  • @CaribouDataScience
    @CaribouDataScience 1 year ago +1

    It's not butter, it's Parquet..

  • @kamkhan7509
    @kamkhan7509 3 years ago +2

    Not a very helpful video without a practical demo.

    • @RizAngD
      @RizAngD  3 года назад +1

      Sorry to hear that. Tx