
Speed Up Data Processing with Apache Parquet in Python

  • Published: 4 Sep 2024

Comments • 18

  • @islam9212
    @islam9212 10 months ago +8

    It hurt my eyes to see a calculator being used when a Python console exists. For a future video, it would be interesting to include a comparison with the pickle, feather, and jay formats.
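
    A minimal sketch of such a comparison, assuming pandas with pyarrow installed; the jay format (from the datatable package) is omitted here:

        import time
        import pandas as pd
        import numpy as np

        # Hypothetical test data; any large DataFrame works.
        df = pd.DataFrame(np.random.rand(1_000_000, 10),
                          columns=[f"col{i}" for i in range(10)])

        # Write each format once (feather requires pyarrow).
        df.to_pickle("data.pkl")
        df.to_feather("data.feather")
        df.to_parquet("data.parquet")

        # Time a full read of each format.
        for name, reader, path in [
            ("pickle",  pd.read_pickle,  "data.pkl"),
            ("feather", pd.read_feather, "data.feather"),
            ("parquet", pd.read_parquet, "data.parquet"),
        ]:
            start = time.perf_counter()
            reader(path)
            print(f"{name}: {time.perf_counter() - start:.3f}s")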

  • @chndrl5649
    @chndrl5649 10 months ago +2

    The reason for the difference in memory taken by the two DataFrames is the datatypes. CSV converts most predefined datatypes into strings, which are much larger than numeric datatypes.
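
    A quick way to check this, assuming data.csv and data.parquet contain the same table:

        import pandas as pd

        df_csv = pd.read_csv("data.csv")
        df_parquet = pd.read_parquet("data.parquet")

        # CSV columns may come back as object (string) dtype;
        # parquet carries its schema inside the file.
        print(df_csv.dtypes)
        print(df_parquet.dtypes)

        # deep=True counts the actual bytes of string objects.
        print(df_csv.memory_usage(deep=True).sum())
        print(df_parquet.memory_usage(deep=True).sum())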

  • @tb9359
    @tb9359 10 months ago +2

    Had never heard of Parquet. Thank you. It looks very useful.

  • @jeremiahhauser7148
    @jeremiahhauser7148 9 months ago +2

    Interesting, but I am not convinced. If I understood correctly, when selecting columns the time went down by a factor of 3 for both methods (4 s -> 1.3 s and 0.24 s -> 0.08 s). So parquet is better anyway, but whether it is specifically better for column-wise access still needs to be demonstrated.
    Like the other commenter, I would also be interested in a broader comparison with other formats.
    Great channel, keep up the good work.
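
    One way to isolate the column-wise effect, with hypothetical column names: parquet can skip unread columns on disk entirely, while read_csv's usecols still has to parse every line of text.

        import time
        import pandas as pd

        cols = ["col0", "col1"]  # hypothetical subset of columns

        start = time.perf_counter()
        pd.read_csv("data.csv", usecols=cols)          # scans all rows anyway
        print(f"csv:     {time.perf_counter() - start:.3f}s")

        start = time.perf_counter()
        pd.read_parquet("data.parquet", columns=cols)  # reads only those column chunks
        print(f"parquet: {time.perf_counter() - start:.3f}s")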

  • @multitaskprueba1
    @multitaskprueba1 4 months ago

    You are a genius! Fantastic video! Thanks!

  • @dana-pw3us
    @dana-pw3us 6 months ago +1

    Why not compare the sizes of the files on disk? Are they different?
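
    Easy to check, assuming the same two files as above; parquet is compressed by default (snappy), so the sizes usually differ noticeably:

        import os

        for path in ["data.csv", "data.parquet"]:
            size_mb = os.path.getsize(path) / 1024 ** 2
            print(f"{path}: {size_mb:.1f} MB")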

  • @JeremyLangdon1
    @JeremyLangdon1 9 months ago

    I think pandas tries to infer data types from CSV and often defaults to string. That takes much more space and CPU. Parquet has data types built into the file, so pandas does not need to infer anything. What would be more interesting is to specify the data types when reading the CSV, to make it a more “even” comparison.
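
    A sketch of that “even” comparison; the column names and dtypes below are placeholders to be matched to the actual data:

        import pandas as pd

        # Placeholder schema; adjust to the real columns.
        dtypes = {"id": "int32", "price": "float32", "category": "category"}

        # With explicit dtypes pandas skips inference and avoids object columns.
        df = pd.read_csv("data.csv", dtype=dtypes)
        print(df.dtypes)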

  • @Gabriel-cf3bw
    @Gabriel-cf3bw 9 months ago

    Nice tutorial! Very introductory!

  • @slothner943
    @slothner943 10 months ago

    I usually go for the feather format. I never understood the difference, just that for me and the data I'm handling (few columns), feather seems to be quicker.
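
    For reference, a minimal feather round trip via pandas (requires pyarrow); feather is the on-disk form of Arrow's in-memory layout, which is why reads need little decoding:

        import pandas as pd

        df = pd.DataFrame({"a": range(5), "b": list("abcde")})
        df.to_feather("data.feather")
        back = pd.read_feather("data.feather")
        print(back.equals(df))  # True: values and dtypes survive the round trip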

  • @JLSXMK8
    @JLSXMK8 10 months ago +1

    I have a related question: Since parquet files are "column-oriented", do you think they would be a good way to store database backups?
    Example scenario: let's say you want to store a backup of a database whose data is in a stable state. It contains a large number of product records: IDs, descriptions, purchase counts, prices, etc. Would it be a good idea to store a backup of this database in a parquet file, since the backup would be faster to load if the data later became unstable via a transaction? You could roll back the transactions too; however, what if too many of them fail and all of them need to be rolled back?

    • @KingOfAllJackals
      @KingOfAllJackals 10 months ago

      Parquet isn't a generic file format. It IS a table, so you don't “store backups” in a Parquet file. I guess you could back up each table independently (a sketch of that idea follows below), but nearly every real DB has much more efficient and powerful native backup infrastructure.
      Parquet, however, is where a lot of transactional data ends up for analytics. Columnar storage is better suited to large analytic workloads; row stores are better suited to OLTP workloads. You would never want to use Parquet for things like “deduct $7.83 from customer 1234's checking account”.

    • @JLSXMK8
      @JLSXMK8 10 months ago

      @@KingOfAllJackals That is exactly what I thought of possibly using it for; I could use it to back up tables in the database. You did interpret that correctly. I would NOT edit the contents of the parquet backups.
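
      A minimal sketch of the per-table export discussed above, using pandas with SQLAlchemy; the connection string and table name are placeholders, and, as the reply notes, this is an analytics snapshot rather than a replacement for the database's native backup tooling:

          import pandas as pd
          from sqlalchemy import create_engine

          # Placeholder connection string and table name.
          engine = create_engine("postgresql://user:pass@localhost/shop")

          # Snapshot one table into a parquet file.
          df = pd.read_sql_table("products", engine)
          df.to_parquet("products_backup.parquet", index=False)

          # Restoring means rewriting the table from the frame, not
          # replaying transactions; point-in-time rollback still needs
          # the database's own backup/WAL mechanisms.
          restored = pd.read_parquet("products_backup.parquet")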

  • @N0rberK
    @N0rberK 10 months ago

    Tnx Capt.

  • @julianreichelt1719
    @julianreichelt1719 9 months ago

    nice

  • @farshidzamanirad9691
    @farshidzamanirad9691 10 months ago

    Awesome!

  • @codewithmajid4841
    @codewithmajid4841 10 months ago

    ok Boss

  • @codewithmajid4841
    @codewithmajid4841 10 months ago +1

    I am a junior data scientist from Pakistan.