Exploring Parquet Metadata with ClickHouse

  • Published: 23 Sep 2024
  • In this video, we'll learn how to query the metadata of Parquet files using ClickHouse. We'll work with DiffusionDB, a dataset hosted on Hugging Face that contains metadata for 14 million images generated by Stable Diffusion. We'll inspect metadata such as row groups, per-column details, and compressed and uncompressed sizes. We'll also use ARRAY JOIN and untuple to unfold the nested arrays and tuples in that metadata so it is easier to read.
    Hashtags: #DataAnalysis #Clickhouse #Parquet #HuggingFace #StableDiffusion #BigData
    Resources
    DiffusionDB dataset - huggingface.co...
    Deep Dive into ClickHouse Part 1 - clickhouse.com...
    Deep Dive into ClickHouse Part 2 - clickhouse.com...
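
As a rough illustration of the kind of queries the description mentions, here is a minimal Python sketch that only builds the SQL text. It assumes ClickHouse's ParquetMetadata input format with its documented fields (num_rows, num_row_groups, total_compressed_size, total_uncompressed_size, columns). The file name metadata.parquet and the helper names are hypothetical, and these are not necessarily the exact queries used in the video.

```python
def quote_literal(value: str) -> str:
    """Quote a Python string as a ClickHouse string literal (backslash escapes)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def file_summary_query(path: str) -> str:
    """One row per file: row count, row-group count, and total sizes."""
    return (
        "SELECT num_rows, num_row_groups, "
        "total_compressed_size, total_uncompressed_size\n"
        f"FROM file({quote_literal(path)}, ParquetMetadata)"
    )


def per_column_query(path: str) -> str:
    """One row per Parquet column.

    ARRAY JOIN unfolds the `columns` array into rows, and untuple()
    expands each tuple into separate result columns.
    """
    return (
        "SELECT untuple(col)\n"
        f"FROM file({quote_literal(path)}, ParquetMetadata)\n"
        "ARRAY JOIN columns AS col"
    )


if __name__ == "__main__":
    # Hypothetical local file name; substitute a real Parquet file.
    print(file_summary_query("metadata.parquet"))
    print()
    print(per_column_query("metadata.parquet"))
```

The printed text could then be passed to a ClickHouse client, for example with clickhouse local --query "...". The sketch has not been run against the DiffusionDB files themselves.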
