DSDSD - Dutch Seminar on Data Systems Design
  • Videos: 34
  • Views: 18,744
Efficient CSV Parsing - On the Complexity of Simple Things - Pedro Holanda
DSDSD - THE DUTCH SEMINAR ON DATA SYSTEMS DESIGN:
We hold bi-weekly talks on Fridays from 3:30 PM to 5 PM CET for and by researchers and practitioners designing (and implementing) data systems. The objective is to establish a new forum for the Dutch Data Systems community to unite, foster collaborations between its members, and bring in high-quality international speakers. We would like to invite all researchers, especially PhD students, who are working on related topics to join the events. It is an excellent opportunity to receive feedback early on from researchers in your field.
Website: dsdsd.da.cwi.nl/
X: x.com/dsdsdnl
Speaker: Pedro Holanda
Title: Efficient CSV Parsing: On the Complexit...
Views: 1,248

Videos

Lambda functions in the duck's nest - Tania Bogatsch
199 views, 5 months ago
C3: Compressing Correlated Columns - Thomas Glas
157 views, 5 months ago
Towards LLM-augmented Database Systems - Carsten Binnig
273 views, 6 months ago
ALP: Adaptive Lossless floating-Point Compression - Leonardo Kuffó (CWI)
474 views, 11 months ago
Cardinality Estimation Graphs by Semih Salihoğlu - University of Waterloo
239 views, 1 year ago
LingoDB: Open compilation and optimization framework for sustainable data processing - M. Jungmair
276 views, 1 year ago
Decoupling Compute and Storage for Stream Processing Systems by Yingjun Wu - CEO RisingWave Labs
701 views, 1 year ago
Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust by Andrew Lamb
3.5K views, 1 year ago
Shredding deeply nested JSON, one vector at a time by Laurens Kuiper - DuckDB Labs
985 views, 1 year ago
Provenance Research in Gray Systems Lab at Microsoft by Fotis Psallidas
132 views, 1 year ago
Database Schemas in the Wild: What Can We Learn from a Large Corpus of Relational Database Schemas?
177 views, 1 year ago
Efficient detection of multivariate correlations in static and streaming data by Jens d'Hondt
77 views, 1 year ago
Stardog query optimiser: Join ordering and cardinality estimations for graph queries by Pavel Klinov
150 views, 1 year ago
Towards Parameter-Efficient Automation of Data Wrangling Tasks with Prefix-Tuning by David Vos
252 views, 1 year ago
Leveraging Generative AI for Data Processing by Immanuel Trummer [DSDSD 2023]
285 views, 1 year ago
Data Science through the Looking Glass and what we found there by Bojan Karlaš
157 views, 2 years ago
Data Management for Emerging Problems in Large Networks by Arijit Khan
530 views, 2 years ago
Building machine learning systems for the era of data-centric AI by Ce Zhang
144 views, 2 years ago
mlinspect - Lightweight Inspection of Native Machine Learning Pipelines by Stefan Grafberger
123 views, 2 years ago
Algorithms for Relational Knowledge Graphs by Martin Bravenboer
523 views, 2 years ago
The LDBC Social Network Benchmark: Business Intelligence workload by Gábor Szárnyas
148 views, 2 years ago
Taking a Peek under the Hood of Snowflake's Metadata Management by Max Heimel
790 views, 2 years ago
Glidesort: Efficient In-Memory Adaptive Stable Sorting on Modern Hardware by Orson Peters
1.9K views, 2 years ago
Learned DBMS Components 2.0: From Workload-Driven to Zero-Shot Learning By Carsten Binnig
117 views, 2 years ago
Parallel Grouped Aggregation in DuckDB By Hannes Mühleisen
891 views, 2 years ago
Efficient collaborative analytics with no information leakage:An idea whose time has come | Vasiliki
113 views, 2 years ago
Opening the Black Box of Internal Stream Processor State By Jim Verheijde
50 views, 2 years ago
Push-Based Execution in DuckDB by Mark Raasveldt (CWI)
1.4K views, 2 years ago
Building Advanced SQL Analytics From Low-Level Plan Operator By Thomas Neumann (TU Munich)
1.1K views, 2 years ago

Comments

  • @user-qi2ls1ul2e (18 days ago)

    The key concept for me was "the closer a double is to 0, the more exact its representation"
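A quick way to check this observation is to look at the gap between neighbouring doubles at different magnitudes. The sketch below uses Python's standard `math.ulp`; the sample magnitudes are arbitrary illustrations, not values from the talk. Note that it is the absolute spacing that shrinks near 0; for normal doubles the relative precision stays roughly constant.

```python
import math

# Illustrative magnitudes (hypothetical, not taken from the talk).
magnitudes = [1e-3, 1.0, 1e3, 1e6, 1e15]

# math.ulp(x) is the gap between x and the next representable double.
gaps = [math.ulp(m) for m in magnitudes]

for m, gap in zip(magnitudes, gaps):
    print(f"value ~ {m:g}: spacing between neighbouring doubles = {gap:g}")
```

The printed spacing grows with the magnitude, which is the sense in which values closer to 0 are represented more exactly.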

  • @zhengyuzhang7992 (3 months ago)

    very nice and detailed lecture

  • @madacol (5 months ago)

    It was confusing to understand what happened at 12:47 when the indexes [0,2,3,3,4] appeared. Here is what's going on: the filter removes 2 entries, so only indexes 0,2,3 remain, and those are exactly the first 3 elements of the vector [ *0,2,3* , 3,4], so the last 2 elements (3,4) are ignored. This is specified by the first vector [ [0,2] , [2,1] ], which says that the first 2 elements form the first row and the third element forms the second row. (There's no mention of what to do with the rest of the indexes, so they are ignored.)
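Reading the comment literally, each pair in [ [0,2] , [2,1] ] can be taken as an (offset, length) into the index vector. The sketch below follows only that reading; `resolve_rows`, the variable names, and the child values "a" to "e" are hypothetical and not taken from the talk or from any library.

```python
def resolve_rows(entries, indexes, child):
    """Build one list per row from (offset, length) entries.

    Each entry selects a slice of `indexes`; every index in that slice
    points at an element of `child`. Indexes not covered by any entry
    are never read.
    """
    rows = []
    for offset, length in entries:
        selected = indexes[offset:offset + length]
        rows.append([child[i] for i in selected])
    return rows


entries = [(0, 2), (2, 1)]          # the first vector from the comment
indexes = [0, 2, 3, 3, 4]           # the index vector from the comment
child = ["a", "b", "c", "d", "e"]   # hypothetical underlying elements

rows = resolve_rows(entries, indexes, child)
print(rows)  # [['a', 'c'], ['d']]
```

The trailing indexes 3 and 4 are never touched because no entry covers positions 3 and 4 of the index vector.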

  • @manickbadsah (6 months ago)

    Thanks for the amazing presentation.

  • @timpz (8 months ago)

    Great presentation and interesting idea! It would be interesting to see what the compression ratio would be for 32 bit floats, bit packing the integers would be SIMD-friendly but I suspect probably reduce the ratio by a significant amount for "normal" values.

    • @LeonardoKuffo (5 months ago)

      With 32-bit floats, compression ratios would be halved, since the same numbers will still be packed in the same number of bits after FOR+BP. But the algorithm would remain the same. It would be interesting to see how ALP then keeps up with the other algorithms (Chimp, Zstd, etc.). We have already implemented ALP in DuckDB for floats as well, so you can run some quick tests there using the DuckDB Python API. What we saw is that 32-bit floats are more common in an ML context (e.g. model weights), in which case ALPrd would be used given the randomness of these numbers. Here we can still save a few more bits than, for example, Zstd if the weights need to be stored losslessly!
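The halving argument can be sketched with plain frame-of-reference (FOR) plus bit-packing (BP) arithmetic. This is not the ALP implementation: the function names and the sample integers are hypothetical, and per-block metadata (the base and the bit width) is ignored.

```python
def for_bitpack_width(ints):
    """Bits per value after subtracting the minimum (FOR) and bit-packing."""
    base = min(ints)
    deltas = [v - base for v in ints]
    # At least 1 bit per value, even if all values are equal.
    return max(1, max(d.bit_length() for d in deltas))


def compression_ratio(ints, uncompressed_bits):
    """Uncompressed bits per value divided by packed bits per value."""
    return uncompressed_bits / for_bitpack_width(ints)


# Hypothetical integers, standing in for values already encoded as integers.
encoded = [1234, 1250, 1301, 1299]

width = for_bitpack_width(encoded)          # same for 64-bit and 32-bit input
ratio_64 = compression_ratio(encoded, 64)   # doubles
ratio_32 = compression_ratio(encoded, 32)   # floats

print(width, ratio_64, ratio_32)
```

The packed width depends only on the integers, so going from 64-bit to 32-bit input halves the numerator and therefore the ratio.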

  • @stevierusso945 (1 year ago)

    👇 'Promo sm'

  • @-h2780 (1 year ago)

    There really are a lot of crazy people out there.

  • @nahblue (1 year ago)

    👏

  • @user-xi2by7vu1v (1 year ago)

    good

  • @howardzhang6655 (2 years ago)

    thanks

  • @sufalpal5041 (2 years ago)

    Data really powers everything that we do

  • @sweetmelodies9678 (2 years ago)

    “The world is now awash in data and we can see consumers in a lot clearer ways.”....👍👍👍