DSDSD - Dutch Seminar on Data Systems Design

Videos: 34
Views: 18,744

Efficient CSV Parsing - On the Complexity of Simple Things - Pedro Holanda

DSDSD - THE DUTCH SEMINAR ON DATA SYSTEMS DESIGN:
We hold bi-weekly talks on Fridays from 3:30 PM to 5 PM CET for and by researchers and practitioners designing (and implementing) data systems. The objective is to establish a new forum for the Dutch Data Systems community to unite, foster collaborations between its members, and bring in high-quality international speakers. We would like to invite all researchers, especially PhD students, who are working on related topics to join the events. It is an excellent opportunity to receive feedback early on from researchers in your field.
Website: dsdsd.da.cwi.nl/
X: x.com/dsdsdnl
Speaker: Pedro Holanda
Title: Efficient CSV Parsing: On the Complexit...

Views: 1,248

Videos

Lambda functions in the duck's nest - Tania Bogatsch

199 views, 5 months ago

C3: Compressing Correlated Columns - Thomas Glas

157 views, 5 months ago

Towards LLM-augmented Database Systems - Carsten Binnig

273 views, 6 months ago

ALP: Adaptive Lossless floating-Point Compression - Leonardo Kuffó (CWI)

474 views, 11 months ago

Cardinality Estimation Graphs by Semih Salihoğlu - University of Waterloo

239 views, 1 year ago

LingoDB: Open compilation and optimization framework for sustainable data processing - M. Jungmair

276 views, 1 year ago

Decoupling Compute and Storage for Stream Processing Systems by Yingjun Wu - CEO RisingWave Labs

701 views, 1 year ago

Implementing InfluxDB IOx, "from scratch" using Apache Arrow, DataFusion, and Rust by Andrew Lamb

3.5K views, 1 year ago

Shredding deeply nested JSON, one vector at a time by Laurens Kuiper - DuckDB Labs

985 views, 1 year ago

Provenance Research in Gray Systems Lab at Microsoft by Fotis Psallidas

132 views, 1 year ago

Database Schemas in the Wild: What Can We Learn from a Large Corpus of Relational Database Schemas?

177 views, 1 year ago

Efficient detection of multivariate correlations in static and streaming data by Jens d'Hondt

77 views, 1 year ago

Stardog query optimiser: Join ordering and cardinality estimations for graph queries by Pavel Klinov

150 views, 1 year ago

Towards Parameter-Efficient Automation of Data Wrangling Tasks with Prefix-Tuning by David Vos

252 views, 1 year ago

Leveraging Generative AI for Data Processing by Immanuel Trummer [DSDSD 2023]

285 views, 1 year ago

Data Science through the Looking Glass and what we found there by Bojan Karlaš

157 views, 2 years ago

Data Management for Emerging Problems in Large Networks by Arijit Khan

530 views, 2 years ago

Building machine learning systems for the era of data-centric AI by Ce Zhang

144 views, 2 years ago

mlinspect - Lightweight Inspection of Native Machine Learning Pipelines by Stefan Grafberger

123 views, 2 years ago

Algorithms for Relational Knowledge Graphs by Martin Bravenboer

523 views, 2 years ago

The LDBC Social Network Benchmark: Business Intelligence workload by Gábor Szárnyas

148 views, 2 years ago

Taking a Peek under the Hood of Snowflake's Metadata Management by Max Heimel

790 views, 2 years ago

Glidesort: Efficient In-Memory Adaptive Stable Sorting on Modern Hardware by Orson Peters

1.9K views, 2 years ago

Learned DBMS Components 2.0: From Workload-Driven to Zero-Shot Learning By Carsten Binnig

117 views, 2 years ago

Parallel Grouped Aggregation in DuckDB By Hannes Mühleisen

891 views, 2 years ago

Efficient collaborative analytics with no information leakage: An idea whose time has come | Vasiliki

113 views, 2 years ago

Opening the Black Box of Internal Stream Processor State By Jim Verheijde

50 views, 2 years ago

Push-Based Execution in DuckDB by Mark Raasveldt (CWI)

1.4K views, 2 years ago

Building Advanced SQL Analytics From Low-Level Plan Operator By Thomas Neumann (TU Munich)

1.1K views, 2 years ago

Comments

@user-qi2ls1ul2e 18 days ago
The key concept for me was "the closer a double is to 0, the more exact its representation"
@zhengyuzhang7992 3 months ago
very nice and detailed lecture
@madacol 5 months ago
It was confusing to understand what happened at 12:47 when the indexes [0,2,3,3,4] appeared. Here is what is going on: the filter removes 2 entries, so only indexes 0, 2, 3 remain, and those are exactly the first 3 elements in the vector [ *0,2,3* , 3,4]; the last 2 elements (3, 4) are ignored. This is specified by the first vector [ [0,2] , [2,1] ], which says that the first 2 elements form the first row and the third element forms the second row. (There is no mention of what to do with the rest of the indexes, so they are ignored.)
@manickbadsah 6 months ago
Thanks for the amazing presentation.
@timpz 8 months ago
Great presentation and interesting idea! It would be interesting to see what the compression ratio would be for 32 bit floats, bit packing the integers would be SIMD-friendly but I suspect probably reduce the ratio by a significant amount for "normal" values.
@LeonardoKuffo 5 months ago
With 32-bit floats, compression ratios would be halved, since the same numbers would still be packed into the same number of bits after FOR+BP. But the algorithm would remain the same. It would be interesting to see how ALP then keeps up with the other algorithms (Chimp, Zstd, etc.). We have already implemented ALP in DuckDB for floats as well, so you can run some quick tests there using the DuckDB Python API. What we saw is that 32-bit floats are more common in an ML context (e.g. model weights), in which case ALPrd would be used given the randomness of these numbers. Here we can still save a few more bits than, for example, Zstd if the weights need to be stored losslessly!
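The "halved" arithmetic in the reply above can be illustrated with a minimal sketch. It is not ALP itself: it assumes the floats have already been turned into integers by some decimal encoding step, applies only frame-of-reference (FOR) and a bit-width calculation for bit-packing (BP), and ignores exceptions and per-vector metadata. The function names and the sample values are hypothetical.

```python
def for_bitpack_width(values):
    """Bits needed per value after frame-of-reference plus bit-packing."""
    base = min(values)                      # FOR: use the minimum as the reference
    max_delta = max(values) - base          # largest offset that must be stored
    # Simplification: use at least 1 bit even when all values are equal.
    return max(1, max_delta.bit_length())


def compression_ratio(values, original_bits):
    """Uncompressed bits per value divided by packed bits per value."""
    return original_bits / for_bitpack_width(values)


# Hypothetical input: two-decimal prices already scaled to integers.
encoded = [1999, 2049, 1875, 2100, 1950]

width = for_bitpack_width(encoded)          # same packed width for both float types
ratio_f64 = compression_ratio(encoded, 64)  # values originally stored as 64-bit doubles
ratio_f32 = compression_ratio(encoded, 32)  # values originally stored as 32-bit floats
print(width, ratio_f64, ratio_f32)
```

Because the packed width depends only on the integer range, the same values cost the same number of bits whether they started as doubles or floats, so in this simplified model the ratio for 32-bit input is exactly half the ratio for 64-bit input.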
@-h2780 1 year ago
Wow, there really are a lot of crazy people out there
@nahblue 1 year ago
👏
@user-xi2by7vu1v 1 year ago
good
@howardzhang6655 2 years ago
thanks
@sufalpal5041 2 years ago
Data really powers everything that we do
@sweetmelodies9678 2 years ago
“The world is now awash in data and we can see consumers in a lot clearer ways.”....👍👍👍