We realized we hadn’t explained clearly why we didn't add indexes when using the TPC-DS datasets in PostgreSQL. So, here's why:
1. Currently, pg_duckdb does not support indexes, which makes a direct comparison impossible. Addressing this limitation is a high priority for us: github.com/duckdb/pg_duckdb/issues/243
2. While indexes are common in real-world PostgreSQL scenarios, optimizing them for specific analytic queries can be complicated and bring extra overhead. Considering this, we believe there is value in looking at the performance of queries without any indexes.
In the meantime, keep quacking and keep coding... with your favorite elephant and your favorite duck!
Wow, I'm impressed and will look at trying DuckDB
Really cool extension, love this idea 💡
What extension and/or command are you using to send the piece of code selected in the editor into the terminal?
It's a simple keyboard shortcut added to keybindings.json in VSCode (most editors have something similar):
{
  "key": "cmd+k",
  "command": "workbench.action.terminal.runSelectedText"
},
After pg_duckdb is enabled, does the DBeaver query engine use pg_duckdb too?
You can use DuckDB directly with DBeaver, which is different from connecting to Postgres with DBeaver and using pg_duckdb. In the first case, DuckDB runs in-process inside DBeaver; in the second, it runs inside Postgres. Hope that's clearer!
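For the second case, the rough shape is the following (a sketch: my_table is a placeholder, and duckdb.force_execution is the setting name as documented in the pg_duckdb README, so verify it for your version):
-- Run from DBeaver connected to the Postgres server where pg_duckdb is installed;
-- DuckDB executes inside the Postgres backend, not inside DBeaver.
SET duckdb.force_execution = true;
SELECT count(*) FROM my_table;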
Talk about concurrency/multi-user: can I write to more than one table at once using pg_duckdb?
Yes, you can, but currently you cannot write to Postgres tables except via CREATE TABLE. Writing to Parquet or MotherDuck is supported concurrently from different Postgres connections, though. You can currently only use one Postgres role with DuckDB.
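A rough sketch of both cases (the table names and the bucket path are placeholders, and the exact COPY-to-Parquet syntax is per the pg_duckdb README, so double-check it there):
-- Exporting to Parquet: each Postgres connection can run its own export concurrently.
COPY (SELECT * FROM sales WHERE sale_year = 2024)
  TO 's3://my-bucket/sales_2024.parquet';
-- Writing into existing Postgres tables isn't supported yet; creating a new
-- table from a query result is the exception mentioned above.
CREATE TABLE sales_summary AS
  SELECT sale_year, sum(amount) AS total
  FROM sales
  GROUP BY sale_year;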
How do I install pg_duckdb for my Azure-hosted Postgres?
Hey! Most managed Postgres services support only a limited set of extensions, so check first whether pg_duckdb is available on yours. Otherwise, you can install it manually by building it and following the README instructions: github.com/duckdb/pg_duckdb.
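A quick way to check what your provider ships (this is standard Postgres, nothing pg_duckdb-specific):
-- Lists every extension the server can CREATE EXTENSION;
-- if pg_duckdb isn't in the result, the provider doesn't ship it (yet).
SELECT name, default_version
FROM pg_available_extensions
WHERE name = 'pg_duckdb';
-- If it shows up (or after a manual build on a self-managed server):
CREATE EXTENSION pg_duckdb;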
CockroachDB has really good perf, and it's PG under the hood
CockroachDB is awesome for HA and as a distributed DB. But nope, it's only compatible with the PG wire protocol and syntax; under the hood it's different from PG. I would have loved to use extensions like pg_duckdb with CockroachDB if I could :(
@@arfathahmed3433 thank you bro
Why is it faster if the format is the same? It's surprising…
This is exactly what's making me scratch my head...
The planner in DuckDB is simply better optimized for this type of query.
@@mehdio I wonder if it's not because the first time you ran it, the data was placed in the OS cache. Did you do several runs? Going from a minute to less than a second is suspicious…
@@Cbon-xh3r One of the extension authors here. I ran this same benchmark for the release, and the difference is definitely not related to any caching; I ran the query multiple times. Postgres is consistently much slower on this query (as well as some others in TPC-DS). From what I could tell from the Postgres query plan, Postgres materializes a pretty big intermediate result.
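If you want to see it on your own setup, comparing the plans on the same connection makes it visible (a sketch: sales and items are placeholder tables, and duckdb.force_execution is the setting name assumed from the pg_duckdb README):
-- Plain Postgres execution: look for large Sort / Materialize / HashAggregate
-- nodes holding big intermediate results.
EXPLAIN ANALYZE
SELECT i.category, sum(s.amount) AS total
FROM sales s JOIN items i ON i.item_id = s.item_id
GROUP BY i.category;
-- Same query handed to DuckDB by pg_duckdb; it should appear as a single
-- custom scan node in the plan.
SET duckdb.force_execution = true;
EXPLAIN ANALYZE
SELECT i.category, sum(s.amount) AS total
FROM sales s JOIN items i ON i.item_id = s.item_id
GROUP BY i.category;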
Isn't it because the DuckDB stuff is happening in its own process, so there's a bit of overhead from the IPC, but then there's "zero" contention when the DuckDB query is happening?