pg_duckdb: Postgres analytics just got faster with DuckDB

Поделиться
HTML-код
  • Опубликовано: 25 ноя 2024

Комментарии • 20

  • @motherduckdb
    @motherduckdb  28 дней назад +1

    We realized we hadn’t explained clearly why we didn't add indexes when using the TPC-DS datasets in PostgreSQL. So, here's why:
    1. Currently, pg_duckdb does not support indexes, which makes a direct comparison impossible. Addressing this limitation is a high priority for us : github.com/duckdb/pg_duckdb/issues/243
    2. While indexes are common in real-world PostgreSQL scenarios, optimizing them for specific analytic queries can be complicated and bring extra overhead. Considering this, we believe there is value in looking at the performance of queries without any indexes.
    In the meantime, keep quacking and keep coding... with your favorite elephant and your favorite duck!

  • @rembautimes8808
    @rembautimes8808 29 дней назад +1

    Wow I’m impressed and will look at trying duck db

  • @Dominik-K
    @Dominik-K Месяц назад +1

    Really cool extension, love this idea 💡

  • @statcc0
    @statcc0 Месяц назад +2

    What extension and/or command are you using to send the piece of code selected in the editor into terminal?

    • @mehdio
      @mehdio Месяц назад +5

      It's a simple shortcut in VSCode (it would work in most editors also) :
      {
      "key": "cmd+k",
      "command": "workbench.action.terminal.runSelectedText"
      },

  • @ujicoba6103
    @ujicoba6103 7 дней назад

    after pg_duckb is set to enable, does dbeaver query engine use pg_duckdb too?

    • @motherduckdb
      @motherduckdb  3 дня назад

      You can use DuckDB with dbeaver, which is different than connecting to Postgres using dbeaver and using pg_duckdb. In the first one, DuckDB is running in-memory of dbeaver. In the second one it would be running on Postgres. Hope it's clearer!

  • @jordanfox470
    @jordanfox470 Месяц назад

    Talk about concurrency/multiuser. Can I write using pg_duckdb to more than one table at once?

    • @mehdio
      @mehdio Месяц назад +1

      Yes, you can, but currently, you cannot write to Postgres tables except when using CREATE TABLE. But writing to Parquet or Motherduck, it's is supported by different Postgres connections concurrently. You currently can only use one Postgres role with DuckDB thought.

  • @datapluseconomics
    @datapluseconomics 28 дней назад

    how to install this pgduckdb instance for my azure hosted postgres?

    • @motherduckdb
      @motherduckdb  25 дней назад

      Hey! Most of managed Postgres services have a set of limited support of extensions. Check first if that's the case. Otherwise, you can install it manually by building it and following the README instructions: github.com/duckdb/pg_duckdb.

  • @keyproco
    @keyproco Месяц назад +1

    CockroachDB has a really good perf, pg under the hood

    • @arfathahmed3433
      @arfathahmed3433 25 дней назад +1

      CockroachDB is awesome for HA and as a distributed DB. But nope, it's just PG connection and syntax compatible; different from PG under the hood. I would have loved to use extensions including pg_duckdb if I could with CockroachDB :(

    • @keyproco
      @keyproco 21 день назад

      @@arfathahmed3433 thank you bro

  • @Cbon-xh3ry
    @Cbon-xh3ry Месяц назад +3

    Why is it faster if the format is the same ? it’s surprising…

    • @doekebuursma7667
      @doekebuursma7667 Месяц назад

      This is exactly what's making me scratch my head...

    • @mehdio
      @mehdio Месяц назад +4

      The planner in DuckDB is simply better optimized for this type of query.

    • @Cbon-xh3ry
      @Cbon-xh3ry Месяц назад

      @@mehdio I wonder if it’s not because the first time you ran it the data was placed in the OS cache. Did you do several runs ? Going from a minute to less than a second is suspicious…

    • @JelteFennema
      @JelteFennema Месяц назад +2

      ​@@Cbon-xh3ryone of the extension authors here. I ran this same benchmark for the release and the difference is definitely not related to any caching, I ran the query multiple times. Postgres is consistently much slower on this query (as well as a some others in tpc-ds). From what I could tell from the postgres query plan, is that postgres materializes a pretty big intermediate result.

    • @JayLooney
      @JayLooney Месяц назад

      Isn’t it because the duckdb stuff is happening in its own process, so there’s a bit of overhead from the IPC, but then there’s “zero” contention when the ddb query is happening?