🔴 To support my channel, I'd like to offer Mentorship/On-the-Job Support/Consulting (me@antonputra.com)
I think you might have an error in the MongoDB configuration. MongoDB also writes every write operation to disk for data consistency. Given that we only see spikes, it looks like not all writable paths point to the monitored drive.
MongoDB does write and keep data in memory, yes. But to preserve consistency it also writes to the journal, and the journal is by default configured on a different path than the actual MongoDB data! Please make sure you put the journal on the same external drive to check the write usage. It might also be that you sped Mongo up by letting it use two separate drives for all operations!
The postgres test is not taking advantage of the GIN indexes. Each index in PG supports only specific "operator classes". For GIN's jsonb_ops, it supports "? ?& ?| @>" and does not support "->". So when you are doing filtering, you should use "@>" instead. Also instead of the default jsonb_ops, you can create the index with "jsonb_path_ops" which is smaller and faster, but supports only @>
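A minimal sketch of what this could look like, assuming a hypothetical products table with a jsonb column named data:

```sql
-- Smaller, faster GIN index variant that supports only the @> containment operator.
CREATE INDEX products_data_gin
    ON products USING gin (data jsonb_path_ops);

-- A containment filter this index can actually serve:
SELECT id, data
FROM products
WHERE data @> '{"category": "books"}';
```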
Create a pull request
Interesting, but how would you rewrite the query to use the index?
The @> operator tests whether a document contains some path/values, from what I read. Here the filtering is done on a price lower than a given value, not matching a specific value.
Maybe the product's price should be externalized in its own column and indexed... For every update you'd need to update the 2 columns.
Sorry if the question is stupid, I know close to zero about the document/JSON features of PostgreSQL.
@@pytorche6206 You cannot with a GIN index :) But one can create a BTREE index for that specific property to use with the "->" operator. The issue with BTREE indexes is that you cannot index the whole JSON blob with them, but have to create a separate index for each property. Also you have to use the exact expression you used during the index creation when filtering, otherwise the DB will not use the index.
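A sketch of such an expression index, assuming the same hypothetical products(data jsonb) table; it uses ->> plus a cast (one common way to get a comparable value out of the JSON):

```sql
-- BTREE index on an expression extracting the price, so it can serve range filters like "<".
CREATE INDEX products_price_btree
    ON products (((data->>'price')::numeric));

-- The WHERE clause must use the exact same expression for the planner to pick the index:
SELECT id, data
FROM products
WHERE (data->>'price')::numeric < 100;
```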
@@svetlinzarev3453 You can, with jsonb_path_query, which uses the GIN index. It was not possible in the past, but now we have a lot of new jsonb functions.
In this test the databases work in different durability modes:
1. MongoDB buffers writes in RAM and flushes them to disk every 100 ms by default.
2. On the graph, PostgreSQL IOPS are equal to the RPS, which makes it clear that PostgreSQL writes to disk immediately.
To make PostgreSQL act like MongoDB, set in postgresql.conf: "synchronous_commit = off" and "wal_writer_delay = 100ms"
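For reference, a sketch of applying those two settings at runtime instead of editing postgresql.conf (values are the ones from the comment):

```sql
-- Trade durability for throughput: commits are acknowledged before the WAL reaches disk,
-- so roughly the last 100 ms of transactions can be lost on a crash.
ALTER SYSTEM SET synchronous_commit = off;
ALTER SYSTEM SET wal_writer_delay = '100ms';
SELECT pg_reload_conf();
```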
So, are these graphs wrong?
@@lucasrueda3089 These graphs reflect how the two databases work in different modes. The configuration is "wrong" here.
Thank you for this one! Great comparison.
This is the video I was looking for after studying about DBMS scaling and ACID, BASE properties.
MongoDB is ACID compliant btw
The most awaited battle of history!
EDIT: OK, so I was in the PostgreSQL camp; I think you managed to convince me to start learning MongoDB as well, and that weird syntax they have.
if you want json, mongo is a no-brainer
@@AntonPutra not necessarily
Thanks
Let's get this man to the top of YouTube
Thanks!
thank you for support!! ❤️
Postgres is actually writing data to the disk. Mongo is not. That is why you see much higher throughput and lower IO/s.
Mongo is writing 20 GB in disk usage at the end
It's just that maybe its indexing mechanism does not need to be updated as often
To ram?
Where does Mongo write?
It is highly likely the indexing. I think we will have another test later with the optimized indexes.
Mongo has different types of write concerns, so you can say it's okay to just store it in RAM, or you can say I want it written to disk, or I want it written to a certain number of cluster members, etc. This matters in case the power goes out: you might lose something, so you can choose the option when you write through the API. @@GolderiQ
Very nice video, very informative, thank you!
I'll be keeping a keen eye on the channel for an update to see the differences when implementing the recommendations listed by the people in the comments. Also, would be nice to see the difference in using SQL (on Postgres) vs JSONB (on Mongodb). I feel that's where most people would like to find the differences to (me included)
Bro, you came now to databases, wow! :D These comparison tests are so fun!
i'll do more, redis vs dragonfly is next
So in regards to the large amount of disk writes, as some people have pointed out it is most likely related to WAL; you could try to turn off fsync to check.
Another related thing is how updates are implemented in Postgres: an update is essentially a delete followed by an insert (not exactly, but it creates a new tuple and marks another one as dead).
It would be interesting to know whether excluding the update test changed the amount of disk writes for any of the tests.
I actually had a real-life example of a rather small table getting millions upon millions of upserts each hour, where almost all became updates that didn't change any values. It was about 100 MB big if you copied the data to a new DB or did a full vacuum, but around 15 GB otherwise.
Writing a WHERE clause where we checked every single one of the 15 columns against the incoming values, and only updating the rows that differed, was actually much more performant in the long run (but probably a rather niche case, since the entire table fit into memory when not bloated). But it probably would be worth trying.
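A rough sketch of that "only update when something actually changed" upsert, with made-up table and column names:

```sql
-- The WHERE clause skips the UPDATE when the incoming values match the stored row,
-- so no new (dead) tuple and no extra WAL is produced for no-op upserts.
INSERT INTO readings (id, col1, col2)
VALUES (42, 'abc', 7)
ON CONFLICT (id) DO UPDATE
SET col1 = EXCLUDED.col1,
    col2 = EXCLUDED.col2
WHERE (readings.col1, readings.col2)
      IS DISTINCT FROM (EXCLUDED.col1, EXCLUDED.col2);
```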
Maybe a bit niche, but I would love to see SurrealDB compared to mainstream databases, maybe in-memory comparison with sqlite and server comparison with PostgreSQL
I would like it to happen at some later date. Because if we assume it can only happen once, then I would rather see it after some time than now (I'm betting it might improve a lot later, or that there will be a clearer understanding of where SurrealDB performs best).
@@antoniong4380 It has been quite some time since SurrealDB released. Heck, they released v2!
@@antoniong4380 Not just once, at least; there are two videos which compare nginx and traefik
I was building a tool for a company. Initially we used SurrealDB, but ended up going back to SQLite because SurrealDB is too slow.
Thanks Anton, invaluable video
Good video!
Postgres prioritizes consistency so it would always write to disk, and mongo prioritizes availability and promises eventual consistency so it writes to memory first... The same experiment with queries across shards &/ replica sets should give interesting results
I think that's where the difference would matter. What is shown here is maybe what 99% of startups will get, but at some point you will need to scale, and it would be interesting to see how sharding/replica sets improve/degrade performance on both.
Also Transactions!
This isn't true. MongoDB default configuration will always write the journal to disk on every write operation, the same as PostgreSQL.
@@LotnyLotnik Postgres has an excellent sharding story with Citus (which is used in Cosmos DB on Azure but is also self-hostable; I've briefly tried it with StackGres). Most people should just scale writes vertically and add replicas to scale reads though. Sharding is only actually needed at the PB scale, and most people should just ensure that they are not running on something slower than a bare-metal $100/mo Hetzner server.
Postgres also has better insert performance than MongoDB if you are doing bulk inserts with COPY (it performs better than INSERT already at 2-3 rows per COPY statement), which generally is what you will use anyway if the data being written comes from a message broker like Kafka or something similar. For bulk operations, the overhead of committing and ensuring that it flushes to disk is negligible.
In Go webservers another common pattern is to use a channel to queue up writes instead of having the handler write straight to the DB, which is better than the plain repository pattern because it gives the repository full freedom to autobatch writes. With that said, in those cases you have a strongly typed channel of struct types, and then you can usually just write straight to a normalized table instead of writing jsonb, and then Postgres is much faster, because normalization often means you only need to write a tenth of the data or less (I've seen order-of-magnitude size reductions migrating Mongo/Elastic DBs to normalized Postgres).
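A small sketch of the COPY form mentioned above, in the psql/STDIN style that most client libraries wrap; the table and columns are made up:

```sql
-- Bulk-load many rows in one statement instead of many individual INSERTs.
COPY orders (id, customer_id, total) FROM STDIN WITH (FORMAT csv);
1,17,99.90
2,18,24.50
\.
```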
Postgres can be configured to do the same thing mongodb is doing here by disabling synchronous commits it seems
Thanks
I would really like to see mariadb VS. MySQL.
MariaDB is a drop-in replacement for MySQL, developed independently from MySQL for some years now.
Would be interesting to see who has gotten the better performance since the split.
Anton, I love your videos because they highlight the trade-offs in software architecture. I don't know if it is too much to ask, but could you highlight the pros and cons of the databases, and the ideal scenarios for each one? That would be very helpful.
Thanks!
For the disk usage, I’m not an expert in PostgreSQL configuration, but e.g. for MySQL/InnoDB there’s innodb_buffer_pool_size, which gives you the ability to take advantage of RAM if you have a lot of it. Like in one of the projects I was working on, we had 128/256 GB of RAM and were able to fit the entire dataset there; it gave us blazing fast sub-ms queries.
Another issue might be that your dataset is small and you are reading and updating the same data all the time, thus evicting/eliminating the page caching.
For Postgres, you have shared_buffers (recommended baseline is 25% of RAM) which is the internal Postgres cache. Then there is effective_cache_size (recommended baseline is rest of RAM, so 75% here), which hints to Postgres how much memory will be used by the page cache, for files. This is the Linux cache itself, which Postgres will benefit from when reading pages from disk. This last one is only taken into account by the optimizer to parallelize aggregates and such, unless the parallelization would use so much page cache that it would invalidate the performance gains. Postgres also has work_mem which is the memory used by each worker for things like sorting or hashing, it’s a relatively very low value compared to shared_buffers, at least on machines with ample RAM (higher than 16GB).
Edit: TimescaleDB recommends setting work_mem to 25% RAM / connections. This is a bigger than normal value, since Timescale uses more memory-heavy algorithms, but it gives an idea.
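As a concrete illustration of those baselines, a sketch assuming a hypothetical 16 GB machine (adjust to your own RAM):

```sql
ALTER SYSTEM SET shared_buffers = '4GB';         -- ~25% of RAM; needs a restart to take effect
ALTER SYSTEM SET effective_cache_size = '12GB';  -- ~75% of RAM; planner hint only, no allocation
ALTER SYSTEM SET work_mem = '64MB';              -- per sort/hash operation, per worker
SELECT pg_reload_conf();
```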
Anton, your statement regarding disk write iops "Postgres always performs many more operations compared to many other databases [in MySql]" contradicts your "Postgres vs MySql benchmark" where you've shown that MySql writes ~3x more...
Best practice for high throughput with PostgreSQL is to mount yet another separate physical disk volume for the Write Ahead Log (WAL). The write I/O on the log volume will always be sequential, while the data volume will experience much more random read and write I/O ops as the database engine processes the updates, deletions, and reads associated with the SQL statements.
It's not about the "speed" of an SSD, in this case it's about isolating the two dramatically different I/O patterns onto different firmware controllers (of the separate volumes).
"in this case it's about isolating the two dramatically different I/O patterns onto different firmware controllers" why is this benefitial? Please, explain where profit comes from. I'm curious
What I would also like to see at the end is the bill from AWS :D A cost comparison between these two would be super nice, as it is also a big factor when it comes to choosing the database. Great video anyway!
thanks, it's about $15-$20 for the test, including prep, for running it for 2-3 hours
Thank you for this video. It was very insightful 🙂
thank you!
We need ScyllaDB vs Cassandra
Yes, and CouchDB.
Maybe worth looking at new sql solutions like tidb. I might not understand your use case but Cassandra is a sh!tty thing.
ScyllaDB vs ClickHouse
If these databases are that important, why would anyone teach MySQL, MongoDB, or PostgreSQL? I don't get it. Is it a legacy software thing, a big tech company thing, a use-case thing, or what?
@@rida_brahim Cause in CAP they are CA DBs, whereas 1-node Postgres is CA and Mongo is CP, and yes, it's use-case specific. There are also many more things to consider, and that's why there are so many of them.
Thank you!
SurrealDB is perfectly balancing between mongodb and postgresql, but the only concern right now is it's not fully ready for production
Nice job, thanks!
I think it would have made more sense to compare a b-tree index in postgresql as this is how they are implemented in mongodb.
I'd love to see this test repeated w/ the Postgres table created UNLOGGED; I bet it would drastically reduce the disk usage.
For tables with very hot writes, if you're willing to sacrifice some crash reliability, it can substantially speed up writes.
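A sketch with a made-up table, in case anyone wants to try it:

```sql
-- No WAL is written for this table, so writes are much cheaper,
-- but the table is truncated after a crash and is not replicated.
CREATE UNLOGGED TABLE events (
    id   bigserial PRIMARY KEY,
    data jsonb NOT NULL
);

-- An existing table can be converted too (this rewrites the table):
-- ALTER TABLE events SET UNLOGGED;
```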
I loved the dubbing
Could you create a video on how you create this infrastructure, the test loads, configurations, dashboards, etc.?
well i have a bunch of those tutorials on my channel including terraform, kubernetes, prometheus etc
I believe it would be fairer when inserting/... data into MongoDB to use the writeConcern option so that it behaves more like PostgreSQL. Otherwise, it will trade data durability for performance.
For disk writes, perhaps Postgres is configured more conservatively with respect to always making sync writes out of the box compared to Mongo?
When you actually get to this scale, big mongo for enterprise gets expensive.
It's a bit apples and oranges anyway though. If you have highly relational data, then use a relational database like Postgres and scale it with something like Citus. If you are just reading documents that don't have a lot of relationships with each other, then use NoSQL like Mongo.
The real cost in nosql is in dev time when you need to query it in a new way and then need to denormalise it and write it all the different ways so that you can query it efficiently.
I've left a comment about Citus and decided to look into whether I'm alone here with it. Good evening, sir ))
Finally, a match of the juggernauts! Although, I somewhat expected this result as I knew MongoDB works better with JSON documents than SQL databases. The conclusion here is really, there is no one size fits all database - while MongoDB works well with JSON, data with rigidly defined shapes are perfect for SQL databases, and SQLite is well suited for mobile/embedded applications.
I don't use PG for pure document-style databases. I do however really like to combine the two types where it makes sense; it's really nice.
Nice video, thanks man! Would be nice to see EdgeDB in upcoming videos
It's me again :). If you want to reduce the PostgreSQL writes to disk try to set commit_delay = 300 or higher.
You can leave a pull request on the repo, or an issue. There's a bigger chance it will be seen that way.
My first impression was also that at this order of magnitude the difference is either PG doing something unnecessary or Mongo cutting corners.
@@luca4479 Oh, I've done that plenty already, but in this case I couldn't find the right code, because he didn't share his postgresql config this time.
no, i'll keep it as is, but thanks for the tip! i'll take a look
@@AntonPutra no problem.
Thanks for providing an audio track in Portuguese!
Very Nice video Love you brother 🥰🥰🥰🥰🥰🥰🥰🥰🥰🥰🥰🥰🥰
thank you!
Anton, thanks for your work. Performance benchmarks will be influenced by the filesystem you are using. Did you run your test on XFS or ZFS (or another COW filesystem)?
What are the configurations of this PostgreSQL database? Number of connections, WAL, concurrency, etc.?
4:40 This database design is NOT okay for eCommerce, please don't design your ecommerce database like this!
The concept of database normalisation demonstrated here is of course accurate and important to learn but, in this specific case, your order table should contain the duplicated address and product price information. This is so that you can query historical orders and find accurate information about the price the customer actually paid at the time, and the address that the order was actually delivered to. Using foreign keys exclusively in this example, would mean that your historical order information will be inaccurate if you ever decide to change product prices, or if a customer updates their address.
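A hypothetical sketch of what that could look like, with invented table and column names (assuming customers and products tables already exist):

```sql
-- The order snapshots the price and address at purchase time,
-- instead of relying only on foreign keys to rows that may change later.
CREATE TABLE orders (
    id               bigserial PRIMARY KEY,
    customer_id      bigint NOT NULL REFERENCES customers (id),
    product_id       bigint NOT NULL REFERENCES products (id),
    unit_price_paid  numeric(12,2) NOT NULL,  -- copied from the product at order time
    shipping_address text NOT NULL,           -- copied from the customer at order time
    created_at       timestamptz NOT NULL DEFAULT now()
);
```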
I guess it is just a small sample database with a basic use case which is meant to be easy to understand.
As you correctly emphasize, accountability for past transactions is a business requirement in many cases, especially in commerce / banking / bookkeeping etc.
Having an order date/time and a product history table would solve this. Same approach you'd use for anything that needs to be audited.
Thank you for commenting this. I was working on one such project, and it's my first time coding such a full-stack project. I was about to make the mistake you mentioned; you saved me from hours of frustration.
When NATS vs Kafka?
+
+
+
Oh you're looking for trouble! LOL!
The guarantees here are quite different. Postgres has strong guarantees, which means you can't lose data and transactions are consistent.
Not an expert on MongoDB, but I think that a successful write has weak default guarantees, e.g. the write may not be flushed to disk and no consistency guarantees beyond a single collection write.
That said, I am sure Postgres can be tweaked here to perform better.
That was the case 10 years ago, but now MongoDB defaults to strong guarantees too
interesting, i'll take a look
finally it's here.
It's in Spanish! Incredible! Thank you so much 😬
Do you have a video teaching how to make this type of test?
i have some tutorials covering monitoring
Postgres performs more disk operations because of transactional logs, doesn't it?
Which value does the default_toast_compression parameter have? You should try lz4 in PostgreSQL for the test with blob types.
Storage and related parameters are also very important.
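For reference, a sketch of how lz4 TOAST compression can be enabled (requires PostgreSQL 14+ built with lz4 support; the table/column names are placeholders):

```sql
-- Default compression method for newly stored values:
ALTER SYSTEM SET default_toast_compression = 'lz4';
SELECT pg_reload_conf();

-- Or per column, affecting values stored from now on:
ALTER TABLE products ALTER COLUMN data SET COMPRESSION lz4;
```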
Very good video, thank you. Can you test scaling up both databases
yes i was thinking about performing some distributed tests, i'll start with redis cluster next with dragonfly
MySQL also provides support for native JSON data type from version 5.7. It supports optional indexing by auto-extracting values from JSON fields. It will be great if you include MySQL to this test or do a standalone MySQL vs Postgres.
thanks, i may do it in the future but i don't think most people would use postgres and mysql for storing json docs
@@AntonPutra Thank you for your response! In fact, people realized that they don't want to maintain another database server in addition to their RDBMSs. That's why nowadays everyone is interested in storing unstructured JSON data in RDBMSs.
Apparently Postgres writes WAL to the disk
It has to. Otherwise you can lose the data. WAL is essential for consistency.
Awesome video
thank you!
I have checked the repo and it seems the database is not tuned. Also, it is better to use the Alpine-based image, as the current one already consumes quite a lot of resources. In PostgreSQL, WAL and archiving are still active throughout the test, so from my point of view we need to tune that configuration first, as the default settings of both databases are not very good.
sorry i forgot to upload postgres config, it is optimized. i use pgtune for that. i don't use docker to run databases, i use systemd
@@AntonPutra pgtune actually tunes only the resources, specifically memory settings like shared_buffers, but it ignores the WAL. Since we aim to make the test reliable at a business level, please check and extend archive_timeout and checkpoint_timeout, disable full_page_writes, etc. For synchronous_commit, since we don't use replication, set it to local mode.
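A sketch of the knobs mentioned (the values are illustrative, and each one trades some safety for throughput, so this is not a general recommendation):

```sql
ALTER SYSTEM SET checkpoint_timeout = '30min';  -- fewer, larger checkpoints
ALTER SYSTEM SET archive_timeout = 0;           -- no forced WAL segment switches
ALTER SYSTEM SET full_page_writes = off;        -- risky: crash recovery can leave corrupted pages on most storage
ALTER SYSTEM SET synchronous_commit = local;    -- only differs from 'on' when synchronous replication is configured
SELECT pg_reload_conf();
```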
@@AntonPutra Also, can you please switch to the Alpine image to see if it performs better, as most production workloads are based on Alpine rather than Debian?
It is interesting to see that Postgres always has more writes to disk compared to other DBs.
And how do you set up these tests? Could you please create a tutorial for that? It would be helpful.
Thanks a lot, I really appreciate your work.
The reason that Postgres uses more IO is that it's actually a DATA base: when it says the data is written to disk, it is. If you disable fsync, boy, you are not gonna find anything better on one node, except maybe SQLite or DuckDB, but that's another beast.
make sense 😊
You do not have to disable fsync completely. I would say that doing in PostgreSQL the same "trick" that MongoDB is using here would increase performance by an enormous amount. MongoDB in the default standalone server config (like here) writes the journal to disk every 100ms.
If your use case allows some data loss, then PostgreSQL is a beast. However, do not try this for financial applications, please...
Please mention whether higher or lower is better for each metric in all the videos, and please provide your conclusion at the end.
It'd be interesting to see this compared to embedded databases that use LSM trees, such as RocksDB.
RocksDB rust crate alongside serde_json or simd_json versus Postgres would be an interesting comparison.
Wow, I am really surprised. This is RDBMS vs NoSQL, and I thought NoSQL was always faster than all the RDBMSs out there. But I see PostgreSQL is really doing a great job here. Amazing video, thanks.
Postgres has a shared buffer lock & yield problem !
The problem will get pronounced as you increase the number of threads.
ok interesting i'll take a look
@@AntonPutra Read this paper : Why Parallelize When You Can Distribute ? By Tudor-Ioan Salomie, Ionut Emanuel Subasu, Jana Giceva, Gustavo Alonso at ETH Zurich, Switzerland.
Looks like the amount of data was small, as Mongo normally operates with lookups. Could you add some more data and retest? In my experience, once a collection has millions of docs, it becomes painful to do lookups in Mongo.
Would've loved to see a more apples to apples comparison and memory usage.
Hey, this is a great video. How did you create those infographics, by the way?
thanks, Adobe suite
This confirms (I already knew it) that I DO NOT NEED MongoDB for my applications, even if they use JSON.
PostgreSQL provides the flexibility of SQL (which I need), and also BETTER performance. The default MongoDB container stores the WAL (write-ahead log) in memory, writing it to disk periodically, impacting durability and improving performance. As the MongoDB manual describes: "In between write operations, while the journal records remain in the WiredTiger buffers, updates can be lost following a hard shutdown of mongod."
By the way, this is probably the reason for PostgreSQL writing more to disk: default PostgreSQL provides better durability, which is what I need in my applications.
Even with this "trick", both are similar in performance below 5,000 qps (which is more than what I need in practice).
If MongoDB is configured with the same ACID properties (especially the D in ACID), I expect MongoDB to have worse performance even below 5,000 qps.
Good job, Anton, and thanks.
I'll add my summary: for applications which cannot tolerate data loss, go with PostgreSQL. When you can afford data loss on some occasions, you can use MongoDB.
For me, the situation is clear (it has been for a long time, by the way).
Your durability argument is misleading, this is only true when journaling is set to false on a write operation. The default configuration sets journaling to true when write concern is majority, which is the default write concern.
This is like saying "I can turn fsync off in postgresql so it isn't durable!"
@@jesusruiz4073 When you use MongoDB you can specify the write concern, so you can totally ensure it's written to disk just fine and safely.
@@andyb012345 No, it is your argument that is misleading, because you confuse journaling with where the journal is written and when.
For a deployment with a single server with the default configuration (like in this benchmark), journaling is enabled but the default write concern is { w: 1 }.
But the journal is flushed to disk every 100 milliseconds (because w:1 does not imply j:true)
With 5,000 writes/sec, up to 500 transactions can be lost.
When a replica set is used, then I agree with you that by default the write concern is w:majority, which implies j:true and the journal is flushed to disk immediately.
But this is not what was compared to PostgreSQL in this benchmark.
@@jesusruiz4073 Nah. writeConcernMajorityJournalDefault is the default. And the default writeConcern is majority.
Even for a single node replicaset
Couchbase vs MongoDB
According to some benchmarks I saw online, Postgres outperforms MongoDB in paginated queries
well maybe but i hardcoded a limit of 10 for both, as soon as i get enough feedback i'll refresh it
Video idea: a postgres X postgres with fsync turned off :) that'd be very cool!
ok :)
If we are talking DBs, there's a new player out there called "NewSQL". For example, several big companies I worked/work for are using TiDB and switching their existing solutions to it. It'd be a bit hard to set up a cluster, but I'd do anything to see a performance comparison with both a classic relational single-node DB and e.g. Mongo.
I still don't understand if he did do joins and what settings are used on Postgres. If he used the default settings, then this is not a good test at all. MongoDB caches so much more, and Postgres never caches more than a few KBs in the default settings. This is dumb... comparing disk writes to an in-memory database...
i used pgtune to optimize postgres, just forgot to upload it to github
@ Thank you for the video though. Another suggestion: I don't know any projects that store JSON objects in relational databases. Data is usually stored in columns and only turned into JSON when you need to send it to the client side. You shouldn't query JSON objects inside a relational database, because it is made to work with rows and columns only.
Helpful, thank you. Can you cover backups / incremental backups for databases in real time (MongoDB, MariaDB) in EKS?
I wanted to use Neo4j for my pet project, however it is so hardware hungry, I decided to use postgres. But I really want to see comparison of neo4j and postgres.
ok i used it in the past, it is an optimized graph database but i think i can use a postgres plugin as well
Postgres vs. Microsoft sql server, please 🙏
ok, it's on my list
No sane dba would model the data such that the price field would be within a json blob.
The price and all other non-optional fields should have their own columns and be indexed with a normal btree index.
Please make another comparison video after setting Postgres up such that it acts like MongoDB.
Would have been nice to also include SQLite here
i'll do sqlite soon as well using unix socket instead of tcp/ip as i did in the previous test
This once again proves that PostgreSQL is the only thing you need for most applications. It can replace document DBs, caches like Redis and even message brokers.
yes it covers 99% of all you need
Addicting channel, I literally check your channel every day. Plz MariaDB vs MySQL. Great channel, these tests are awesome, there's simply nothing else like this anywhere.
Thank you! ❤️
11:20 Were you able to find the root cause of those disk writes? Or maybe the absence of such writes in MongoDB?
not yet
When I watch backend tutorials, everyone is using PostgreSQL and Drizzle these days. Why do developers use PostgreSQL more often than MongoDB?
Interesting video. I'm sure Postgres/Mongo is doing something smart here; someone will make a PR to improve Postgres, I think.
Also, Ktor just hit 3.0, so I would really appreciate a speed test between Kotlin and Golang, with Ktor as the backend framework for Kotlin and maybe Echo as the Golang backend framework.
Thank you for your awesome work!
How do you make an animated display with Grafana and Prometheus?
Are you running Postgres with the vanilla default config again? 😢
no i forgot to commit postgres config, i use pgtune to optimize it based on the hardware i use
Who would have thought that writing to memory would be faster than to a hard drive? What level is this?
Ohhh spanish ❤ thank you so much !
can you do a cassandra vs mongodb comparison next?
columnar vs document database, maybe will do in the future 😊
any plan to compare cockroach db and yugabyte db ?
I wish you had a memory comparison too
Can you compare TimescaleDB vs QuestDB??
ok, added!
@@AntonPutra Thanks, coffee on the way!!!
Still waiting for Postgres vs Maria
Apache Solr vs ElasticSearch
noted!
Can you do MongoDB - MariaDB? please
or even ScyllaDB - MongoDB
yes mariadb will be next
Can you test DuckDB vs Sqlite ?
Mongodb vs elasticsearch? :)
Can you compare performance of PostgreSQL vs MS SQL Server?
OLS vs Nginx. I was surprised how OLS makes sites fly.
ok interesting
How about TigerBeetle? And run some financial transaction tests to check the claimed 1000x improvement.
Well, it's like apples and oranges. One allows you to store data with a flexible schema that is well suited for prototyping and low-latency selects, while the other enables you to query your data in any way. One is suitable for horizontal scaling, and the other has excellent tools for securing your data. One is proprietary-ish, and the other is framework-ish for other proprietary-ish products. Have a look for Citus if you think that Postgres hardly scales =)
Last time I checked, MongoDB also allowed you to query your data in any way
@@Zizaco That's true, but in SQL you can use complex joins, while in Mongo it's not so efficient to join multiple collections.
As far as I remember, transactions are limited to one collection in mongo, while pg allows you to run transactions across multiple tables.
And so on, and so on.
Redis vs scaletable (rust alternative)
Hi, do you mean Scaleout, I couldn't find any Scaletable
@ I’m sorry I meant skytable