Thank you, IndiaFOSS, for bringing talks from Indian product companies.
This is such a smart hack, to be honest. It's one of those hacks that on paper sounds pretty crazy, but in practice ends up being really efficient. Makes you appreciate how efficient Postgres really is!
Want more insightful tech talks like this!
Really interesting hack! Thanks for sharing the tips. :)
Great talk.
Wonder if partitioning could have been used instead of creating dynamic tables, since partitioning internally creates tables anyway but would be more manageable.
Partitions will essentially be the same thing in this case.
7 million tables sounds a lot cooler i guess 😜
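For anyone curious about the distinction in this exchange, here is a minimal sketch contrasting the two shapes, with invented table and column names (the talk does not show actual schemas): a throwaway table per report job versus a LIST-partitioned parent, where every partition is still a physical table underneath, which is why the reply above says they end up essentially the same.

```python
# Hypothetical DDL only; table and column names are invented for illustration.

def per_job_table(job_id: str) -> str:
    # One throwaway table per report job, dropped wholesale later.
    return f"""
    CREATE TABLE results_{job_id} (
        row_num  BIGINT,
        payload  JSONB
    );
    """

def partitioned_alternative(job_id: str) -> str:
    # The partitioned variant: one parent, one partition per job.
    # Each partition is still its own physical table under the hood.
    return f"""
    CREATE TABLE results (
        job_id   TEXT NOT NULL,
        row_num  BIGINT,
        payload  JSONB
    ) PARTITION BY LIST (job_id);

    CREATE TABLE results_p_{job_id} PARTITION OF results
        FOR VALUES IN ('{job_id}');
    """

if __name__ == "__main__":
    print(per_job_table("job_42"))
    print(partitioned_alternative("job_42"))
```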
That was the coolest and smartest talk I have heard. Thank you, Zerodha, for opening up this hack as a service; we might consider using this at our company.
Well explained sir!👍
very interesting hack, really shows off the power of postgres
Mind Blowingg!
This is awesome ❤
7 million TABLES, not rows? Yooooo
Yooo
100 billion rows. Yooo
It's really interesting
I can't figure out how DungBeetle improves query time for generating reports. The query is still executed on the huge base table and you are storing the results in a new table in the results DB. The query output is not instant; it still takes as long as it did earlier, so what's the point of having 7M tables generated daily?
Yes, that's the same point I was wondering about... I think their perspective is to free the application from the overhead, but I'm still confused 😂
I think further querying like sorting and filtering will be faster on a dedicated table. Also, the results database can be designed for query performance, while the base database has different design considerations.
The problem being solved is not making individual queries faster, but making the system resilient. Say 10x report-generation traffic arrives on a busy day; the existing system will choke as those large queries run 10x as often. With async querying, the client 'may' get a delayed response, but is guaranteed to get one at some point.
Listen, it doesn't bring down the main server with more requests, and the app is not communicating directly with the large, slow database.
The idea isn't to optimize query time. The idea is to decrease DB locking AND create a single abstraction layer, decoupled from the app, that can be scaled separately AND allows faster caching with further manipulation and aggregation.
This breaks just about every fundamental of DBMS design and computer science in general, but it's a phenomenal example of product-centered engineering lmao
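To make the pattern in these comments concrete, here is a rough sketch of the flow as described in the talk, not DungBeetle's actual code: a worker runs the heavy report query once against the source DB and dumps the rows into a fresh per-job table in a separate results Postgres, which the app then sorts, filters, and paginates. The function, schema, psycopg2 usage, and "store everything as text" shortcut are all assumptions made for illustration.

```python
# Rough sketch only, NOT DungBeetle's actual implementation. All names invented.
import psycopg2

def run_report_job(job_id: str, report_sql: str, params: tuple,
                   source_dsn: str, results_dsn: str) -> None:
    results_table = f"results_{job_id}"

    # 1. Run the expensive query once, against the big source DB.
    with psycopg2.connect(source_dsn) as src, src.cursor() as cur:
        cur.execute(report_sql, params)
        columns = [col.name for col in cur.description]
        rows = [tuple(None if v is None else str(v) for v in row)
                for row in cur.fetchall()]   # simplified: keep everything as text

    # 2. Dump the result set into its own throwaway table in the results DB.
    with psycopg2.connect(results_dsn) as dst, dst.cursor() as cur:
        col_defs = ", ".join(f"{c} TEXT" for c in columns)
        cur.execute(f"CREATE TABLE {results_table} ({col_defs})")
        placeholders = ", ".join(["%s"] * len(columns))
        cur.executemany(
            f"INSERT INTO {results_table} VALUES ({placeholders})", rows)

    # 3. The app now sorts/filters/paginates this small table; the source DB is
    #    never touched again for this report, and the table is dropped later.
```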
Super interesting talk, I was completely hooked for 20 minutes. Just one thing: did you explore the approach of storing everything in one table? I'd like to know the reasoning behind storing results in a fresh table per use.
You can also use DungBeetle with S3 instead of Postgres.
Few questions (I am a data engineer learning system design, so please be considerate if my questions are naive):
1. 7+ million tables with 1TB of data doesn't seem huge. It's like ~140KB per table, or per user report. I was actually shocked by the title (seems like clickbait :D)
2. Why can't we just use Kafka + a stream-processing engine as that middleware (a rough sketch follows below)? It would still let the app and DB scale independently, and messages to Kafka could be Avro/Protobuf, which are more efficient than an API payload (mostly JSON?).
3. If I understand correctly, the Postgres results DB is simply cached per-user data for quick access throughout the day, instead of querying the huge DB again and again, kind of like a materialized view in data engineering terms. Am I correct here?
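On question 2, a minimal sketch of what the Kafka variant might look like on the producing side, assuming kafka-python, a JSON payload, and an invented topic name; a downstream stream-processing job (not shown) would consume these requests and write results somewhere the app can read.

```python
# Hypothetical sketch of the Kafka alternative; topic name and fields invented.
import json
from kafka import KafkaProducer

def submit_report_request(user_id: str, report: str,
                          date_from: str, date_to: str) -> None:
    # The app only enqueues the request; a consumer does the heavy query later.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("report-requests", {
        "user_id": user_id,
        "report": report,
        "from": date_from,
        "to": date_to,
    })
    producer.flush()
```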
This one is interesting 😮😮
Very well explained!
This is what I understood. Please tell me if my understanding is wrong. And why not Redis here?
Imagine you have a bunch of information stored in a really big database. When someone wants to get some specific information from that database, it can sometimes take a long time to find it and give it to them.
That's where this middleware comes in. It acts like a middleman between the person asking for the information and the big database.
Here's how it works:
1. The middleware has some pre-made "tasks" - these are like common questions people might ask, and the middleware already knows how to find the answers for those questions in the big database.
2. When someone asks the middleware a question, it creates a "job" to go find the answer. This job runs in the background, looking up the information in the big database.
3. While the job is running, the middleware stores the results in a separate, faster database. This way, the next time someone asks the same question, the middleware can just pull the answer from the fast database instead of having to search the big one again.
4. When the person asks for the information, the middleware can give them the answer from the fast database right away, without having to wait for the big database to respond.
The reason this middleware is useful is that the big database can be really slow, especially if lots of people are asking questions at the same time. By caching the answers in a faster database, the middleware can give people the information they need much more quickly.
It's kind of like having a helper that knows where all the important stuff is stored, so they can grab it for you fast, instead of you having to search through everything yourself every time.
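The same four steps, seen from the client's side, might look roughly like the sketch below. The HTTP endpoints, host, and JSON shapes are invented for illustration; they are not DungBeetle's documented API.

```python
# Hypothetical client flow; endpoints, host, and field names are invented.
import time
import requests

BASE = "http://middleware.internal:6060"   # placeholder host

def fetch_report(task_name: str, user_id: str, params: dict) -> list:
    # Steps 1-2: ask the middleware to run a pre-defined task as a background job.
    job = requests.post(f"{BASE}/tasks/{task_name}/jobs",
                        json={"user_id": user_id, **params}).json()

    # Step 3: poll until the job has finished writing results to the fast DB.
    while True:
        status = requests.get(f"{BASE}/jobs/{job['id']}").json()["status"]
        if status in ("done", "failed"):
            break
        time.sleep(1)

    # Step 4: read the cached result set without touching the big database.
    resp = requests.get(f"{BASE}/jobs/{job['id']}/results", params={"page": 1})
    return resp.json()["rows"]
```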
What a dimwit 🤣
@@vancedshitt6177 😂 That's why I wrote out my understanding. Please help me, I would appreciate it.
@@keenzeen It's simple: write once, read a couple of times. Not thousands or hundreds of thousands of times. Chasing "flashy tech" and burning money just to look cool or to boast is meaningless. Think about the cost of using in-memory caches,
and having a disk attached for backup alongside an in-memory cache 🤦♂ is even more idiotic in terms of cost, for them.
A bit off-topic:
In most RDBMSes, the cost of deleting data is fragmentation (they went a step further and replace the EBS volume instead of deleting tables/data, which eliminates this limitation entirely; see the sketch below). Vertical sharding has its own quirks related to query performance.
Having tables is a simple solution that supports their business.
There is no one size that fits all. Team should do what makes sense based on their collective goal.
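For the EBS point above, a hedged sketch of the "reset by swapping the disk" idea using boto3. This is not Zerodha's actual tooling; the IDs and device name are placeholders, and stopping Postgres and re-initialising the data directory on the new volume are left out.

```python
# Illustrative only; not Zerodha's actual process. IDs/device are placeholders,
# and the Postgres stop/re-init steps around the swap are omitted.
import boto3

def swap_results_volume(instance_id: str, old_volume_id: str,
                        new_volume_id: str, device: str = "/dev/sdf") -> None:
    ec2 = boto3.client("ec2")

    # Detach the volume holding millions of throwaway tables...
    ec2.detach_volume(VolumeId=old_volume_id, InstanceId=instance_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[old_volume_id])

    # ...and attach a fresh, empty volume in its place. No DROP TABLE storm,
    # no VACUUM, no fragmentation: the old data disappears with the disk.
    ec2.attach_volume(VolumeId=new_volume_id, InstanceId=instance_id,
                      Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[new_volume_id])
```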
Great insight. I have a few doubts after watching this video:
1) Is there only one EC2 instance for DungBeetle? Shouldn't there be instances in multiple regions to ensure high availability?
2) And in the rare case where a query arrives while the instance is rebooting, is there a retry mechanism?
That's why Zerodha lags; imagine having one instance for the Postgres DB with 128 GB of RAM and 64 vCPUs.
Why not drop a Parquet or CSV file to S3? Isn't that cheaper than EC2?
Instance reboots at night, so it’s usually down then. Makes sense since hardly anyone uses it at that time and it doesn’t affect business. Smart move
Amazing. It is good to see that Zerodha started contributing to FOSS.
Started to contribute?
They were always inclined towards FOSS, especially Kailash. They already have a few decent open source tools which they developed for internal use.
@@FakeDumbDummy I know that, but I meant the FOSS fund by Zerodha.
nice, interesting way to remove bottleneck from the base table
An interviewer from tcs said “postgreSQL is only used for small databases 🤡”
Happened to me in a real interview and I could only hold back my laughter 😂
Postgres is good for large DBs, but when it comes to performance in HPC-like situations, there are certainly better options. Postgres is too slow, especially at the scale Zerodha operates at.
OK, so the results DB here stores all the responses it gets from the large DB, and serves those results to the client. The flow goes like:
Client request -> server with an async queue that stores the request, then queries the large DB asynchronously -> then stores the response in the results DB for serving.
So DungBeetle uses something like a local secondary-indexing approach to find the user's queried job ID and the corresponding results stored in the results DB.
We have been using temp/ephemeral tables for analytics reports for years; ours just don't reach a 7+ million figure. It's not a hack, it's design. It sounds cool but is pretty generic.
Considering your queries are pre-determined, have you considered an IVM system like Materialize/Feldera and what would be the challenges?
Thank you, Kailash Nadh, for such an insightful talk at IndiaFOSS 2024! The way you tackled the challenges of managing over 7 million Postgres tables and scalable report generation was truly impressive. The introduction of DungBeetle and the concept of independent middleware for handling large queries were game-changers. Your practical approach and innovative solutions are inspiring. Looking forward to exploring more in this space and implementing these ideas!
It seems the financial institution has the wrong partitioning/sharding architecture for their use case. Their approach is bound to create data inconsistencies too; I wonder how they check the consistency of the final data returned to customers? The institution needs an event-processing pipeline for customer reports, run after financial transactions are cleared, finalized, and sealed.
This is why Zerodha is so cool. Many people building products don't care about performance.
Great idea, and the talk was beautifully done, but what happens to stale tables? When we have stale records in a table we can leverage a TTL or some delete-old-records approach; isn't there a similar use case for deleting old tables here? And what happens when, say, a user deletes their Zerodha account? Do ALL their records (and the tables created by their queries) get deleted too?
Very interesting. I have a few questions:
If you wipe all report results every night, why even store them? Is the same report generated multiple times a day?
Is using pg_hash/JSONB columns not an option because you also need to support other DBMSes? This would still allow saving all results into one table and creating a VIEW for each result, so that apps don't have to use WHERE.
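A sketch of what that single-table alternative might look like, with invented names: one shared JSONB table keyed by job ID, plus a lightweight VIEW per result set so apps never write their own WHERE clause.

```python
# Hypothetical schema for the one-table + per-result-VIEW idea; names invented.

SHARED_TABLE_DDL = """
CREATE TABLE report_results (
    job_id   TEXT NOT NULL,
    row_num  BIGINT,
    payload  JSONB
);
CREATE INDEX ON report_results (job_id);
"""

def view_for_job(job_id: str) -> str:
    # One lightweight view per result set instead of one physical table.
    return f"""
    CREATE VIEW results_{job_id} AS
        SELECT row_num, payload
        FROM report_results
        WHERE job_id = '{job_id}';
    """

if __name__ == "__main__":
    print(SHARED_TABLE_DDL)
    print(view_for_job("job_42"))
```

The trade-off with this shape is that clearing a day's worth of results becomes a large DELETE (or a partition drop) instead of simply swapping out a disk.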
That is crazy
incredible
Reading through these comments, I too wonder whether this is such a big thing, or a "hack" or "abuse" as it's called here. Another thing: "dung beetle" is already the name of a cross-browser tool developed way back. It leaves a bit of a bad taste, duplicating the name for something very different, or rather not checking for uniqueness. At the same time, I appreciate the execution/implementation, however small the idea is. It would be interesting to add some ML/AI capability to predict which reports are in demand, and to make this a generic proxy for the reports module of any web app.
Interesting talk. But since they delete the data every day, if I want to look at a report I generated a day ago, I'll have to go through the whole process again?
That's how most in-memory caches work with a TTL.
@@maheshwarang4000 I understand how a cache with a TTL works, but what if a report takes a long time to generate, say up to 5 minutes on peak days? Do users who return later on the website or the app (more likely to happen with longer wait times) have to go through the same process again? How is this handled in terms of user experience?
@@ossamarafique3003 Yes, that's right, the user has to wait again. This hack is basically for reducing main-DB load and handling the millions of requests through another layer, which is DungBeetle.
Would S3 with Parquet format work? Assuming 300 MB on average.
Too costly
@@aakashvenkat2733 Yeah, possibly because of S3 retrieval costs; storage is only for a day since you are dropping the DB anyway. Let me try to build a cost calculation and get back.
Postgres definitely works for the job, as the main operations are sorting and pagination once the data is retrieved. The new Postgres instance is acting as a cache.
@@_sairam Yeah, but I don't know why they didn't go with a NoSQL DB, since it can store unstructured data as JSON easily. I used a similar architecture for a cache at my previous organisation.
@ JSON is too verbose. If someone wants to optimise, they can do so with inherited tables etc., but that was not the goal, I presume. The point is that solutions to business problems can be implemented with very little code.
Glad to hear from a GOAT like Kailash. Why was the choice made to use postgres instead of a distributed caching layer like Redis/memcache?
Yeah I am also thinking the same 🤔
Can we run SQL queries in Redis? Because here the data will not be the same for all users. @rohithjanardhan4970
Cache is extremely expensive. Also the problem with Redis is you cannot simply just store large objects in it. Out of the blue you may see tail latencies spiraling out of control. Pretty common problem with Redis. Infra teams managing Redis at large companies often advise dev teams to tread cautiously.
Redis is optimised for key-based lookups. Look at the SELECT query: a user ID together with a date range is not a static key. So this generally needs a relational DB or a flat-file-processing approach.
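To illustrate that mismatch, a small sketch with invented keys, table, and column names: Redis answers exact-key lookups, while the report access pattern is "this user, arbitrary date range, sorted, paginated", which maps more naturally onto SQL.

```python
# Invented key, table, and column names; illustration only.
import redis

def cached_report(r: redis.Redis):
    # A cache hit is only possible if this exact key was precomputed earlier.
    return r.get("user:AB1234:pnl:2024-04-01:2024-06-30")

# What the report layer actually needs to answer, for any range the user picks:
RANGE_QUERY = """
SELECT trade_date, symbol, pnl
FROM results_job_42
WHERE trade_date BETWEEN %s AND %s
ORDER BY trade_date
LIMIT 100 OFFSET 0;
"""
```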
The DungBeetle service will need access to all the databases of the different services. How will a single SQL query manage to connect to different databases, bring the data together, and create records from it?
They probably don't have a requirement for reports that pull from multiple data sources.
If they did, I think ClickHouse would be a pragmatic way to do it. It lets you create virtual tables for a bunch of remote databases like Postgres, Mongo, MySQL, etc.
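A hedged example of what that could look like with ClickHouse's postgresql() table function; the hosts, credentials, and table names below are invented.

```python
# Invented connection details and table names; illustration of ClickHouse's
# postgresql() table function joining data from two services' databases.
FEDERATED_REPORT = """
SELECT t.user_id, sum(t.qty * t.price) AS turnover, p.plan_name
FROM postgresql('trades-db:5432', 'trades', 'executions', 'reporter', 'secret') AS t
JOIN postgresql('billing-db:5432', 'billing', 'plans', 'reporter', 'secret') AS p
    ON p.user_id = t.user_id
GROUP BY t.user_id, p.plan_name;
"""

if __name__ == "__main__":
    print(FEDERATED_REPORT)
```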
That sounds like a job for sqlite - one file per report?
Why wasn't a NoSQL DB considered? It can hold JSON documents, and each user can be one document. We don't even need to follow a fixed structure, since each document can have different fields altogether.
They have to run analytical queries, and moreover migrations are hard.
The data is mostly related to other data; that's why they use SQL.
@@akrammohammed3581 Can they not store those query results in NoSQL, considering the results are mostly aggregated? That way you don't need the hack of creating throwaway tables, and when the same user requests the same report again you don't even have to fire those queries, which they will have to do in the current approach since they remove the tables daily.
@@akrammohammed3581 As per Kailash himself, they have written a service which can join data from different databases together.
Most of their queries would be range queries and join queries, and he didn't say writes are the problem.
The choice comes down to:
- CA(p)
- Consistency/Isolation level
- Nature of data
- Read Write throughput
- Available expertise of the team
Figma vs Discord database saga is an example.
But why not use a Redis cache with Redis Search?
2 TB in cache memory, that is, in RAM? Redis/cache DBs are for low-latency operations. This is an async operation which requires huge storage space.
DuckDB could also be tried
DuckDB with Apache Arrow would work nicely here.
Chai aur code lands me here❤❤❤
What you're describing in this whole talk, isn't that just MATERIALIZED VIEWS? (assuming the reports are generated from other data from the same DB)
This was such an insightful video! It reminded me of a video series I recently made on S3 buckets, where I dive deeper into core S3 concepts like replication, versioning, and storage classes. If anyone's interested, feel free to check it out; I'd love to hear your thoughts!
A table for every user?
Why not use redis cache?
2 TB in cache memory, that is, in RAM? Redis/cache DBs are for low-latency operations. This is an async operation which requires huge storage space.
I don't see myself as an expert, but in my experience it's better to switch to a DB other than Postgres when performance matters. Postgres is for stability & integrity, not for caching. QuestDB or ClickHouse would have been better.
you certainly lack knowledge.
LOL
@@vancedshitt6177 Hey dimwit, your shitt6177 comment doesn't mean anything if you don't elaborate on what is wrong with my approach.
@@vancedshitt6177 I design & build HPC stuff, and I have done enough benchmarking with Postgres to make that comment. This design by Kailash is very poor. Maybe you should brush up on your own knowledge; maybe it's too generic.
@@paddor LOLWA
That's simply scaling materialized views.
This sucks... that's why Zerodha reports take longer and even the portal is quite slow.
Why don't you guys hire a good architect? And yes, he/she will keep the systems simple; we don't have to overcomplicate systems, but this is plain dumb use of a DB.
Secondly, it's not a lot of data and users; hence, it's working fine for you.
If you had a lot of records, your RAM would not be enough; that's when you would have to shard it.
The only real solution/takeaway in this video is the fact that they can delete/reset the DB by re-attaching an empty volume. It's more of a testament to Amazon/cloud tooling than to Postgres.
Funny that this is a FOSS conference.
13:02 and all I've learned so far is that we have hundreds of millions of lines of financial reports... brother, at least tell us how to actually do it too.
Feels hacky, and it works for now, but the question is how long this will be sustainable. Considering the solution is mostly focused on a large number of user reports, which may or may not have similar schemas, it does make sense to save the query results until the request comes back. Another option could be saving those SQL results as documents in a NoSQL store and populating the reports from those.
The way they shut off Postgres every day and swap out the disk, just to remove those tables... we expect better things from you guys, given you are no longer a startup but the largest stock broker in the country, with a growing demat user base.
If a company has 7M (70 lakh) tables, it is doing something wrong.
This hack is hot garbage. These people don’t know how to properly use Postgres (and their tools and their time). Nobody should be taking notes from this presentation.
7M tables... poor design without any doubt. Sorry.
I think he meant records; no way they have 7M tables lmao
Oh wait, holy shit, he is creating a new table for each job ID in the results DB.
Yeah, the fact that Postgres can handle 7 million tables is impressive, but I'm VERY skeptical that this was the best solution they could have used to solve the problem, even if it happens to work for now.
I too think this is a poor design, but from an overall architecture point of view, not a schema one.
Most basic presentation ever... stop Indian companies from delivering presentations.
what is your problem exactly?
@@snehanshuphukon728 lmfao