For the last two days I've been watching your content. I genuinely appreciate the time and effort you put into these videos. Your content is amazing and well explained. Thank you so much for sharing it.
Thanks for the comment! I'm glad you like the videos!
I've always had this question: Does the number of joins affect performance or not? This video answered all my questions.
I’m glad it was helpful!
In this case, why wouldn't I add indexes everywhere if they improve performance?
Good question. When you insert, update, or delete data, the database needs to update the index so that it matches the related table. This step takes a little extra time, and each extra index slows these operations down further. So we usually only add the indexes we need.
Unless you're working with a database that is heavily focused on reading data and rarely updates it, in which case it may not be a concern.
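A minimal sketch of that trade-off in Postgres, using a made-up orders table (none of these names are from the video):

CREATE TABLE orders (
  id          SERIAL PRIMARY KEY,
  customer_id INTEGER NOT NULL,
  status      TEXT NOT NULL
);

-- This index speeds up reads that filter on customer_id...
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- ...but every INSERT, UPDATE, or DELETE on orders must now also
-- maintain the index, so each extra index adds a little write overhead.
INSERT INTO orders (customer_id, status) VALUES (42, 'new');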
Another reason is that each index costs you storage, so you are essentially trading off storage for speed/performance
Thanks very much for this constructive demo.
I have a question: when you indexed the columns, it reduced the total cost but not the total execution time. Why do you prefer reducing the total cost over the execution time, which is crucial for applications in production?
Thanks! Good question. I believe it's because the data set was so small, and the execution time already so short, that the indexes didn't really impact the time. On a larger data set you may see a bigger difference in execution time.
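For anyone who wants to see both numbers: EXPLAIN on its own shows only the planner's cost estimate, while EXPLAIN ANALYZE actually runs the query and reports real timings as well. A rough sketch, reusing the hypothetical orders table (the output values are illustrative):

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

-- Sample output shape (your numbers will differ):
-- Index Scan using idx_orders_customer on orders
--   (cost=0.29..8.31 rows=1 width=44)
--   (actual time=0.015..0.017 rows=1 loops=1)
-- Planning Time: 0.080 ms
-- Execution Time: 0.035 ms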
@DatabaseStar Thank you for your response.
In your case the separate tables win against the single table with no indexes. In my case the separate tables cost more, since each table has about 40 columns, while the single table only contains the columns the user needs: 40 x 4 is 160 columns across the separate tables, but the single table combines only about 20 columns. I will try to implement these indexes on my company's databases, as it seems we need them. I'm working on old databases that have been lying around for decades on the MyISAM engine, with millions of rows, trying to make them faster as they get slower every single month. Thanks for the video, it's really helpful. I will also consider asking management to migrate to another storage engine, or even another DBMS like PostgreSQL; using a cache eats too much memory given our company budget, since all the apps run on one server.
Good point, it also depends on how many columns you need to return. Just because the separate tables have 40 columns doesn't mean you necessarily need to select all 40. But if you do need all those columns, then a single table may make more sense, as in your case.
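To make the column-count point concrete, a small sketch with made-up table names. Selecting only the columns you need keeps the rows narrow, even when the underlying tables are wide:

-- Only three columns travel through the join and back to the client,
-- no matter how many columns the tables actually contain:
SELECT c.name, o.status, o.created_at
FROM orders o
JOIN customers c ON c.id = o.customer_id;

-- By contrast, SELECT * drags every column of every joined table
-- through the whole plan and over the network.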
Thank you for this very useful video!
Glad it was helpful!
Are all these concepts applicable to cloud services? Wouldn't there be a difference?
They should apply regardless of where your database is hosted.
Very well done, sir. I do get asked about the effort made to normalize a database. The insight offered by your very clear explanation will go a long way in helping to answer those queries.
Your videos and website are a great resource for the database developer community.
Thanks for the kind words! I'm glad you like the video and my channel.
Hi, can you offer us the DDL scripts that you used to set up your example database? Then we can recreate it directly. That would be great. Thanks and regards \sdohn
Good idea! The sample database (olympics) is available on GitHub here: github.com/bbrumm/databasestar/tree/main/sample_databases/sample_db_olympics
The script for the queries used in this video is now on GitHub as well: github.com/bbrumm/databasestar/tree/main/videos/100_joins
Excellent
Thanks!
Very insightful, thanks a lot ❤ What does NOC stand for?
Thanks! NOC stands for National Olympic Committee.
Hi,
Thank you for the great insight.
What is the software you use to run the queries and the explain feature?
Thanks! I'm using a tool called pgAdmin, which is a common SQL editor for Postgres databases.
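For anyone following along: in pgAdmin's Query Tool the Explain and Explain Analyze buttons generate the plan for you, but the plain SQL below works in any Postgres client. The table name here is only an assumption, so substitute one from your own schema:

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM athlete;  -- assumed table name, adjust to your database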
Do you think joins can have an impact on scalability? Thanks
No I don't think so. However, once your database gets pretty large, you'll be looking into all kinds of techniques to improve performance, and one of them may involve caching or creating summary tables which means fewer joins - but it has tradeoffs.
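As one example of the summary-table idea, here's a minimal Postgres sketch with made-up names. A materialized view pre-computes the join and aggregation, so frequent reads avoid the joins, at the cost of serving stale data between refreshes:

CREATE MATERIALIZED VIEW order_totals AS
SELECT c.name, count(*) AS order_count
FROM orders o
JOIN customers c ON c.id = o.customer_id
GROUP BY c.name;

-- Reads now hit one pre-joined, pre-aggregated table:
SELECT * FROM order_totals;

-- The trade-off: the data is only as fresh as the last refresh.
REFRESH MATERIALIZED VIEW order_totals;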
Ty
You're welcome
👍
😄
🤔
🙂
Now try ordering by something )
We can add an ORDER BY, but the point still stands.
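A quick sketch of why, with made-up names again: if an index matches the ORDER BY, Postgres can read rows in index order and skip the sort entirely:

CREATE INDEX idx_orders_created ON orders (created_at);

EXPLAIN ANALYZE
SELECT * FROM orders
ORDER BY created_at
LIMIT 100;
-- With the index, the plan can use an Index Scan in created_at order
-- instead of a Seq Scan followed by a Sort node.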
It VERY much depends on the query itself. Joins are not always faster and better. @DatabaseStar
This is completely wrong.
0) Operating in "less than a second" units for 100-200k-row tables is like saying this car was not expensive because it was less than a million dollars. You need to show execution times in milliseconds from the query plan (EXPLAIN ANALYZE).
1) Your execution time seemed to be 5 times faster for the denormalized query.
2) Joining will ALWAYS be slower than one table if you join big tables (sorry, but in 2023 a few hundred k rows is nothing, even on a local machine).
3) You didn't explain how JOIN actually works behind the scenes, but I get it, because it would ruin the whole video.
Thanks for the feedback. I wouldn't say it's "completely wrong" because the video demonstrates the concept step-by-step and shows numbers.
I can create another video that demonstrates this with larger tables, as I think it would be more beneficial.
0) That's a good point, which is why I didn't refer to the time taken when talking about the query, I referred to the cost from the execution plan. For larger tables & longer queries I would have also used the time taken.
1) I don't think it was 5 times faster, the cost comparison (after indexes) was 822 vs 1,239, so it's a little faster.
2) I don't think joining will always be slower than one table when working with big tables. That's the point of this video: joining is not always slower if you have a normalised design and indexes, BUT it depends on the query (see the sketch below).
3) I don't need to explain how join works behind the scenes for this video to be useful.
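To illustrate point 2, a minimal sketch with made-up tables: with an index on the join key, the planner can use index-driven joins instead of scanning whole tables, so a join over big tables isn't automatically slower:

CREATE INDEX idx_orders_customer ON orders (customer_id);

EXPLAIN ANALYZE
SELECT c.name, o.status
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.id = 12345;
-- With the index, this can run as a Nested Loop over index scans,
-- touching only the matching rows rather than every row in both tables.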
0) They do make that point with "cost". As you wrote, "a few hundred k rows is nothing, even on a local machine", so the execution time will be really thin and can't be used as the actual benchmark, since other apps probably consume the gap, and nowadays computers also have multi-threaded processors.
1) It does; I guess the first answer could cover the second point too.
2) "Always" is not the perfect fit, because in some cases it actually works, as he also provided the cost; unindexed tables are one of the problems.
3) He explained it with the flow pgAdmin provides. When people try to explain something and it doesn't satisfy you, do you just criticize them?