Man, you are genius. I have been in IT for 15 years and hardly know 10% of what you know. Your explanation- how you touch basics and build on top of that is simply amazing. I really like how you are emphasizing conceptual knowledge. You rock.
Hands down, the best explanation available, covering all the technical aspects. Genius!
So beautifully explained
Thank you sir. I've worked with relational databases all my life.
Ever since Cassandra became a trend back in the day, I had a hard time understanding how exactly column-family databases really work.
Until today.
Thank you very much.
very clear explanation of row-oriented and column-oriented database. Thank you very much
You are a genius. I loved the way you explain the topic and how you relate the problem
Have watched several of these explanations. I feel your use of the white board stands out. Clear consistent and detailed explanation as well. Hat off :-) .
Both your English and your explanation are so clear. Thank you so much, bro!
Fantastic. Very good explanation about Row Vs Column db's. Thanks Naren
very good, the best explanation I have seen so far.
You got great grip on the topic and deep understanding of data storage on the disk; simply you are the best and Great!
You have explained the concept at the disk level, awesome work.
Cool explanation...Need more of your videos....Gonna check your whole playlist....
Thanks a ton. I have been looking for an explanation of column-based DBs and how they are better. This video explained everything. Every video of yours packs in so much knowledge, explained simply, that can't be found in any single video elsewhere. Thanks a lot, Narender, for taking the time to make these videos.
wow very good explanation
Good one Naren, thank you for posting this.
This is best explanation i have come across ! lucid ! swell !
Nicely explained.. Thank you Narendra!
Amazing video, I loved it.
Quite good explanation.
A brilliantly explained video. Thank you!
Woww..!! What an explanation..!!..It was really helpful...thanks..!!😇
@narendra , good concise one. wish u do more . thanks
Superb explanation Narendra. Really understand the concepts
Detailed explanation. Thank you so much. All my doubts got cleared.
good video
Starting to get back into to database work, this was an excellent video!
Excellent Explanation.
You are back, thanks
Very good explanation
It was crisp and clear. Thank you so much for this awesome content 🥰
Thanks for the video! Very clear and concise explanation. 👍🏻
Great Video thank you for sharing your knowledge.
Very nice explanation.
you're a great teacher!!
I don't think anyone can explain better. Wow.
I liked the topics getting covered by u. This channel is really helping me to understand system design concepts used in real world.
Also it will be great if u cover the internals of a message queue.
Good explanation
Awesome explanation ..Keep it up
Really awesome.well explained 👏👏👏
Brilliant Explanation. Thanks Bro.
nicely explained
very clear understanding sir.
loved the way you explained
Thanks man, great explanation! would be awesome if you could do the same with document-oriented db!
+1 , would like to understand how you write queries for (row vs column vs document)
+ 1
Great video, it helped me to understand my course from my university and helped me for my final exam, thank you
This is wonderful and thanks very much. This is my first time ever commenting on a video lol .
it's so clear. thank you so much.
Great explanation 👏 keep it coming
thank bro for sharing
Great explanation, thank you!
Dude you are great. Very nicely explained!
Thanks man. You explain it really well
Nice blog with very clear explanations of row and columnar DB storage patterns. Could you also blog about in-memory data storage in both cases, rather than HDD? I am now interested in seeing the recent HANA buzz around ‘in memory’ data storage and retrieval. One more thing: in this blog you mentioned the scenarios that dictate how the data will be stored, such as transactional or analytical. But I think HANA stores everything in a columnar way, so are there any drawbacks to that? I'd appreciate your insight on this.
really very good explanation. Would like to see video on partitioning in future.
Good.
nice video thanks
Wow.. so far this is the best explanation I have seen. Thank you so much.
I have one question. I still don't understand how insertion happens in a columnar database if there is no space left in a block and the next block holds data from another column. It would be great if you could shed some light on this. Appreciate your help.
great
Great video. Thanks for making it. It would be great if you could create more videos on the internals of databases, focusing each video on a different kind of database.
Thanks!
Awesome video ❤️
Thank you very much bro. You have explained it very clearly and at a beginner level. It helped me while learning SAP HANA as well.
Sincerely... good effort!!! 👍
Take care man. The situation in Germany is also not good due to Coronavirus. Stay safe.
Sure, u too :)
Well explained. Thank you
You are a genius dude! These little things get people 300-400K jobs in FAANGMULA!
Got to know a lot from your session. Thanks a lot.
In your example, how is the following requirement met: all the response times between 10 am and 10:30 am? I understand that all the times are stored together in a series of one or more blocks, and all the response times are stored together the same way. So how is the time range matched with the response times and returned? In other words, since the time blocks and the response-time blocks are stored separately, who does the job of saying that 10 to 10:30 am matches the response times of 200, 300 and 350 ms?
I was wondering the same thing.
Actually, along with the data you always associate the ID. That way you first query the time column and find the IDs you need, and after that you can query another column, like response times or error count, for those IDs. The thing to remember: it stores the ID along with the data.
@@quizforces I don't think that's how it works. What I found out later is that the first time in my query result will match the first response time, and so on...
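The positional matching described in this reply can be sketched as follows. This is a toy illustration, assuming plain in-memory lists stand in for column blocks; the sample values and the helper names `positions_in_range` and `fetch` are hypothetical, and real engines differ in how they locate and read positions. The ID-based approach from the earlier reply would look similar, with stored keys taking the place of list positions.

```python
# Toy column store: every column is a separate list, and row i is the
# i-th entry of each list. Values below are made up for illustration.
times = ["09:55", "10:00", "10:15", "10:30", "10:45"]
response_ms = [120, 200, 300, 350, 500]


def positions_in_range(column, low, high):
    """Scan one column and return positions whose value is in [low, high]."""
    return [i for i, value in enumerate(column) if low <= value <= high]


def fetch(column, positions):
    """Read only the requested positions from another column."""
    return [column[i] for i in positions]


# Zero-padded "HH:MM" strings compare correctly as plain strings.
matching = positions_in_range(times, "10:00", "10:30")
result = fetch(response_ms, matching)
print(matching, result)
```

Because every column keeps its values in the same row order, the position alone is enough to tie a time to its response time in this sketch.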
Almost all of the row vs column store explanations on the internet do not explain this point well, but they think they are doing a wonderful job of explaining the concepts to the audience but just fail to explain this simple point. LOL! I think what Ritesh Malav explained below makes sense that the ID is also stored together in the column store so that the individual data values are married together to make sense as meaningful piece of record instead of isolation.
@@RajaKumar-sj3et Completely agree with you & Ritesh. Logically each column should have an ID attached to it. But that raises other questions.
1. How does column encoding work if an ID is attached?
2. How does RLE encoding distinguish the column values of different records?
3. In most columnar DBs (like Redshift, Vertica), we don't declare primary keys. Then how do the columns of a single row get tied together?
I was waiting for these explanations in this (nth) video but no luck!!
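On question 2, one way to see how run-length encoding can still distinguish records is that the run lengths themselves preserve row positions. The sketch below is a toy model, not the format of any particular engine; `rle_encode` and `value_at` are hypothetical helper names.

```python
def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, length) for value, length in runs]


def value_at(runs, position):
    """Recover the value stored at a row position by walking the runs."""
    start = 0
    for value, length in runs:
        if position < start + length:
            return value
        start += length
    raise IndexError(position)


# Made-up status column for six rows.
status = ["ok", "ok", "ok", "error", "error", "ok"]
runs = rle_encode(status)
print(runs)
print(value_at(runs, 3))
```

In this model no per-value ID is stored: row 3 is identified purely by counting through the runs, which is why compression and positional matching can coexist.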
It will be nice to see a video on write ahead logs
I am your fan
That's a great video. One question: how does indexing work in a column-oriented database?
An index is essentially built from a column or a group of columns in a database. In a column-oriented database each column is already stored separately, so you often don't need a separate index. That is the beauty.
amazing explanation! :)
Great video. I have 2 questions. 1) What will happen if we usually fetch 2 columns at the same time? Like the time and number of requests in your example? 2) Which database has better performance with DML operations like update and delete? You talked about inserts - can you please talk about updates/deletes also please?
Hey Narendra, this is very helpful content and your voice is very relaxing. Just one suggestion: I think the video should have covered how data from one column is matched (linked) to data from another column, e.g. how the DB identifies the "number-of-requests" values for the time window from 10:01 to 10:11.
The record ID is always stored, so when you run the time query, you get from the time column all the record IDs that match the time requirement. Then it goes to the metric column you need, reads its blocks (which might be partitioned), and fetches the data that has the same record IDs.
How do you handle large and hot partitions in a columnar DB, since one column could contain data that is huge (large strings) or hot (more frequently accessed than others)?
Hi Narendra, you explained extremely well how we can choose the best-fit NoSQL database based on our requirements. Are there other factors for deciding on the best-fit NoSQL database, such as whether our application is write-oriented or read-oriented? I know the CAP theorem is another factor, but I'd like to know other possible factors.
Question: in the case of column-oriented writes, what if the current block gets full and the next consecutive block is also full? Does it shift all the blocks, or how does it actually work and keep track of a column's blocks in different places?
Thanks bro for the amazing explanation. A couple of questions, though. 1. How would this work on SSD devices? 2. When column data is updated, how does it refer to updated values stored somewhere else for the same column? It is very common to update column data in analytics applications as new data keeps coming in.
Very clearly explained. Thanks a lot. Can you make videos on Amazon Redshift?
How does a columnar database handle multiple WHERE conditions? Does it keep track of the subset at each filter?
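One possible way to picture this is that each filter scans only its own column and produces a mask of matching positions, and the masks are then combined. The sketch below is a simplified model under that assumption, with made-up data; it is not a description of how any specific engine plans or executes queries.

```python
# Made-up columns; row i is the i-th entry of each list.
times = ["09:55", "10:00", "10:15", "10:30", "10:45"]
status = ["ok", "error", "ok", "error", "error"]
response_ms = [120, 200, 300, 350, 500]

# Each filter touches one column and yields one boolean per row position.
time_mask = ["10:00" <= t <= "10:30" for t in times]
status_mask = [s == "error" for s in status]

# AND the masks so only rows passing every filter survive.
combined = [a and b for a, b in zip(time_mask, status_mask)]
positions = [i for i, keep in enumerate(combined) if keep]

# Only now read the column the query actually asked for.
selected = [response_ms[i] for i in positions]
print(positions, selected)
```

In this model the "subset at each filter" is the mask itself, so later filters and the final read only need the surviving positions.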
I just laughed at the scalar academy ad which came during the video.
So basically how are these Column data across the blocks linked... for example the traffic and errors need to be plotted with the time. So there is a relationship that needs to be maintained with some kind of a key right? How is that being done?
Now i understood why my architect always asks not to do any analytics query on transactional DB :)
perfect ....
Thank you for all these amazing videos on systems design. We have learned great concepts from you sir.
One request from my side: would it be possible for you to make a series on "object oriented design interview"? Any resources you would recommend for the same?
The blocks don't always have to be contiguous. Blocks are distributed, allocation can start from different blocks, and the file system keeps track of them.
Very nice explanation. Thanks, I have been following your videos. Great effort. Please also share how you prepare for these topics, I mean what resources you use. That would also be a great help!!! :-)
Thanku 😊
When one of the disk of shards in a database (either row oriented or column oriented) is full, what are the strategies to mitigate the situation?
I was looking for row vs column DB and got very useful info. Thanks. Could you please explain this in terms of MPP and cloud DWH if possible? 😊
Is it fair to say column oriented for OLAP vs row oriented for OLTP
So does it mean writes are slower in columnar DBs compared to row-oriented DBs?
Thanks a lot !!
I'm under the impression that data load times for a data warehouse are critical. If a table has 50 columns, by your definition there is 50x extra IO. Is this really the case, and are the load times that much slower?
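A rough way to reason about this question is to count the blocks touched by an append under a simplified model. The sketch assumes fixed-width 8-byte values, 4096-byte blocks, appends into fresh blocks, and no compression; all of these numbers and both function names are assumptions for illustration, not measurements of any real system.

```python
BLOCK_BYTES = 4096  # assumed block size
VALUE_BYTES = 8     # assumed fixed-width value


def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def row_store_blocks(rows, columns):
    """Rows are packed whole, so all values share the same run of blocks."""
    return ceil_div(rows * columns * VALUE_BYTES, BLOCK_BYTES)


def column_store_blocks(rows, columns):
    """Each column is written to its own run of blocks."""
    return columns * ceil_div(rows * VALUE_BYTES, BLOCK_BYTES)


for rows in (1, 10_000):
    print(rows, row_store_blocks(rows, 50), column_store_blocks(rows, 50))
```

Under these assumptions a single-row insert into a 50-column table touches 1 block in the row layout and 50 in the column layout, but a 10,000-row batch touches 977 versus 1,000. In this model the 50x penalty applies to row-at-a-time inserts, while bulk loads come out close to even.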
Hi, Great videos! Please also let us know the source of your knowledge? any books you recommend if someone wants to deep dive into the concepts.
Thank you!
Thanks
How does it store null values for a given row in a column-family DB?
I guess when storing column based data, the ID column should be saved as well?
Do you have notes for this video?
Does every database have the ability to store data both row-wise and column-wise, or do different databases work differently?
It's different use cases and different databases.
Row oriented: MySQL, PostgreSQL
Col oriented: Cassandra, MariaDB, SAP HANA, HBase
@@TechDummiesNarendraL Thanks Man,Appreciate your Hardwork...