Learn the fundamentals of database systems to understand and build performant backend apps
Check out my udemy Introduction to Database Engineering
husseinnasser.com/courses
summary:
"only solve problems that you have, don't solve the problem you don't have"
Thanks for doing this video Hussein. As a new backend dev, I've always wondered about this and no one has answered it in detail as you have.
My answer to "what's going on guys" is: it's my first day on the job! I got an internship at a startup, and BIG THANKS to you, sir, because of you I became passionate about backend engineering. So I'm doing great and I'm grateful. Bless you and your family.
If you enjoyed this video, consider checking out my Introduction to Database Engineering course. It's a bestseller on Udemy with over 14 hours of content, and students love it!
husseinnasser.com/courses
Hello Hussein, I am working with Couchbase, which can update its indexes asynchronously, so even if you have indexes, insertion is still O(1). It does take time for the index tree to be rebuilt, but it will be rebalanced eventually.
Interesting Hazem! Thanks for sharing
You've got a good point, Hussein. Partitioning/sharding does not make sense for most use cases. Where it does make sense, as you said, is when you have billions of rows and terabytes of data (which only big enterprises do). I work for a big e-commerce company headquartered in Silicon Valley, and before migrating to the cloud we used Oracle DB with sharding: several physical servers, with the UserId run through a hashing function that gave us the shard number. Based on analysis of our query patterns, we decided we could move to DynamoDB (NoSQL), and we migrated the data there into a single table. DynamoDB uses partitioning/sharding internally, but it is not visible to the end user, which means that complexity is offloaded to AWS.
Thanks for sharing ! Great use case
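The shard routing described in the comment above (UserId through a hashing function that yields a shard number) can be sketched roughly like this. This is a minimal illustration, not the commenter's actual setup; the shard count and function names are made up for the example:

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count


def shard_for_user(user_id: str) -> int:
    """Map a UserId to a shard number via a stable hash.

    A stable (non-seeded) hash matters: the same user must always
    route to the same shard across processes and restarts.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


# Every lookup for the same user routes to the same shard.
assert shard_for_user("user-42") == shard_for_user("user-42")
print(shard_for_user("user-42"))
```

Note the trade-off hinted at in the thread: with plain modulo hashing, changing `NUM_SHARDS` remaps almost every user, which is one reason managed systems like DynamoDB hide partitioning from you entirely.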
Just purchased the course... I love your content honestly... It's beautiful.
Thank you so much ❤️
I think hash partitioning is a good idea if your table has the potential to keep adding rows as your client base grows, for example a URL shortener table.
I want to write my own database engine. Only update the index, for crying out loud, if and only if you do a SELECT query. Or you can have a sub-index as a temporary thing, then merge it after a while.
I see where you're going, but I don't know if that will work. Your SELECT will be slower, and more importantly it will have to acquire exclusive row locks on the parts of the index that need to get updated. This will slow things down even more because of wait events.
@@hnasr Hmm, I hear you. Ill figure it out one day.
16:03 YAGNI principle and KISS... implement the advanced stuff only when you know you are going to need it.
Yes! I talked about YAGNI here ruclips.net/video/zHSbMe15c2Q/видео.html
Thanks for the video. The first statement is not totally right: inserts into tables with lots of constraints and foreign keys, or with hundreds of millions of rows, can also be slow.
YouTube, just give me the possibility to give more than one like!!! This guy deserves tons of likes :D
Hussein, thanks for your videos!
Thanks very much for this video.
Hey Nasser,
Which one should we prefer, ORM or non-ORM (raw queries)? Can you shed some light on performance and complexity?
Thank you🙏
I spoke about that before. I prefer raw SQL, where I have full control; an ORM can introduce an extra layer of complexity and hide what is really happening.
@@hnasr Okay Thanks!
Great video, also, not related to the video but still wanted to ask it.
Do you have some sort of list with recommended books?
Hey Hussein, I have a question. If we have a clustered index on an email column, or any other non-auto-incremented column in MySQL, then the write operation will need to find the correct place to insert the row. Is that correct? If so, then I think that's another reason to use auto-incremented primary keys for tables.
That is correct. The moment you cluster on a "random" field like email or UUID, you risk slowing down inserts, because the DB needs to find where to insert the new row so it fits in the clustered page. You also risk something called page splits, which drastically decreases performance.
Unclustered (heap) tables don't have this problem; everything goes at the end, unsorted.
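A toy way to see why random keys hurt a clustered layout: model the clustered index as one sorted Python list. Sequential keys always land at the end (a cheap append, like auto-increment), while random keys land in the middle and shift everything after them, the rough analogue of hunting for the right page and splitting it. This is purely illustrative; real engines work on pages, not one flat list:

```python
import bisect
import uuid

# Toy "clustered index": rows kept sorted by key in a single list.
clustered = []


def insert(key):
    """Find the sorted position and insert there -- what a clustered
    index must do for every new row."""
    pos = bisect.bisect_left(clustered, key)
    clustered.insert(pos, key)
    return pos


# Auto-increment-style keys: every insert lands at the end.
for i in range(5):
    assert insert(f"{i:010d}") == len(clustered) - 1

# Random keys (e.g. UUIDs): inserts usually land mid-list,
# shifting existing entries to keep the order.
positions = [insert(str(uuid.uuid4())) for _ in range(5)]
print(positions)  # rarely all at the end
```

The list stays sorted either way; the difference is purely where each new key lands, which is exactly the insert cost the comment above describes.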
I want to ask a follow-up to this question: does it mean that creating a PK based on some information from the record would help query performance?
For example, when inserting a new subscription record, we could create a PK like SUB_02022024_HASH. Would this help querying, at the cost of the time the server takes to create this PK?
What about MongoDB sharding? They provide sharding by default in MongoDB Atlas?
Hussein bro, please make a video on everything about building a live video streaming app: how we can implement it, what the best approach would be, and which database would be best, for both live streaming and normal video streaming like Twitch and Netflix. Please explain the complete backend architecture with an example, thank you. I am pretty sure it would help a lot of people; there's not much insightful video out there about this topic on YouTube.
This should help
ruclips.net/video/1-KmLc0c2sk/видео.html
ruclips.net/video/px0i9ihcjuM/видео.html
@@hnasr omg how did i not see it 😳 tysmmm ☺☺ absolutely love your content
@Hussein at 00:58, what do you mean by a VANILLA INSERT?
Inserting a single row where all the values are in the statement is simple and doesn't have a performance impact (a vanilla insert).
However, you can do an insert that queries the table (INSERT INTO ... SELECT); that one's cost grows hugely with the size of the table.
@@hnasr got it...it means a simple insert at end of table
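The two insert shapes from the reply above, side by side in an in-memory SQLite database. The table and column names are invented for the demo; the point is only that the vanilla insert touches one row while INSERT ... SELECT scans its source table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE archive (id INTEGER, name TEXT)")

# Vanilla insert: one row, all values in the statement.
# Cost is constant-ish regardless of how big the table is.
con.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# Fill the table so the next statement has something to scan.
con.executemany("INSERT INTO users (name) VALUES (?)",
                [(f"u{i}",) for i in range(1000)])

# INSERT ... SELECT: the insert embeds a query, so it reads the
# whole source table -- its cost grows with that table's size.
con.execute("INSERT INTO archive SELECT id, name FROM users")

count = con.execute("SELECT COUNT(*) FROM archive").fetchone()[0]
print(count)  # 1001
```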
Can you make a video on Rocks DB?
Good idea
If possible, HyPer as well in future :)
Amazing video :)
Hussein, is it possible to update indexes asynchronously, or is it synchronous by default and can't be changed?
I understand the drawbacks of this, but is it even possible?
Technically it's possible, of course, but I am not sure any database engine exposes such a feature. It's even more efficient because you can batch those index updates and do them all at once, as long as the engineer understands the consequences.
It's a complex thing to implement, though.
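The batching idea from this thread (and the "sub-index you merge after a while" suggestion earlier) can be sketched as a toy structure: writes go into an unsorted buffer in O(1), and the sorted index is only brought up to date, in one batch, when a read needs it. All class and method names here are invented for illustration; no real engine that I know of exposes this knob:

```python
import bisect


class DeferredIndex:
    """Toy table whose index is updated lazily: writes land in a
    buffer, and the sorted index is rebuilt only when a read
    actually needs to trust it."""

    def __init__(self):
        self.index = []    # sorted keys (the "index")
        self.pending = []  # buffered, unindexed writes

    def insert(self, key):
        # O(1): just buffer the write, no index maintenance.
        self.pending.append(key)

    def _merge(self):
        # Batch all pending keys into the index at once.
        for key in self.pending:
            bisect.insort(self.index, key)
        self.pending.clear()

    def contains(self, key):
        # The read pays the deferred cost -- the consequence the
        # reply above warns about.
        if self.pending:
            self._merge()
        pos = bisect.bisect_left(self.index, key)
        return pos < len(self.index) and self.index[pos] == key


idx = DeferredIndex()
for k in [5, 1, 9, 3]:
    idx.insert(k)       # all O(1), index untouched
print(idx.contains(3))  # True -- the merge happens here
```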
Is this the same LSM tree as the one on Wikipedia? That one was invented in 1991 and published in a paper in 1996, so it seems it can't be a Google invention. Love your videos.
Correct, the original paper is from 1996, but Google pushed the tech with LevelDB and made it practical.
@@hnasr That explains why you say Google; a database in use is of course practical. Thanks for the reply.
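For readers new to the LSM tree mentioned in this thread, here is a heavily simplified sketch of its core write path: puts go into an in-memory memtable; when it fills up it is flushed as a sorted, immutable run (an "SSTable"), and reads check the memtable first, then runs newest-to-oldest. Class names and the tiny flush threshold are made up for the demo; real engines like LevelDB add write-ahead logs, bloom filters, and compaction on top:

```python
class TinyLSM:
    """Minimal LSM-tree sketch: buffered writes, sorted flushes."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.sstables = []  # list of sorted (key, value) runs
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value  # cheap in-memory write
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable as a sorted, immutable run.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):  # newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None


db = TinyLSM()
for i in range(6):
    db.put(f"k{i}", i)
print(db.get("k1"), db.get("k5"))  # 1 5
```

This is why LSM-based stores make inserts fast: the write itself never searches or rebalances an on-disk tree, which connects back to the video's point about index maintenance cost.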
Please make short videos of about 5 to 8 minutes.
Why should he? What are the benefits for him and his viewers? Give him context, otherwise it only seems like a selfish wish.
It's amazing
Hey Hussein, I have a query. Say I'm building an app. There are many topics, and those topics contain different content. A user subscribes to a topic, and I only want to fetch the stuff the user doesn't have yet, meaning newly added content. How can that be done? The way I think of it is to compare arrays of content IDs (on the user and inside the topic table) by length for each topic and then fetch the remaining content, but this sounds very bad due to the multiple queries per user. I'm using MongoDB, which does support lookups across collections should that be needed.
I would suggest using a queue or a pub/sub system such as RabbitMQ or Kafka. This feature is there by default
@@hnasr So basically I'm new to this stuff, and I was thinking this whole method itself is slow, but as you say I need to use it anyway, right? I tried for days but couldn't find a better solution; I even tried aggregation pipelines, but all of them require me to do multiple queries and find the difference of arrays.
@@hnasr I have seen your videos on all those systems, but as you say I wanted to keep it as simple as I can. So I thought to myself that the querying method itself is so bad I must try redesigning, but I can't come up with a better model. I thought of doing some courses on data modelling 😂
Also, one more follow-up question: the app is supposed to inform users about new content via notifications. WebSockets would be expensive for my app (I think), so I thought about short polling every hour (one set of users comes and asks within the hour, then another set comes), or some distribution of users by latitude and time zone on the client (no idea how to do that); it's just a thought for now.
Thanks for your replies and time ❤️
@@tikz.-3738 A queuing system (Kafka, RabbitMQ) or periodic polling by clients are both possible. I guess if you already have an existing app and want to add near-real-time functionality, then adding polling is easier, but this means you will have periodic requests from each online user. You have to weigh the pros and cons of the two approaches.
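One common alternative to the array-diffing approach described earlier in this thread is a per-user watermark: store the timestamp of each user's last successful poll and fetch only content created after it, one query per poll, no ID comparison. The sketch below uses plain Python lists and dicts as stand-ins for the MongoDB collections; every name and field here is hypothetical, not the commenter's actual schema:

```python
import time

# Toy stand-ins for a 'content' collection and per-user state.
content = []    # each item: {"topic": ..., "created_at": ..., "body": ...}
last_seen = {}  # user_id -> timestamp of the user's last poll


def publish(topic, body):
    content.append({"topic": topic,
                    "created_at": time.time(),
                    "body": body})


def poll(user_id, subscribed_topics):
    """Return only content created since this user's last poll,
    then advance the watermark -- no array diffing needed."""
    since = last_seen.get(user_id, 0.0)
    fresh = [c for c in content
             if c["topic"] in subscribed_topics
             and c["created_at"] > since]
    last_seen[user_id] = time.time()
    return fresh


publish("dbs", "new indexing article")
first = poll("u1", {"dbs"})
second = poll("u1", {"dbs"})    # nothing new since last poll
print(len(first), len(second))  # 1 0
```

In a real database this becomes a single indexed range query on `created_at` (or a monotonically increasing content ID), which also pairs naturally with the hourly short-polling idea from the thread.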
Peace be upon you, Hussein. Your brother Anas from Morocco here; I'm also a programmer, and I hope you're happy to see a comment in Arabic. God bless you.
My dear, the people of Morocco are the crown of my head 🇲🇦. My regards, and thank you very much!