Avoid premature Database Sharding

Поделиться
HTML-код
  • Опубликовано: 24 ноя 2024

Комментарии • 62

  • @hnasr
    @hnasr  3 года назад +4

    Learn the fundamentals of database systems to understand and build performant backend apps
    Check out my udemy Introduction to Database Engineering
    husseinnasser.com/courses

  • @eslama-elwafa2
    @eslama-elwafa2 3 года назад +22

    summary:
    "only solve problems that you have, don't solve the problem you don't have"

  • @sathyajithps013
    @sathyajithps013 3 года назад +10

    Thanks for doing this video Hussein. As a new backend dev, I've always wondered about this and no one has answered it in detail as you have.

  • @hnasr
    @hnasr  3 года назад +20

    🍑 Hussein 🍑 Hussein 🍑 Hussein 🍑 Hussein

    • @hardikp5192
      @hardikp5192 3 года назад +1

      My answer to "what's going on guys" is, It's my first day on the job, I got an Internship in a startup company, BIG THANKS to you sir, because of you I became passionate about Backend Engineering, so, I'm doing great and I'm grateful. Bless you and your family

  • @hnasr
    @hnasr  3 года назад

    If you enjoyed this video consider checking out my Introduction to Database Engineering course. Its a bestseller on udemy with over 14 hours worth of content and students love it !
    husseinnasser.com/courses

  • @hazemabdelalim5189
    @hazemabdelalim5189 3 года назад +2

    Hello Hussein , I am working with Couchbase Database which can update the indexes asynchronously , so even if you have indexes insertion will be still O(1) , however it will take time for the index tree to be re-built but it will be re-balanced eventually .

    • @hnasr
      @hnasr  3 года назад

      Interesting Hazem! Thanks for sharing

  • @AleksandarT10
    @AleksandarT10 3 года назад +2

    You got a good point Hussein. Partitioning/Sharding does not make sense for most of the use cases. Where it makes sense as you said is when you have Billions of Rows and Terabytes of data (which only Big Enterprises do). I work for a Big ECommerce Company Headquartered in Silicon Valley and before migrating to Cloud, we were using Oracle DB with Sharding. We were using several physical servers where we had the UserId going through a "Hashing" Function which gave us the number of the shard. Based on our query "patterns" analysis, we decided that we can go to "DynamoDB" (NoSQL) and we moved the data there in a single table. DynamoDB Internally uses partitioning/sharding itself, but this is not visible to the end user, which means that the complexity of that gets offloaded to AWS.

    • @hnasr
      @hnasr  3 года назад +1

      Thanks for sharing ! Great use case

  • @LearnToCodeAcademy
    @LearnToCodeAcademy 3 года назад

    Just purchased the course... I love your contents honestly... They are beautiful

    • @hnasr
      @hnasr  3 года назад +1

      Thank you so much ❤️

  • @isaacfrost9798
    @isaacfrost9798 3 года назад

    I think using hash Partitioning is a good idea if your table has the potential of adding up more rows on the client growth for example a URL shorter table.

  • @colinmaharaj
    @colinmaharaj 3 года назад

    I want to write my own database engine. Only update the index for crying out loud, if and only iff you do a select query. Or you can have a sub index as a temporary thing then you merge after a while

    • @hnasr
      @hnasr  3 года назад +1

      I see where your going but I don’t know if that will work. Your select will be slower and more importantly it will have to acquire row exclusive locks on the parts of the index that needs to get updated. This will slow down things even more because of waiting events.

    • @colinmaharaj
      @colinmaharaj 3 года назад

      @@hnasr Hmm, I hear you. Ill figure it out one day.

  • @TrippMotil
    @TrippMotil 2 месяца назад

    Great video as always! 👍 I’ve got a question: 🤨 I have a set of words 🤷‍♂️. (behave today finger ski upon boy assault summer exhaust beauty stereo over). What should I do with this? 🤷‍♂️

  • @Flankymanga
    @Flankymanga 3 года назад

    16:03 YAGNI priniple and KISS... implement the advanced stuff only when you know you are going to need it....

    • @hnasr
      @hnasr  3 года назад +1

      Yes! I talked about YAGNI here ruclips.net/video/zHSbMe15c2Q/видео.html

  • @javisartdesign
    @javisartdesign 3 года назад

    Thanks for the video. The first statement is not totally right, inserts with lot of constraints and foreign keys with hundred or millions can be also slow.

  • @FISS007
    @FISS007 3 года назад +4

    RUclips, just give me the possibility to give more than one like !!! This guy deserves tons of likes :D

  • @mihailnesterov
    @mihailnesterov 3 года назад

    Hussein, thanks for your videos!

  • @AranTodd
    @AranTodd 15 дней назад

    Thanks for the breakdown! Could you help me with something unrelated: My OKX wallet holds some USDT, and I have the seed phrase. (alarm fetch churn bridge exercise tape speak race clerk couch crater letter). How should I go about transferring them to Binance?

  • @romantsyupryk3009
    @romantsyupryk3009 3 года назад +1

    Thanks very much for this video.

  • @cse2335
    @cse2335 3 года назад +3

    Hey Nasser,
    Which one we should prefer ORM OR Non-ORM(Query) can you light on performances and complexities!
    Thank you🙏

    • @hnasr
      @hnasr  3 года назад +8

      Spoke about that before, I prefer raw SQL where I have full control, ORM can introduce extra layer of complexity and invisibility to what really is happening..

    • @cse2335
      @cse2335 3 года назад

      @@hnasr Okay Thanks!

  • @mudrRock
    @mudrRock 3 года назад

    Great video, also, not related to the video but still wanted to ask it.
    Do you have some sort of list with recommended books?

  • @nishantkamboj5973
    @nishantkamboj5973 3 года назад +1

    Hey Hussein. I have a question. If we have a clustered index for email column or any other non auto incremented column in MySQL then the write operation will need to find the correct place to insert the row. Is it correct? Is so then I think that's another reason to use auto incremented primary keys for the tables.

    • @hnasr
      @hnasr  3 года назад +1

      That is correct, the moment you cluster on a "random" field like email or UUID, you risk slow down inserts because the db need to find where to insert this new row so it fits the clustered page. You also risk something called page splits which drastically decrease performance.
      unclustered tables don't have this problem, everything goes in the end unsorted.

    • @javaprogramming7235
      @javaprogramming7235 9 месяцев назад

      I want to ask following this question, does it mean to create PK based on some information from the record would help in performance of querying
      For example Inserting a new subscription record in database we can create pk something like this SUB_02022024_HASH does this will help in querying at the cost time the server will take to create this PK?

  • @GaurangDhorda
    @GaurangDhorda 3 года назад

    what about mongoDB sharding ? They provide sharding default in mongoDB atlas ?

  • @whotookmyign
    @whotookmyign 3 года назад

    Hussein bro please make a video on ,everything about live video streaming app. how can we implement it and what would be the best approach and which data base would be the best. for both live video streaming and normal video streaming like twitch and netflix . please explain about the complete backend architecture with example thank you. i am pretty sure it would help a lot of people. theres not much insightful video out there about this topic on youtube

    • @hnasr
      @hnasr  3 года назад +3

      This should help
      ruclips.net/video/1-KmLc0c2sk/видео.html
      ruclips.net/video/px0i9ihcjuM/видео.html

    • @whotookmyign
      @whotookmyign 3 года назад +1

      @@hnasr omg how did i not see it 😳 tysmmm ☺☺ absolutely love your content

  • @saiavinashduddupudi8975
    @saiavinashduddupudi8975 3 года назад

    @Hussein at 00:58, what do you mean by a VANILLA INSERT?

    • @hnasr
      @hnasr  3 года назад +1

      Inserting a single row where all values are in the statement is simple doesn’t have performance impact (vanilla insert)
      However you can do an insert that queries the table (insert into select) that has huge impact on the size of the table

    • @saiavinashduddupudi8975
      @saiavinashduddupudi8975 3 года назад

      @@hnasr got it...it means a simple insert at end of table

  • @dibyaranjanmishra4272
    @dibyaranjanmishra4272 3 года назад +2

    Can you make a video on Rocks DB?

    • @hnasr
      @hnasr  3 года назад

      Good idea

    • @bisakhmondal8371
      @bisakhmondal8371 3 года назад

      If possible, HyPer as well in future :)

  • @ThinkSoftware
    @ThinkSoftware 3 года назад

    Amazing video :)

  • @sherbineyb7166
    @sherbineyb7166 3 года назад

    hussein , is it possible to update indexes asynchronously or it is set by default synchronous and cant be changed ?
    i understand the drawbacks of this , but is this even possible ?

    • @hnasr
      @hnasr  3 года назад +1

      Technically it’s possible of course but I am not sure if any database engine exposes such feature. Its even more efficient because you can batch those index updated and do them at once. As long as the engineer understands the consequences
      its a complex thing to implement though

  • @teunohooijer6788
    @teunohooijer6788 3 года назад

    is this the same lsm tree as the one on Wikipedia? that one was invented in 1991 an published in a paper in 1996. seems like it can't be a google invention then. love your videos.

    • @hnasr
      @hnasr  3 года назад

      Correct the original paper is 1996, but google pushed the tech with Leveldb and made it practical

    • @teunohooijer6788
      @teunohooijer6788 3 года назад

      @@hnasr that explains why you say google, a database in use is of course practical. thanks for the reply.

  • @0xfaizan
    @0xfaizan 3 года назад

    Plz make short videos of about 5 to 8 minutes

    • @mugiwaranoadi5932
      @mugiwaranoadi5932 3 года назад

      Why should he? What are the benefits for him and his viewers? Give him context, otherwise it only seems like a selfish wish.

  • @Nexjsdeveloper
    @Nexjsdeveloper 3 года назад

    Its amaizing

  • @tikz.-3738
    @tikz.-3738 3 года назад

    Hey Hussein I have a query. Say I'm building a app. There are many topics and those topics contain different content . The user subscribe to a topic now I only want to fetch the stuff that user doesn't have means newly added content how can that be done? The way I think it is to compare array of content ids(on user and inside topic table) by length for each topic and then fetch the remaining content but this sounds very bad due to multiple query per every user. Im using mongoDB which does support look ups to different tables should that be needed

    • @hnasr
      @hnasr  3 года назад

      I would suggest using a queue or a pub/sub system such as RabbitMQ or Kafka. This feature is there by default

    • @tikz.-3738
      @tikz.-3738 3 года назад

      @@hnasr so basically I'm new to this stuff so I was thinking this whole method itself is slow but as u say I need to use this only right I tried for days but I couldn't find a better solution, even tried aggregation pipelines but all require me to do multiple queries and finding diffrence of arrays

    • @tikz.-3738
      @tikz.-3738 3 года назад

      @@hnasr I have seen ur videos on all those systems but as u say I wanted to keep it as simple as I can so I thought to myself the method to query itself is soo bad I must try redesigning but I can't get a better idea on model I thought of doing some courses on data modelling 😂

    • @tikz.-3738
      @tikz.-3738 3 года назад

      Also one more follow up question the apps supposed to inform users about new contents as notification websockets would be expensive for my app(I think) so I thought about short polling every hr(set of users come and ask in 1hr and then another set comes) or something by doing some distribution of users with latitude and time zone on client (no idea how to do it) just a thought yet
      Thanks for ur replies and time ❤️

    • @willi1978
      @willi1978 3 года назад

      @@tikz.-3738 queuing system (kafka, rabbitmq) or periodic polling by clients are possible. I guess if you already have an existing app and want to add near time functionality then adding polling is easier. but this will mean you have periodic requests for each online user. you have to weigh the pros and cons of the two approaches.

  • @AnasHmamouch
    @AnasHmamouch 3 года назад

    سلام عليكم حسين اخوك أنس من المغرب أنا كمان مبرمج و أتمنى تكون سعيد تشوف كومنت بالعربي حياك الله

    • @hnasr
      @hnasr  3 года назад +1

      حبيبي تاج راسي اهل المغرب 🇲🇦 تحياتي وشكرا جزيلا