How indexes work in Distributed Databases, their trade-offs, and challenges

  • Published: Jan 10, 2025

Comments • 45

  • @ozmenta9444
    @ozmenta9444 10 months ago +14

    Making sure in-depth, quality content reaches everyone is what separates you from the rest, who are making money just by dwelling on the surface. The word "thanks" alone can't show the gratitude of the many, including me, who benefit a lot. I hope this continues forever!!

  • @prateekraj1084
    @prateekraj1084 10 months ago +4

    Instead of reading multiple blogs, going through your vlog saves time and brings interest back to the topic.

  • @shubhamkumar6383
    @shubhamkumar6383 10 months ago +10

    Hi Arpit
    Big FAN!!
    The database topic you explained in your System Design playlist was exactly what I was asked during the interview at INDIA MART for the Technical Lead position. Seems like the interviewer and I studied from the same place 😅
    Many challenges from the microservices playlist were also thrown at me in the Director of Engineering round. I was able to clear both rounds because of your videos.
    Thanks a ton !!!

    • @AsliEngineering
      @AsliEngineering  10 months ago +2

      this is such great news 🔥 Many many congratulations Shubham 🙌

  • @prashantn03
    @prashantn03 10 months ago +7

    How will creating GSIs solve the 2 major problems: 1. a shard is slow, 2. a shard is dead?

  • @ishantsagar1759
    @ishantsagar1759 10 months ago +1

    Very well explained, Arpit. Before watching this video, I was literally confused as to why the Partition Key is always required to create an LSI. I understand the complete picture now 👌

  • @rohitreddy6794
    @rohitreddy6794 10 months ago +4

    Thanks

  • @swati12091993
    @swati12091993 6 months ago

    Thanks, Arpit, for making such videos. After watching a couple of videos on the internals of a database, including yours, I have started enjoying learning about how things work in the background. Thanks for your effort!

  • @PranitKothari
    @PranitKothari 10 months ago +2

    Amazing. Nice detailed explanation!

  • @techwithgd
    @techwithgd 10 months ago

    Thanks for this video. We too are planning to work on sharding/partitioning in a few months and would love to take on this project.

  • @sushmitagoswami2033
    @sushmitagoswami2033 1 month ago

    How about this situation: an order table in an e-commerce database is partitioned on order date on a quarterly basis, and a search query needs to return all the orders of the last 10 years. Then the DB proxy would have to fan out the request to multiple shards. Won't that make it slower compared to a non-sharded database?

  • @abhaykatiyar3539
    @abhaykatiyar3539 10 months ago +1

    Sharding can be done in relational or non-relational databases, but I think non-relational DBs are preferred as they have less overhead. For example, performing a join operation on a sharded DB is kind of a nightmare, but since NoSQL is imperative, you specify the join logic in the application code itself and handle it there.
    In a nutshell, SQL has a join feature, but it is hard to make sense of when the DB is sharded; NoSQL has no such concept, so sharding makes much more sense there.

    • @sankuM
      @sankuM 10 months ago

      What is meant by 'imperative' here? Do NoSQL databases handle joins very differently?

  • @harshitgangwar4500
    @harshitgangwar4500 10 months ago

    Very well explained❤Learned something new today :)
    Gonna dig in a little deeper in this.

  • @tesla1772
    @tesla1772 10 months ago +4

    In the first case, where we store blog_id (the primary key) in the GSI, we get the list of blog_ids when we query for a particular category. How do we then know which DB shard each blog_id resides in, given that the sharding is based on author id?

    • @Raja_rngnj
      @Raja_rngnj 10 months ago

      Hi Arpit, same doubt here. Could you please help us with this one?

    • @chinmaykhamkar7372
      @chinmaykhamkar7372 10 months ago

      +1

    • @makarandpundlik1083
      @makarandpundlik1083 10 months ago

      I think there is a confusion between author_id (which he described as the partition key) and blog_id (which we are assuming is the partition key).

    • @kelvingandhi4124
      @kelvingandhi4124 10 months ago

      +1. In that case, data will again have to be collected from all DB shards and the results combined, as blog_ids are spread across multiple shards! Don't see any difference from the originally submitted query... 🤔

  • @PrateekSaini
    @PrateekSaini 10 months ago +1

    With the naive implementation, the DB routing layer was firing queries to both shards and merging the results (scatter-gather). How does a GSI change that? Even now the data still resides on the data nodes. The routing layer will still have to fire queries to both nodes. How does it solve anything?

    • @karanchatwani5180
      @karanchatwani5180 9 months ago

      The first approach was querying the main shard with the category key which was not indexed, hence more latency.
      The second approach was querying the main shard with the primary key (user id) which is always indexed as it is a primary key, hence less latency.

  • @mohammedsafiahmed1639
    @mohammedsafiahmed1639 10 months ago

    Is an LSI a separate object from the main data itself? Can't we sort the main data itself by author key and the secondary attribute? Meaning, inside each node, the data would be sorted by author and then by the secondary attribute.

  • @pragyanvarshney17
    @pragyanvarshney17 4 months ago

    Can we shard data on multiple parameters? I am assuming it would work like this: in this case, input a user id and category (optional), and output the nodes in which the data can be found. I don't think sharding the data in this way is correct, because it might introduce latency in data retrieval. Also, if you could create a video about types of sharding, like application-level sharding, database proxy sharding, etc., that would be great!

  • @shwetashetye8254
    @shwetashetye8254 9 months ago

    Absolutely awesome content!

  • @shouryagupta6969
    @shouryagupta6969 10 months ago

    I'm just curious here: what if, in the global secondary index, instead of the row_id (or primary key), we store the page_no (the actual on-disk storage page number)? This would speed up reads that go through GSIs a bit, as it essentially skips the step of querying the data shard and can access the data directly using the page number. I understand that the performance difference might not be huge, but in some niche, over-optimized scenarios this might come in handy. The downside, I believe, would be that index creation takes somewhat longer, but IMO that can be written off.

  • @riteeksrivastava6157
    @riteeksrivastava6157 6 months ago

    Hi Arpit, thanks for explaining the concept. I have one question regarding the global secondary index: what if the cardinality of the secondary attribute is very high, as with a `created_at` kind of field? Will sharding the index based on the value still scale? I also need to read more about it, but would like to know your opinion.

  • @vivek2319
    @vivek2319 10 months ago

    What I feel about your YouTube channel is that even if someone cannot afford your courses and still watches all the videos (which are FREE, btw!), they are more likely to ace their interviews.

    • @AsliEngineering
      @AsliEngineering  10 months ago +1

      yes. and also ace their career.
      it is just that I go slightly more practical and in-depth than this in my courses helping people build the right intuition.

  • @ShreeharshaV
    @ShreeharshaV 3 months ago

    Thanks for the great video. I have a follow-up question.
    1. Data is sharded by the author_Id of the respective blog. So if I know the author_Id, I can efficiently find out which shard has the blog info we are interested in.
    2. After that, you created a global secondary index on the blog tag (say MySQL, GoLang, etc.). You mentioned that along with the secondary index you will also store the blogId. Once you get the shards corresponding to the blog tag, you will get all the respective blogIds.
    Now my question is: how will you efficiently find the blogs for a given blogId, as you partitioned the actual data using authorId? Am I missing something here?
    Thanks

    • @AmanSingh-wv7no
      @AmanSingh-wv7no 3 months ago +1

      The GSI also stores the author ID and blog ID. Based on these we can fetch all rows.
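
      A minimal sketch of what this reply describes, not taken from the video: it assumes the data is sharded by author_id with a simple modulo router and that each GSI entry keeps an (author_id, blog_id) pair. The names `Cluster`, `shard_for`, `put_blog`, and `blogs_in_category` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

NUM_SHARDS = 2  # hypothetical shard count, chosen only for this sketch


def shard_for(author_id: int) -> int:
    """Route by the partition key (author_id); plain modulo for illustration."""
    return author_id % NUM_SHARDS


class Cluster:
    def __init__(self) -> None:
        # Data shards: one dict per shard, mapping blog_id -> row.
        self.shards = [dict() for _ in range(NUM_SHARDS)]
        # GSI on category: category -> list of (author_id, blog_id) pointers.
        self.gsi = defaultdict(list)

    def put_blog(self, blog_id: int, author_id: int, category: str, title: str) -> None:
        row = {"blog_id": blog_id, "author_id": author_id,
               "category": category, "title": title}
        self.shards[shard_for(author_id)][blog_id] = row
        # Keeping author_id in the index entry is what lets a reader
        # locate the owning shard later without a scatter-gather.
        self.gsi[category].append((author_id, blog_id))

    def blogs_in_category(self, category: str):
        # Step 1: read the index entries for the category.
        by_shard = defaultdict(list)
        for author_id, blog_id in self.gsi.get(category, []):
            by_shard[shard_for(author_id)].append(blog_id)
        # Step 2: fetch by primary key, touching only the shards that hold matches.
        rows = []
        for shard_no in sorted(by_shard):
            rows.extend(self.shards[shard_no][b] for b in by_shard[shard_no])
        return rows, sorted(by_shard)


cluster = Cluster()
cluster.put_blog(1, author_id=10, category="mysql", title="Indexes")
cluster.put_blog(2, author_id=11, category="golang", title="Channels")
cluster.put_blog(3, author_id=12, category="mysql", title="Replication")
rows, shards_touched = cluster.blogs_in_category("mysql")
```

      In this toy setup both "mysql" blogs belong to authors that map to the same shard, so the lookup contacts one shard instead of fanning out to all of them. In a real system the index itself would also be partitioned, which this sketch leaves out.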

  • @tarunstv796
    @tarunstv796 10 months ago

    Hey Arpit, Great content!
    Is there a video on distributed sequence generator?

  • @harshchiki7796
    @harshchiki7796 7 months ago

    Which app do you use to write and present in this (and other) videos? (on the iPad)
    Thanks for the great content btw!!

  • @aniruddhadeshmukh9445
    @aniruddhadeshmukh9445 7 months ago

    fantastic video

  • @ShaikhZahid349
    @ShaikhZahid349 10 months ago

    Which video should I start system design with?????

  • @zdevpro
    @zdevpro 1 month ago

    Hi, can we get your iPad notes? Badly needed.

  • @VerywellPeople-bs7ol
    @VerywellPeople-bs7ol 10 months ago

    Good video ❤

  • @aqilaghamirzayev8189
    @aqilaghamirzayev8189 10 months ago

    Thanks for the good explanation.
    But is it OK to use SQL for storing blog data? Wouldn't NoSQL be OK?
    Which specific database would you recommend for storing blog data?

    • @AsliEngineering
      @AsliEngineering  10 months ago +3

      SQL works like a charm. No need to unnecessarily go for NoSQL solutions unless your data becomes massive.

  • @raj_kundalia
    @raj_kundalia 8 months ago

    Thank you so much!

  • @piyushpathak1186
    @piyushpathak1186 10 months ago

    But how does the global secondary index solve the problem that one of the shards is slow or dead???

    • @AsliEngineering
      @AsliEngineering  10 months ago +4

      It makes pagination and querying efficient. If you store the complete data in the GSI, then it removes the need to query the data shards.

  • @pratikdey8062
    @pratikdey8062 6 months ago

    awesome