  • Published: 10 Sep 2024

Comments • 31

  • @DHARSHANSASTRYBN
    @DHARSHANSASTRYBN A year ago +7

    Great video Arpit!!
    One more step is missing after cutover: before we prune old records from the source database, we either need to stop the Kafka consumer listening for the delete CDC events, or make an entry so the consumer filters/ignores those records; otherwise we may end up applying these deletes on the target database as well :)
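A minimal sketch of the step this comment describes (all names hypothetical): while the prune job deletes old rows from the source, the CDC consumer must skip the resulting delete events, or it would replay those deletes on the target.

```python
# Shops whose source rows are being pruned after cutover (hypothetical).
PRUNED_SHOP_IDS = {"shop_m"}

def should_apply(event: dict) -> bool:
    """Decide whether a CDC event should be applied to the target DB."""
    if event["op"] == "delete" and event["shop_id"] in PRUNED_SHOP_IDS:
        return False  # cleanup delete emitted by the prune job: ignore it
    return True

events = [
    {"op": "insert", "shop_id": "shop_a", "row": 1},
    {"op": "delete", "shop_id": "shop_m", "row": 2},  # prune-job delete
    {"op": "delete", "shop_id": "shop_a", "row": 3},  # genuine user delete
]
applied = [e for e in events if should_apply(e)]
```

Only the prune-job delete is dropped; genuine user deletes still flow through.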

    • @AsliEngineering
      @AsliEngineering  A year ago +1

      Hahhahaha 😂😂😂 true. We should definitely switch off the CDC once everything has caught up 😀😀
      It once happened to me and I was left wondering where the data went???? 😂😂😂😂😂

    • @DHARSHANSASTRYBN
      @DHARSHANSASTRYBN A year ago

      @@AsliEngineering 😅

  • @adianimesh
    @adianimesh A year ago +4

    Another approach could be that instead of moving the shop which was making the shard hot, we move the other shops' rows to rebalance the load!

  • @mohammedabbas6404
    @mohammedabbas6404 A year ago +1

    Very well explained👏🏼. Thank you Arpit.

  • @akshay_madeit
    @akshay_madeit A year ago +1

    Great notes and content. Awesome bro. Keep rocking

  • @PriyankaYadav-tb3pw
    @PriyankaYadav-tb3pw A year ago +1

    Great video Arpit. Thanks 🙏🏻

  • @rishabhgoel1877
    @rishabhgoel1877 A year ago

    Good stuff, but shouldn't this be done in a phased manner, like first stopping reads from DB1 (writes still going on) and then stopping writes, or having some sort of phased treatment/dial-up?

  • @sagarmehta3515
    @sagarmehta3515 A year ago

    Very well explained

  • @koteshwarraomaripudi1080
    @koteshwarraomaripudi1080 A year ago

    I absolutely love the content of this channel. Thank you, Arpit.
    Is there any other channel out there that posts similar content about engineering challenges and how they are solved? If there are any, please comment them. Thanks in advance.

  • @malepatiyogeshkumar3855
    @malepatiyogeshkumar3855 A year ago

    Hello @arpit, thanks for the great video. Won't they have a replica / data warehouse to store the data in case of some failure? If yes, can't we do a bulk load from there instead of reading from the current DB?

  • @rjarora
    @rjarora A year ago

    I have one question. Won't they already have replicas? Can't they copy from the replicas instead of putting load on the master?

  • @sagarnikam8001
    @sagarnikam8001 A year ago +1

    Hi Arpit, great video and I really appreciate your efforts...
    I have a few doubts; it would be great if you or anyone here could clarify them.
    1. If we move a particular shop (which was making the shard hot) from one shard to another, how will that solve the issue? I mean, wouldn't it just make the other shard (where we copied the data) hot?
    2. If we batch-copy data from the hot shard, wouldn't that increase the load on the shard and maybe bring it down? I don't know exactly how much load the batch copying brings in...
    3. What would be the ideal threshold of load on a shard (e.g. 80% or 70% resource utilisation) after which we should think about moving data? Or is it when we see a large difference in resource utilisation among the shards?
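On doubt 2 above, a rough sketch (all names and numbers hypothetical) of why batch copying need not overload the hot shard: the copy is broken into bounded chunks with a pause between them, so the migration adds only a capped amount of extra load per unit time.

```python
import time

def copy_in_batches(source_rows, batch_size=1000, pause_s=0.0):
    """Copy rows in bounded chunks, sleeping between chunks to throttle load."""
    target = []
    for i in range(0, len(source_rows), batch_size):
        # One bounded read from the source and write to the target per batch.
        target.extend(source_rows[i:i + batch_size])
        time.sleep(pause_s)  # throttle: caps the migration's extra load
    return target

rows = list(range(2500))
copied = copy_in_batches(rows, batch_size=1000, pause_s=0.0)
```

Tuning `batch_size` and `pause_s` trades migration speed against the load added to the source shard.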

    • @rohan_devarc
      @rohan_devarc A year ago +1

      A hot partition problem is a subjective problem and is often related to database internals.
      - It involves dealing with any kind of mandatory operation that can't wait any longer, say backup, compaction/VACUUM, write repair, or read repair; these operations add extra IO and networking on top of the already occupied CPU cores, after which the DB goes into an imbalanced state.
      - Shifting shards involves redistributing the data volume as well as offloading the mentioned operations to other cores/NUMA sockets.
      - So rectifying the hot partition problem is more about rectifying the imbalance.
      - Don't rely on a database unless its shard-balancing mechanism matches the application's needs.
      From the DB's perspective, only replicas can suffer from hot partition problems. However, this is not true for many applications, and hence Shopify has an application-space driver; "batch copying" is a lot more economical, especially when you have more control over the driver that triggers the batch operation. This is in contrast with DB tooling, which can only offer some amount of abstract control.
      "There is no threshold until you start having one", i.e. an application-space driver for shard migration would work on application-specific load patterns.

    • @LeoLeo-nx5gi
      @LeoLeo-nx5gi A year ago

      Firstly, thanks Arpit bhaiya for explaining with such ease!!
      Now I also wanted to ask about the second point which Sagar Nikam mentioned: won't it put more load on our initial shard when we copy data from it to another one?

    • @dharins1636
      @dharins1636 A year ago +1

      1. The non-hot shops are being moved to reduce the load on the pod X DB, which has a higher load due to shop M.
      2. This is done via the binlog: in MySQL, all transactions are logged in a "file" which can be used to replicate the data to different databases or messaging systems (see Debezium).
      3. That really depends on the use case and the impact of the load on the system, which comes down to trying out different configurations (and, as mentioned in the video, it's driven by the analytics team).
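A toy model of point 2 above (not Debezium's actual API): because binlog entries are strictly ordered, replaying them on the target reproduces every change, including updates that land after a row was batch-copied.

```python
def apply_binlog(target: dict, binlog: list) -> dict:
    """Replay ordered change events onto a target key-value store."""
    for entry in binlog:  # entries are applied serially, in commit order
        if entry["op"] in ("insert", "update"):
            target[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            target.pop(entry["key"], None)
    return target

target = {"row1": "old"}  # state right after the initial batch copy
binlog = [
    {"op": "update", "key": "row1", "value": "new"},  # update after the copy
    {"op": "insert", "key": "row2", "value": "x"},
    {"op": "delete", "key": "row2"},
]
apply_binlog(target, binlog)
```

After replay the target converges to the source's latest state, even though `row1` was copied before its update.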

    • @LeoLeo-nx5gi
      @LeoLeo-nx5gi A year ago

      @@dharins1636 Ohh thanks, didn't know about Debezium

  • @SubhamDas-dw3kk
    @SubhamDas-dw3kk A year ago

    Wondering what would happen during a Black Friday sale, when multiple stores (or shops) will be hot. Additionally, even a very small downtime (even half a minute) would be a great business loss.

  • @swagatochatterjee7104
    @swagatochatterjee7104 A year ago +2

    What if the lag never dies? That is, there are always new writes to the old DB.

    • @AsliEngineering
      @AsliEngineering  A year ago

      Lag has to wear off; it is a function of CPU utilisation.
      There is a possibility that lag keeps increasing because of an underlying hardware failure, and in that case you create a new replica from a recent snapshot, so it needs minimal time to catch up.

    • @swagatochatterjee7104
      @swagatochatterjee7104 A year ago

      @@AsliEngineering I understand the data pump would transfer the data from the new DB to the old one. However, NGINX is still sending the requests for shop 2 to the old DB. So there seems to be no guarantee that there wouldn't be any writes to the old DB. Or is there something I am completely missing?

    • @AsliEngineering
      @AsliEngineering  A year ago +2

      @Swagato Chatterjee writes are cut off from the old DB for some time through a knob. Covered in the video.
      You take a small downtime (not really a downtime, but a glitch with retries) and stop accepting writes for that shop altogether so that the other DB can catch up.
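A simplified sketch (hypothetical names) of the cutover knob described above: flip per-shop writes off, let the target drain the remaining replication lag, then switch routing; clients see a brief glitch with retries rather than lost writes.

```python
class ShopRouter:
    """Routes one shop's traffic; models the per-shop write knob."""

    def __init__(self):
        self.writes_enabled = True
        self.active_db = "old"

    def cutover(self, replication_lag_events: list):
        self.writes_enabled = False      # knob: stop accepting writes for this shop
        while replication_lag_events:    # target DB drains the remaining events
            replication_lag_events.pop(0)
        self.active_db = "new"           # flip routing only once lag is zero
        self.writes_enabled = True       # resume writes, now against the new DB

router = ShopRouter()
router.cutover(["evt1", "evt2"])
```

Because writes are disabled before the drain, no event can arrive at the old DB after the point the new DB is declared caught up.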

    • @swagatochatterjee7104
      @swagatochatterjee7104 A year ago

      @@AsliEngineering thanks, it makes sense now.

  • @ksansudeen
    @ksansudeen A year ago

    Hello @arpit, I see you are talking only about the data, but what about indexes? Does Shopify not use indexes on those tables? I don't think that could be the case.

    • @AsliEngineering
      @AsliEngineering  A year ago +2

      Indexes are implicitly managed by the database.

  • @tesla1772
    @tesla1772 A year ago

    What if some row gets updated after we have copied that row?

    • @AbhisarMohapatra
      @AbhisarMohapatra A year ago +5

      The binlog would take care of that

    • @dharins1636
      @dharins1636 A year ago

      The binlog is serial, hence it tracks all the changes in the order they were made :)

    • @tesla1772
      @tesla1772 A year ago

      @@dharins1636 Okay, so we have to replay that entire query on the new DB. Is that right?

    • @shantanutripathi
      @shantanutripathi A year ago

      @@tesla1772 Yes, we copy the binlog up to the current change to the new replica, and new requests are also queued up AFTER the binlog to be applied on the new replica. Because if we send new database requests (insert, delete) before applying the binlog, data integrity would be lost.
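A sketch (hypothetical structure) of the ordering guarantee in that last reply: queued new requests are applied only AFTER the binlog backlog, otherwise a newer write could be clobbered by an older binlog entry.

```python
def drain(target: dict, binlog_backlog: list, queued_requests: list) -> dict:
    """Apply the binlog backlog strictly before any newly queued requests."""
    for entry in binlog_backlog + queued_requests:  # binlog first, then new writes
        if entry["op"] == "delete":
            target.pop(entry["key"], None)
        else:  # insert or update
            target[entry["key"]] = entry["value"]
    return target

target = {}
binlog_backlog = [{"op": "insert", "key": "a", "value": "from_binlog"}]
queued_requests = [{"op": "update", "key": "a", "value": "newer_write"}]
drain(target, binlog_backlog, queued_requests)
```

Reversing the order would leave key `a` at the stale binlog value, which is exactly the data-integrity loss the comment warns about.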