How to handle database outages?

  • Published: 25 Oct 2024

Comments • 30

  • @watansahu2190
    @watansahu2190 2 years ago +9

    Hi Arpit, this was super helpful. I am currently working on a similar DB-going-down scenario, and I was skimming through Google to find solutions, and the moment I saw the thumbnail I was like, voilà! This content is simply a gem, and it gave a glimpse of the quality of content that must be in your complete master course 🔥. Looking forward to joining the course. Thanks again.

  • @mindinvesting6453
    @mindinvesting6453 1 year ago +2

    This is a beautiful explanation. Our database bailed out on Cyber Monday, and now when I assess the cause and the solution we applied, I find you pointed out exactly everything. Thanks for everything.

  • @AAZinvicto
    @AAZinvicto 2 years ago +1

    Great video Arpit.
    Thanks for sharing!

  • @rajendragosavi2233
    @rajendragosavi2233 2 years ago +1

    Very helpful. I got a clear picture of when to use read replicas vs. when to go for sharding. Thanks.

  • @BhupendraYadav-li4ts
    @BhupendraYadav-li4ts 2 years ago +1

    Your explanations are superb. Thanks a ton sir, god bless you. Please keep such content coming

  • @anirbanbhowmick6270
    @anirbanbhowmick6270 2 years ago +2

    Amazing. More youtube videos on tech topics please!

  • @divyanshu777soni
    @divyanshu777soni 2 years ago

    This is just brilliant. Thanks a ton Arpit!

  • @hc90919
    @hc90919 1 year ago +1

    Hello Arpit,
    This is a really helpful video.
    In one of my interviews recently, I was asked, 'How would you debug it if one of the applications you're working on is going down every 5 minutes?'
    I mentioned that I would check the dashboards and logs and try to identify the bottleneck (is it slow APIs, slow DB queries, DB connections, or CPU/RAM usage of the DB servers / application servers?).
    How would you go about answering such questions? In reality, can you please shed some light on what steps you would take in terms of research and identifying the root cause?
    Thank you.

  • @sujayprabhu502
    @sujayprabhu502 1 month ago

    Awesome 👏

  • @Ofureee
    @Ofureee 1 year ago

    Awesome!

  • @VenkataVineelYalamarthi
    @VenkataVineelYalamarthi 1 year ago

    When a CPU spike happens, it could be because your buffer pool memory (or WiredTiger cache) is just not enough and threads are busy evicting pages from the buffer pool/cache. So it's worth increasing that cache size and checking as well.

  • @manmohanmundhraa3087
    @manmohanmundhraa3087 1 year ago

    A cache can be used between the API and the DB, so that there is less load on the database.
    Also, when the master DB goes down, one of the replica DBs can be used as the master for the time being, with some limitations.
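
A minimal sketch of the cache-between-API-and-DB idea from the comment above, assuming a simple in-process cache with a per-entry TTL. The names `CacheAside` and `fake_db_lookup` are hypothetical and only stand in for a real cache (for example Redis) and a real query; nothing here comes from the video itself.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Tiny in-process cache-aside helper with a TTL (illustrative only)."""

    def __init__(
        self,
        load_from_db: Callable[[str], str],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, cached value)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry: the database is not touched
        value = self._load(key)  # miss or expired: one database read
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call this after a write so readers do not see stale data.
        self._entries.pop(key, None)


db_calls = []


def fake_db_lookup(key: str) -> str:
    # Hypothetical stand-in for a real query; records each call.
    db_calls.append(key)
    return f"row-for-{key}"


cache = CacheAside(fake_db_lookup, ttl_seconds=30.0)
cache.get("user:1")
cache.get("user:1")
print(len(db_calls))  # 1
```

The second read is served from memory, so only one call reaches the database. The trade-off is staleness: a write has to invalidate the key (or wait out the TTL) before readers see the new value.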

  • @mujtabajafri2803
    @mujtabajafri2803 2 years ago +1

    Why is a database version update a big task? My thinking might be short-sighted, but to me it means taking a database dump from the old version and loading it into the new version. Also, possibly some changes in the application code for the new version.

    • @AsliEngineering
      @AsliEngineering  2 years ago +8

      Minor version upgrades of a database are relatively simple, but major version upgrades are a big pain, typically a multi-month project. Some things to take care of during version upgrades are:
      - first of all, understanding what is not backward compatible by going through the release docs
      - identifying the pieces in the code where changes need to be made
      - thoroughly testing the changes on a separate instance with the old data on the new DB version
      - testing becomes a pain if unit tests and integration tests are not written
      - once we know the changes are okay, we need to set up parallel replication from the current DB to a DB with the new version so that the new database catches up
      - stop writes on the old database (take the downtime)
      - wait for the replication lag to become 0
      - run a basic sanity check on the new database
      - flip the switch so that the application code now uses the new database and not the old one
      - run the application-level sanity checks again
      I might have missed a few steps here or there, but the above steps are definitely non-optional. Hope this helps.
      DB version upgrades are always a pain! Been there, done that :)
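
The cutover part of the steps above (stop writes, wait for lag to reach 0, sanity check, flip) can be sketched roughly as follows. This is a toy simulation under stated assumptions: `FakeDB`, `replication_lag`, `replicate_once`, and `cut_over` are hypothetical names, and row counts stand in for a real replication position. It is not how any particular database exposes replication.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeDB:
    """Stand-in for a database; `rows` plays the role of the stored data."""

    version: str
    rows: List[str] = field(default_factory=list)
    accepts_writes: bool = True

    def write(self, row: str) -> None:
        if not self.accepts_writes:
            raise RuntimeError(f"writes are stopped on {self.version}")
        self.rows.append(row)


def replication_lag(old: FakeDB, new: FakeDB) -> int:
    # Assumption: lag is measured as the number of rows not yet copied.
    return len(old.rows) - len(new.rows)


def replicate_once(old: FakeDB, new: FakeDB, batch: int = 2) -> None:
    # Copy the next small batch, imitating a replica that is catching up.
    start = len(new.rows)
    new.rows.extend(old.rows[start:start + batch])


def cut_over(old: FakeDB, new: FakeDB) -> FakeDB:
    old.accepts_writes = False  # stop writes (the downtime starts here)
    while replication_lag(old, new) > 0:  # wait for the lag to become 0
        replicate_once(old, new)
    if new.rows != old.rows:  # basic sanity check on the new database
        raise RuntimeError("sanity check failed; do not flip the switch")
    return new  # flip: the caller now points the application at `new`


old_db = FakeDB("v1", rows=["a", "b", "c", "d", "e"])
new_db = FakeDB("v2")
active = cut_over(old_db, new_db)
print(active.version, replication_lag(old_db, new_db))  # v2 0
```

Stopping writes before waiting on the lag is what makes the lag reach 0 and stay there; flipping while writes still land on the old database would lose them.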

  • @VenkataVineelYalamarthi
    @VenkataVineelYalamarthi 1 year ago

    @Arpit Connections could be maxed out just because people are forgetting to close them. It is not necessarily that their queries have been running for a long time.
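
One way to avoid the leaked connections described above is to tie every connection to a scope that closes it. The sketch below uses Python's standard `sqlite3` and `contextlib.closing` purely as an illustration; `fetch_scalar` and `tracked_connect` are hypothetical helpers, and a production service would more likely rely on its driver's connection pool.

```python
import sqlite3
from contextlib import closing
from typing import Callable, List

opened: List[sqlite3.Connection] = []


def tracked_connect() -> sqlite3.Connection:
    # Hypothetical factory that remembers connections so we can inspect them.
    conn = sqlite3.connect(":memory:")
    opened.append(conn)
    return conn


def fetch_scalar(connect: Callable[[], sqlite3.Connection], sql: str):
    # closing() guarantees conn.close() runs even if the query raises.
    # Note: `with conn:` alone only manages the transaction in sqlite3;
    # it does not close the connection.
    with closing(connect()) as conn:
        return conn.execute(sql).fetchone()[0]


print(fetch_scalar(tracked_connect, "SELECT 41 + 1"))  # 42

try:
    opened[0].execute("SELECT 1")
except sqlite3.ProgrammingError:
    print("connection was closed")
```

Because the connection is released as soon as the block exits, an error in the query path cannot leave it open and counting against the server's connection limit.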

  • @ankk98
    @ankk98 9 months ago

    How can DB machine scaling be a quick solution? Through replication?

  • @saiavinashduddupudi8975
    @saiavinashduddupudi8975 2 years ago

    One quick question.
    There is only 1 load balancer in the design (01:02). Is that good practice when the system is designed for scaling? Or can we have multiple load balancers to avoid a SPOF?

    • @AsliEngineering
      @AsliEngineering  2 years ago +1

      An LB is itself a cluster of machines, so it is actually not a SPoF.

    • @saiavinashduddupudi8975
      @saiavinashduddupudi8975 2 years ago

      @@AsliEngineering Thank you so much for the clarification.
      At 28:48, as you said, we divide the data into multiple shards for write/update-heavy operations. But do we also shard the read replicas based on the traffic?
      Can we have a master-slave architecture for each shard?

  • @AshutoshKumar-ue3dr
    @AshutoshKumar-ue3dr 1 year ago

    Hi Arpit, what are your experiences with AWS Aurora Serverless? Do you think it can scale in an unlimited way?

    • @AsliEngineering
      @AsliEngineering  1 year ago

      There are visible hiccups while it scales.

    • @AshutoshKumar-ue3dr
      @AshutoshKumar-ue3dr 1 year ago

      @@AsliEngineering Our app is being premiered soon on Shark Tank India. Do you think it's a good choice to go with it to avoid any downtime?

    • @AsliEngineering
      @AsliEngineering  1 year ago +1

      @@AshutoshKumar-ue3dr Whoaaa congratulations!
      I would really recommend talking to AWS support. They would warm up and pre-scale your database in advance. This would help you handle the surge.

  • @kerrygrover
    @kerrygrover 2 years ago

    Great video!! Is it possible for a system to be able to create more TCP connections while at 100% CPU usage? Will the system be able to allocate CPU to create more TCP connections?

    • @AsliEngineering
      @AsliEngineering  2 years ago

      It will, but a lot slower. 100% CPU does not mean a stall. It will just take a long time to set up, as the process will get far fewer cycles to execute.

  • @stevenspellberg
    @stevenspellberg 2 years ago

    I believe that when SLAs are online in a cloud environment, a reboot is always the last option.

  • @_sudipidus_
    @_sudipidus_ 1 year ago

    Interesting.
    But these were along the lines of how to avoid DB outages.
    I expected how to handle it when the DB server has actually crashed: how to recover data using logs, rolling back faulty transactions, etc.
    Good pointers nevertheless.

  • @Speak12truth
    @Speak12truth 2 years ago

    Kids suggesting killing the query, rebooting the server.
    An experienced database administrator/engineer be like 🤣

  • @Speak12truth
    @Speak12truth 2 years ago

    So an index is enough, and up-to-date statistics are not required? 🥱