Design Patterns for High Availability: What gets you 99.999% uptime?

  • Published: 12 Jun 2024
  • In this video, we discuss the topic of availability in distributed systems.
    We categorize organizations based on their acceptable levels of availability, ranging from startups to mature companies aiming for five to six nines of availability.
    InterviewReady: interviewready.io/
    We share a real-world example of a startup facing availability challenges with its database hosted in the wrong region. The solution involves migrating the database to a more suitable location and implementing a step-by-step process to minimize downtime.
    Here are five principles for building highly available systems:
    1. Simplicity over Perfection
    2. Downtime Over Loss
    3. Fewer Moving Parts
    4. Chaos Engineering
    5. Incident Reports and Root Cause Analysis
    We also touch upon fault tolerance strategies such as redundancy, load balancing, and database replication to ensure high availability in distributed system components.
    Engineers should either leverage existing highly available systems or adopt a principled approach to building and maintaining availability in their systems.
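
As a rough illustration of what the "nines" and redundancy mean in numbers, here is a minimal sketch. It is not taken from the video: the function names are hypothetical, a year is assumed to be 365 days, and the redundancy formula assumes replicas fail independently, which real systems often violate. Under those assumptions, five nines leaves a budget of about 5.26 minutes of downtime per year.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def availability_from_nines(nines: int) -> float:
    """Convert a count of nines into a fraction, e.g. 3 -> 0.999."""
    return 1 - 10 ** (-nines)


def downtime_budget_seconds(availability: float,
                            period_seconds: int = SECONDS_PER_YEAR) -> float:
    """Seconds of allowed downtime in the period for a given availability."""
    return (1 - availability) * period_seconds


def parallel_availability(single: float, replicas: int) -> float:
    """Availability of N redundant replicas where any one can serve traffic.

    Assumption: failures are independent, so the system is down only
    when every replica is down at the same time.
    """
    return 1 - (1 - single) ** replicas


if __name__ == "__main__":
    for nines in range(2, 7):
        a = availability_from_nines(nines)
        minutes = downtime_budget_seconds(a) / 60
        print(f"{nines} nines: about {minutes:.2f} minutes of downtime per year")
    # Two replicas at 99% each, under the independence assumption
    print(f"2 x 99% replicas: {parallel_availability(0.99, 2):.4%}")
```

The independence assumption is the optimistic part: replicas that share a region, a deploy pipeline, or a database tend to fail together, so the real gain from redundancy is usually smaller.
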
    00:00 Who is this video for?
    00:20 The 9s of availability
    02:43 War Story at InterviewReady
    06:14 Principles for High Availability
    09:05 Design Patterns for Availability
    12:31 Conclusion
    12:49 Thank you!
    Designing Data-Intensive Applications Book: amzn.to/3SyNAOy
    You can follow me on:
    Github: github.com/InterviewReady/sys...
    Instagram: / interviewready_
    LinkedIn: / interview-ready
    Twitter: / gkcs_
    #HighAvailability #SystemDesign #SoftwareEngineering

Comments • 17

  • @zb2747
    @zb2747 8 months ago +6

    Love your content. Your wisdom and knowledge are immense.

  • @komalhora1232
    @komalhora1232 8 months ago +1

    Great video, packed with information!

  • @ashishkshirsagar2596
    @ashishkshirsagar2596 1 month ago

    Amazing! In my opinion, this is the most resourceful video with a lot of content easily comprehended within just 13 minutes. Would love to see more of these... :)

    • @gkcs
      @gkcs  1 month ago

      Thank you!

  • @budmonk2819
    @budmonk2819 8 months ago +1

    Damn, awesome video. I realized this was the architecture in my startup workplace.

  • @MegaAnu10
    @MegaAnu10 8 months ago +4

    Usually it's not that there is a complete outage for x minutes or seconds, depending on the availability target, but rather that some number of service queries fail throughout the time span. This actually requires availability to be measured as the number of failed queries versus successful queries.

    • @gkcs
      @gkcs  8 months ago +2

      That's a great point, thank you!
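
The request-based measurement described in this thread can be sketched as follows. This is only an illustration: `RequestWindow`, `request_availability`, and `meets_target` are hypothetical names, and the counts are made-up sample numbers, not data from the video.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RequestWindow:
    """Request counts observed in one measurement window (e.g. one minute)."""
    successful: int
    failed: int


def request_availability(windows: Iterable[RequestWindow]) -> float:
    """Fraction of all requests that succeeded across the windows."""
    windows = list(windows)
    total = sum(w.successful + w.failed for w in windows)
    if total == 0:
        # Assumption: with no traffic, nothing failed, so report 100%.
        return 1.0
    return sum(w.successful for w in windows) / total


def meets_target(windows: List[RequestWindow], target: float) -> bool:
    """True if measured availability is at or above the target (e.g. 0.999)."""
    return request_availability(windows) >= target


# Made-up sample: 10 failures out of 20,000 requests, no full outage at all.
sample = [RequestWindow(successful=9_990, failed=10),
          RequestWindow(successful=10_000, failed=0)]

if __name__ == "__main__":
    print(f"availability: {request_availability(sample):.4%}")
    print("meets three nines:", meets_target(sample, 0.999))
    print("meets four nines:", meets_target(sample, 0.9999))
```

Counting requests instead of minutes captures partial failures: the sample never has a full outage, yet it still misses a four-nines target.
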

  • @RishiRajxtrim
    @RishiRajxtrim 8 months ago

    Great thanks

  • @pieter5466
    @pieter5466 7 months ago

    Great video, been watching your channel for years. Small suggestion: improve audio using a mic.

  • @iqgirlgamer
    @iqgirlgamer 2 months ago

    Government websites' downtime is approx. 10 days. Even though the user experience is frustratingly bad, we are left with no choice. 😢 Their usage graph is exponential after the downtime, even if the system upgrade decides to mess you up even more.

  • @tonysolomonik
    @tonysolomonik 8 months ago +1

    I wouldn't say it's alright if the cache is down. Unless it restarts really quickly, you might get a thundering herd of requests to the main DB, which might make it run out of resources and cause a big problem.
    Other than that, great video 😊

    • @gkcs
      @gkcs  8 months ago

      Good point!
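
One common way to soften the thundering herd described in this thread is request coalescing, sometimes called "single flight": when many callers miss the cache for the same key, only one of them hits the database and the rest wait for its result. The sketch below is an assumption-laden illustration, not something shown in the video. `SingleFlightCache` and `slow_db_read` are hypothetical names, the cache is a plain in-process dict, and there is no expiry.

```python
import threading
import time


class SingleFlightCache:
    """In-process cache where concurrent misses on a key trigger one load."""

    def __init__(self, loader):
        self._loader = loader      # function that reads from the backing store
        self._values = {}          # key -> cached value
        self._inflight = {}        # key -> Event set when the load finishes
        self._lock = threading.Lock()

    def get(self, key):
        while True:
            with self._lock:
                if key in self._values:
                    return self._values[key]
                event = self._inflight.get(key)
                leader = event is None
                if leader:
                    # First caller for this key becomes the leader.
                    event = threading.Event()
                    self._inflight[key] = event
            if leader:
                try:
                    value = self._loader(key)
                    with self._lock:
                        self._values[key] = value
                    return value
                finally:
                    # Wake followers whether the load succeeded or raised.
                    with self._lock:
                        del self._inflight[key]
                    event.set()
            # Follower: wait for the leader, then re-check the cache.
            # If the leader failed, one follower becomes the next leader.
            event.wait()


db_calls = {"count": 0}
_db_calls_lock = threading.Lock()


def slow_db_read(key):
    """Stand-in for an expensive database query (hypothetical)."""
    with _db_calls_lock:
        db_calls["count"] += 1
    time.sleep(0.05)
    return f"row-for-{key}"


cache = SingleFlightCache(slow_db_read)
threads = [threading.Thread(target=cache.get, args=("user:1",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("database reads for 20 concurrent requests:", db_calls["count"])
```

This only protects a single process. After a shared cache restarts, each application instance would still send one request per key, so rate limiting or cache warming may also be needed.
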

  • @mystikyogi
    @mystikyogi 8 months ago

    How can I calculate the number of server replicas I need if I have 1M users, with the peak hour (10-11 pm) serving 80% of traffic?
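
A back-of-the-envelope sketch for this kind of estimate is below. The question does not give enough information for a real answer, so several inputs are pure assumptions for illustration: 10 requests per user per day, 500 requests per second per server, a 60% target utilization, and one spare replica for failover. `replicas_needed` is a hypothetical helper, and real capacity planning should use load-test measurements.

```python
import math


def replicas_needed(daily_requests: float,
                    peak_share: float,
                    peak_window_seconds: float,
                    per_server_rps: float,
                    target_utilization: float = 0.6,
                    spare_replicas: int = 1) -> int:
    """Estimate replica count from peak traffic.

    peak_share is the fraction of daily traffic that arrives inside the
    peak window. Servers are sized to run at target_utilization during
    the peak, and spare_replicas are added so one failure does not
    overload the rest.
    """
    peak_rps = daily_requests * peak_share / peak_window_seconds
    usable_rps_per_server = per_server_rps * target_utilization
    serving = math.ceil(peak_rps / usable_rps_per_server)
    return serving + spare_replicas


# From the question: 1M users, 80% of traffic in one peak hour.
# Assumed for illustration: 10 requests/user/day, 500 rps per server.
users = 1_000_000
requests_per_user_per_day = 10  # assumption
estimate = replicas_needed(
    daily_requests=users * requests_per_user_per_day,
    peak_share=0.8,
    peak_window_seconds=3600,
    per_server_rps=500,          # assumption
)

if __name__ == "__main__":
    print("estimated replicas:", estimate)
```

With these assumed inputs, the peak is roughly 2,222 requests per second, which needs 8 serving replicas at 60% utilization plus 1 spare.
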

  • @abhishekandray
    @abhishekandray 7 months ago

    What are the moving parts? Some example plssss

  • @artasheskhachatryan4804
    @artasheskhachatryan4804 6 months ago

    Scaling the database is the hardest problem you will face.

  • @batman28996
    @batman28996 3 months ago

    You missed a level, Government websites- Availability ~ 70%

  • @neelpatel122
    @neelpatel122 8 months ago +2

    First