Amazon Interview Question | System Design: Inventory Management (with FAANG Senior Engineer)

  • Published: 28 Nov 2024

Comments • 10

  • @koeber99
    @koeber99 2 years ago +3

    Hi, great content. However, I've got some basic questions:
    1. The traffic in your requirements is high. Why aren't you using a load balancer or a cache in the design? E.g., many customers may just view the product, and placing a "hold" on the item in inventory would seem unnecessary for them. Wouldn't a cache be helpful here?
    2. (22:11) You stated that at 1,500 writes per second you shouldn't use a queue but rather streams (Kafka). Can you provide a ballpark for the volume at which a stream becomes better than a queue?
    *Note: I think a video on "How should capacity estimation change your design?" would be helpful. Stream vs. queue, 10K writes vs. 1K, high traffic vs. low traffic. I added it to the request link.
    3. Dumb question: what is the difference between a task runner, a worker, and a service? When should you use a task runner (Lambda) vs. a service? Outside of AWS, is it just a service/function?
    4. (35:00) Why are you using MapReduce for a worker? When should it be used?
    5. (37:00) Why is a snapshot metadata DB needed? You stated it's because creating a snapshot takes a long time. What does using a metadata DB improve, and when should you use one?
    Thanks!!!

    • @SDFC
      @SDFC  2 years ago +9

      1. Why aren’t you using a load balancer or a cache in the design?
      I actually just forget to add those all the time. It would probably be a good idea to add one of those to the design.
      2. Can you provide a ballpark for the volumes at which it's better to use a stream vs. a queue?
      I have never actually tried to do the math on a good cutoff before, so maybe don't quote me on this lol -- I'd estimate somewhere around 10 TPS, but that might even be a little high. You can definitely get away with using a queue at higher TPS, but you run the risk of dropping messages if you ever get an outage that lasts long enough.
      The issue behind the cutoff is "backpressure", which you can get in a message queue but not in a stream.
      3. What is the difference from task runner, worker, vs a service? When to use a task runner (lambda) vs a service?
      A service will use dedicated machines, such as EC2 instances. Task runners and workers are basically the same thing and will typically use something like AWS lambda, but I think "workers" could imply a more computationally expensive or longer-running task, so workers could sometimes be something other than lambda functions, such as EMR clusters or occasionally dedicated worker machines (in some cases this would even be a great opportunity for EC2 "spot" instances).
      4. Why are you using MapReduce for a worker? When to use it?
      MapReduce is for when you're going to be doing a giant processing job that can involve something like a full table scan. It's for analytical queries, rather than OLTP. In the book "Designing Data-Intensive Applications", there's a good chapter on differentiating OLAP from OLTP and another full chapter on batch processing with tools like MapReduce.
      Analytical queries vs OLTP can actually be a spectrum, and the book "Designing Data-Intensive Applications" captures every area of that spectrum pretty well.
      5. Why is a snapshot metadata DB needed?
      The metadata DB is just a thing for tracking the process of creating a snapshot since the snapshots will be created asynchronously.
      Whenever processing happens asynchronously, the status of the task itself usually needs to be persisted.
      When the processing is synchronous, you can tie the result of a batch job back to the request purely from the context of the server's request.
      When it's asynchronous, the second request -- the one retrieving the output of the processing work -- needs context from the first request; in a horizontally scaled service, these requests might not even be handled by the same machine, so you have to persist that context to a database.
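A minimal sketch of the "persist the task status" idea from the reply above, assuming an invented snapshot-metadata record with `status` and `location` fields (a plain dict stands in for the real metadata DB):

```python
# Illustrative sketch: track an async snapshot job in a metadata store so
# that a later poll request, possibly served by a different machine, can
# find the result. All names and fields here are invented for illustration.
snapshot_metadata: dict[str, dict] = {}

def start_snapshot(snapshot_id: str) -> None:
    # Worker records that the long-running job has begun.
    snapshot_metadata[snapshot_id] = {"status": "RUNNING", "location": None}

def finish_snapshot(snapshot_id: str, location: str) -> None:
    # Worker records completion and where the snapshot landed.
    snapshot_metadata[snapshot_id] = {"status": "DONE", "location": location}

def poll_snapshot(snapshot_id: str) -> dict:
    # Any server can answer the poll, because the context lives in the store,
    # not in the memory of the server that took the first request.
    return snapshot_metadata.get(snapshot_id, {"status": "UNKNOWN", "location": None})

start_snapshot("snap-1")
finish_snapshot("snap-1", "s3://snapshots/snap-1")
```

A real metadata DB would add timestamps, error states, and retries, but the core point is just that the job's status outlives any single request.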

  • @prasidmitra6859
    @prasidmitra6859 1 year ago +1

    2 questions:
    1. The inventory DB seems inherently transactional, with a structured, consistent schema and requirements for strong consistency. It also seems we might run queries on it like "give me all items with more than 2 holds" when the main stock is low and you have to actively cancel holds. Doesn't a relational DB make more sense here? I might be wrong though.
    2. Can we bypass the event-sourcing DB and have the snapshot workers read directly from the Kafka topic? We know that Kafka events can be replayed from a specific time, so that seems to serve our purpose of building a snapshot.

  • @gowrialaparthi4257
    @gowrialaparthi4257 1 year ago +3

    How will concurrent requests update the inventory DB?

    • @kvsantosh
      @kvsantosh 1 year ago

      Locking will ensure that concurrent requests are handled properly.
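One common way to make that locking concrete is an atomic conditional update, where the database only decrements stock if enough remains, so two racing requests can't both take the last unit. A minimal sketch with SQLite and invented table/column names (the video's actual schema may differ):

```python
import sqlite3

# Illustrative sketch of concurrency-safe stock decrements: instead of a
# read-then-write (which races), push the check into the UPDATE itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item_id TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('sku-1', 1)")

def place_hold(item_id: str, qty: int) -> bool:
    # The WHERE clause makes the decrement conditional: if two requests race
    # for the last unit, only one UPDATE matches a row; the other affects
    # zero rows and the hold is rejected.
    cur = conn.execute(
        "UPDATE inventory SET stock = stock - ? WHERE item_id = ? AND stock >= ?",
        (qty, item_id, qty),
    )
    conn.commit()
    return cur.rowcount == 1

first = place_hold("sku-1", 1)   # succeeds, stock drops to 0
second = place_hold("sku-1", 1)  # rejected, stock never goes negative
```

The same pattern shows up as conditional writes in DynamoDB or compare-and-set in other stores; explicit row locks (`SELECT ... FOR UPDATE`) are the heavier alternative.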

  • @houseofmalik9944
    @houseofmalik9944 1 year ago

    Good tutorial. There seems to be a mix of technologies like Cassandra, Kafka, etc. I'm not too familiar with these, but I get the idea that they are event-processing services. Does anyone know if there is an ESB solution to a problem like this that uses one consistent framework? It might be easier to maintain if there aren't too many frameworks in the mix.

  • @shaileshagarwal1
    @shaileshagarwal1 1 year ago

    I am not convinced why we have an async update to the inventory DB. This has to be in sync whenever an order is processed or the count in the product inventory is updated. Can you clarify more on this?

    • @meditationdanny701
      @meditationdanny701 1 year ago

      I feel the whole design decision is about making the process highly available, since the incoming request will interact with many downstream services, including a third-party payment service. A purchase happens here, so persisting this event is important, as other auxiliary services will use it. As for the API details: we create a checkout session, then the payment is collected from the user, and the API should be idempotent (that way we're able to achieve consistent business logic). I hope this makes sense. Have a nice day.

    • @baiyuli97
      @baiyuli97 1 year ago

      Same, the decoupling here doesn't really make sense. With Kafka in the path, the client can no longer use the response status code to determine success.

    • @baiyuli97
      @baiyuli97 1 year ago

      Also, using Kafka introduces a whole other can of worms in terms of failure cases to consider. How does the producer's delivery guarantee work? Do we guarantee exactly-once or at-least-once delivery? If at-least-once, is the downstream processing idempotent, or will we need a separate state store to avoid duplicates? What happens if the downstream processing succeeds, but the worker goes down before committing the offset? What happens if the DB is down and the processing cannot complete? What if the outage is extended?
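For the at-least-once case raised here, one common answer is an idempotent consumer that tracks processed event ids, so a redelivered event (e.g. a replay after an uncommitted offset) is a no-op. A toy sketch with invented event fields, using a set in place of a durable state store:

```python
# Illustrative sketch of an idempotent consumer under at-least-once delivery.
# Each event carries a unique id; the processed-id store filters duplicates.
# In a real system the id check and the stock write would share a transaction,
# and the store would be durable, not an in-process set.
processed_ids: set[str] = set()
stock = {"sku-1": 10}

def handle_event(event: dict) -> bool:
    if event["event_id"] in processed_ids:
        return False                      # duplicate delivery: skip the side effect
    stock[event["item_id"]] -= event["qty"]
    processed_ids.add(event["event_id"])
    return True

event = {"event_id": "evt-1", "item_id": "sku-1", "qty": 2}
handle_event(event)
handle_event(event)  # redelivered event: no double decrement
```

This covers the "worker dies before committing the offset" case: on replay the event id is already recorded, so the decrement is not applied twice.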