I hope you all find this mock systems design interview insightful! And if you're in the business of acing your systems design interviews (who isn't? 😎), be sure to check out www.systemsexpert.io where we have more questions like this one as well as a full course on all of the fundamental systems design topics that you need to understand to tackle these types of interviews.
I think SQL is indeed a better choice than a message queue (e.g., Kafka) here. MQ for stream processing might be appropriate for super high QPS workflows, but assuming a giant tech company with 40k devs each with 2 commits per day, that's only ~0.9 builds per second coming into this build system, which is well within the abilities of a SQL DB and complete overkill for MQ. Additional reasons why MQ is not an ideal option:
1) MQ tends to have low retention; you'll end up storing your jobs in some persistent store like a SQL or NoSQL DB anyway to preserve full history
2) It's difficult to parallelize MQ unless you have multiple partitions / topics matching the number of workers -- this is because coordinating multiple workers consuming the same MQ partition is difficult. And the low QPS of input to this system does not warrant partitioning to me.
3) As the presenter explained, a SQL DB has built-in consistency where build workers need to acquire a read/write lock to obtain the rights to build a job. MQ does not provide this out of the box, so it's up to the client to coordinate the worker pool. See point 2 above on why this is difficult.
4) MQ does not preserve order of jobs between partitions, and it was a requirement that jobs be roughly started in the same order they come in.
I think jumping to the conclusion that we need MQ for *any* kind of queue can be a detrimental mindset in system design, and could hurt a candidate in a system design interview. It's worth understanding the tradeoffs and explaining this to the interviewer.
@@Core433 How does one enter the career path of system design? What prerequisite knowledge, certifications, and skills do you think one would need? Are there particular programming languages or tools one can practice with at home by designing simple small systems using free tools?
Great to see this sort of thing. So many people trying to get into tech are obsessed just with software engineering. There's a whole gamut of careers and skill sets in tech, and hopefully this will open people's eyes to the opportunities aside from software engineering (nothing wrong with software engineering, btw).
Your help and free content is appreciated, and I am rooting for the success of your entire business. There is nothing brighter than an honest and smart effort. Keep shining.
Thank you for showing your prejudice for all to see. Btw this is a mock interview, not a real one. The Indians you speak so “highly” of are under the pressure of a real interview.
Such an AMAZING video Clement. Before this video, I went through a few others, but this one literally beat them all. The quality of the questions asked and the way you responded gave a Google feel. Thanks A LOT for bringing such high quality content. Absolutely terrific.
Clement, I really like the way you asked questions to dissect every aspect of the problem and collect as much information as possible, which is super motivating for all developers.
Clement, thanks for the video. It would be nice to have a summary at the end - what the interviewer thinks you did well/what could have been better. Or is the assumption that you did everything perfectly fine? One specific question/concern I have is - the interviewer asked you if you can have concurrency problems when querying your queue log table from 100 workers. You said 'no' and wrote a query you would use. What transaction isolation level did you mean when you were creating a transaction in your query? If you use MySQL, for instance, and REPEATABLE_READ (the default one), two or more workers can read the same record as the 'oldest job' at the same time (as reads are non-locking by default), and all such workers will update that same 'queued' record and pick up the same job. If you use SERIALIZABLE, you will run into a bunch of deadlocks: two or more workers can read the same record and then wait on each other to unlock the record so it can be updated.
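A minimal sketch of the locking fix (my guess, not what the video showed; assumes PostgreSQL 9.5+ or MySQL 8.0+ and the jobs table/columns from the interview):
BEGIN;
-- Lock the oldest queued row so concurrent workers can't also claim it;
-- SKIP LOCKED makes other workers move on to the next free row instead of blocking or deadlocking.
SELECT id FROM jobs
WHERE status = 'QUEUED'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- our_id = the id returned by the SELECT above
UPDATE jobs SET status = 'RUNNING' WHERE id = our_id;
COMMIT;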
Each Worker should own a private Job Queue. Have a Scheduler push new Jobs and balance the load between Queues. At any point in time the max number of concurrent queries over the same queue is 2: the Scheduler and the Worker. In most cases the Scheduler will query for non-running jobs to move them around and balance the load, so I think queries which will change the rows should lock the rows, because the Worker may be querying for the same rows at the same time. This way just one Worker is locked down; the rest keep working on their own queues. I made a comment explaining my solution.
@@heraldo623 Thanks for the reply. I was not asking for an alternative implementation description (I can come up with several myself). I was specifically mentioning that 1) there was no feedback at the end 2) there seems to be an apparent concurrency problem with the existing approach. Unless I am missing something and it might still work. That is why I was asking in the first place.
@@sergiiolshanetskyi5488 Clement made a very high-level design of that system. There was a constraint of time to do the entire process, 30 min. But they did not show that the system does its job in 30 min. They could have described the constraints more (like the available network bandwidth of the workers and target machines) and details about the builder itself. How much time does the builder take to convert the code into a binary? The system's responsibilities and constraints were vaguely described, so one can't tell if it works or not.
Another solution to the healthcheck problem of the workers could be: we can have a column called 'worker_id' in the jobs table. This will contain the id of the worker a running job is assigned to. The healthcheck service will know the worker_id as well. Once it finds that one of the workers is dead, it will update the jobs table based on the worker_id.
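A sketch of that requeue step (assuming a worker_id column in the jobs table and that the healthcheck service knows the dead worker's id; 42 is a made-up worker id):
-- Healthcheck service, after declaring worker 42 dead:
UPDATE jobs
SET status = 'QUEUED', worker_id = NULL
WHERE worker_id = 42
  AND status = 'RUNNING';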
Great video. I totally recommend pausing the video after each feedback from interviewer and try to continue the design based on his comments. And great job, Clement. You basically recaptured Gitlab CI’s structure :)
I don't get why you'd set a heartbeat to write to the db that frequently. You could just set a timestamp at assignment, and your health check could just check the timestamp delta. You could have a read query executed every 5 minutes to find the running operations last updated over 15 minutes ago, i.e. the problem operations. You would also benefit from a worker identifier in the table, and your health check could query both your table and the workers: maybe the operation is just taking longer than expected, and you wouldn't want to reassign the operation to another worker and have the code deployed by two different workers.
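Something like this sweep query, as a sketch (the last_update column name and 15-minute threshold are assumed; PostgreSQL interval syntax):
-- Run every 5 minutes: find RUNNING jobs whose row hasn't been touched in 15 minutes.
SELECT id, worker_id
FROM jobs
WHERE status = 'RUNNING'
  AND last_update < NOW() - INTERVAL '15 minutes';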
This was very useful. Not only did it help me get a better understanding of systems design from a software development perspective, it also gave me some ideas for better designing data science specific projects. Thanks to both of you. Keep up the great work. 😊
Thanks Clément! What I really enjoyed about this video is that someone non-technical like myself, can still gain valuable insight/understanding into how to tackle these tricky system design questions. Great video!
I've always found estimating workload to be a challenging part of design, especially when you do not have as much information as you'd like to go on. So in one instance I have gone for determining the nightmare scenario and developing system specs based on that.
I wish I had an interviewer like this guy. Mine was just silently nodding to everything that I've said. And when I asked questions like "Should I care about this or that? What load can I expect? What are the system parameters?" he had one answer to all of them: "You can assume that anything is possible".
Thanks for the great content. Even if someone doesn't get a job at a FAANG company, it will help them become a better programmer. Your courses are definitely a step up from the vague books, tutorials, and blogs that were available sparingly just a few short years ago. While some of those resources were adequate for people who learn well from books, the videos help those who are visual learners by actually seeing the diagrams, and hearing what to say, how to say it, and what questions to ask. Thanks for providing this service.
I didn't understand why you took so long to start designing the actual system (40 minutes to design the queue). Also, the "Binger" (ex-Googler) asked you a couple of times to explain what would back your queue. And when you said it would be a SQL database, he asked you how the dequeue operation would work, and you spent more time than I expected explaining it. I would use some existing queue system and use another database to store the building/deploying data. Despite that, thank you for the amazing content, Clement! Keep it up!
Clement, nice interview, I liked it. Only one thing: I am not sure that two servers won't pick up the same job using the code you wrote. You might need more complicated logic to pick up that job. If two workers do the select at the same time, both will get the same job unless you are locking the table.
@@muthukumarankothandaraman2371 When you say global, the first thing I would ask is: what do you mean by global? And this question was asked by the candidate in this mock interview. Then: what kind of devices or machines are targeted, along with their geo-locations? Also, how are the distributed devices or machines supposed to receive or trigger the deployment build, etc.? At last, there has to be some very generic format the systems speak, like XML, HTML, and scripts; basically the last part of the design will define the technology stack. Very realistic interview. Love it. Btw, when I try to get into doing some design, it turns out to be a rabbit hole; but if one has clarity in their thoughts, then the exit and entry points will always be clear to them. Having a clear thought process is a challenge. 😔
Great video man. Thanks! About the TRANSACTIONAL part, here is how I'd have done it: instead of SELECTing the next JOB in the QUEUE, I'd have the WORKER UPDATE the table with its WORKER ID, and then SELECT the only row that got affected. If you SELECT first and then UPDATE, there is a chance of two WORKERs getting the same JOB ID at the SELECT step. Another point: I think there is no need to begin a TRANSACTION just for one INSERT. A transaction is made of multiple steps (the SELECT does not count as a step).
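A sketch of that UPDATE-first claim (assumes MySQL, which allows ORDER BY/LIMIT on a single-table UPDATE, plus an added worker_id column; 42 is this worker's made-up id):
-- Tag the oldest queued job with our worker id...
UPDATE jobs
SET status = 'RUNNING', worker_id = 42
WHERE status = 'QUEUED'
ORDER BY created_at
LIMIT 1;
-- ...then read back the single row we just tagged.
SELECT * FROM jobs WHERE worker_id = 42 AND status = 'RUNNING';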
I am not entirely convinced that SQL scales horizontally, especially when you realise how many records one needs to update on this single table. What I think it does well, though: it illustrates the point and allows you to throw some concrete details into the solution.
Realistically you would move the completed transactions into a permanent storage log through a separate service to maintain SoC. Definitely do not want to keep them in a working table. IMO it's good that Clement makes oversights because that is what makes the interview authentic.
This is more about how to communicate than anything else. It is hard to take such a thing seriously; real-life tasks are usually more complex and you don't start from scratch. Still, it seems useful.
This is really awesome. This video is an eye-opener. Now I consider the system design question not just a question, but a real task in the company. And the interviewers already know what they want, but they don't say it outright. So an interviewee should find out what they want using additional questions, like in real life!
About the health check part, correct me if I'm wrong: I think the ideal solution would be using ZooKeeper or a ZooKeeper-like coordinator. Reasons: 1. your surveillance service can also die. 2. Timestamps rely on system clocks; in some cases, a clock may pause or even go backward. In those cases, you'd be killing a healthy job.
Lots of great stuff covered here. I would've personally covered: - the actual scm deployment platform integration (e.g. webhooks off the back of git commits) - and build/dependency caching as well.
@Clément Mihailescu : How do we ensure the goal state is set right? What I mean is: say for commit1 we generate build B1 and would like to deploy it, followed by another engineer who does commit2, generates build B2, and would like to deploy it. When B2 gets deployed, the B1 changes would also be included. So when they're deployed in the order B1 -> B2 there will not be an issue. But what happens when the builds are deployed in the order B2 -> B1? How does the system capture that? Or am I missing something here?
I don't know what the answer is, but I would imagine that a message queue works for maintaining which requests are available, but by itself it's not enough. For example, what happens when the request fails or times out? How do you make sure that exactly one request is being processed by the workers? How can you represent the status of all jobs, past and present? My first reaction is to implement it using AWS SQS; by itself, SQS FIFO might handle exactly-once delivery and some basic request error handling, but based on the read/write volume and the use cases, I think SQL might be a great first pass.
@@knightwfu I think a message queue would actually be a better option. You could set the prefetch to 1 to make sure the worker only consumes one message, and when a worker starts building you could dispatch an event. You would only acknowledge the message after the build is done; in that case, if the worker stops and a new one starts up, it will just retry the message. Whenever something happened you could dispatch an event (for example BuildStarted, BuildFailed, BuildDone and BuildCancelled). These events would be consumed by another service which builds a representation of the current status. Also, when you want to cancel the current build, you could make sure that all messages for a certain job end up at the same consumer (for example with a consistent hash exchange in RabbitMQ).
I think the thread above is great, a point I want to call out is Messaging systems like Kafka are durable message queues in the sense that messages are persisted to disk. This actually could have been expanded on rather than going with the SQL Table approach. If this system is used by the entire company for every commit of every repository this won't scale well with a SQL Table as you'd have more writes than reads due to status updates. Messaging system would be more scalable not just with durability but also performance as you can partition on the hash of the repository and preserve order of the commits.
Learning some of the ideas in here helped me land a FAANG job. As a junior engineer I didn't think to use a SQL database to implement a queue for real systems at scale.
When we got to the part where we were trying to figure out a way to find out if a worker died, I was shouting "heartbeat mechanism" sitting in front of the screen, coz that was the first concept we learnt in our software architecture class. It was so satisfying when I heard you said the same haha. Great video and it does feel like a very real interview
There is a race condition in this design... simply selecting the next queued job does not give you a write lock on that row, so 2 workers could select it and try to update it, and they will both succeed, resulting in a duplicate parallel build on a second worker. Could add an 'AND status = "queued"' to the update clause to avoid this, as sketched below.
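i.e. something like this guarded update (a sketch; our_id is the id read by the earlier SELECT):
UPDATE jobs
SET status = 'RUNNING'
WHERE id = our_id
  AND status = 'QUEUED';  -- no-op if another worker already claimed it
-- affected rows = 1 -> we own the job; 0 -> someone else won, go pick another.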
Tip: Don't reinvent the wheel, just use a queue service or job scheduler service. Experienced senior engineers should know when to implement something and when not to. I wouldn't hire this candidate if it was a senior engineer interview
Why do interviewers ask you to write merge sort when sort() is available in all languages? Because the interview is all about knowing whether the candidate knows the basics well and whether they can INVENT THE WHEEL if need be. Your senior engineer would probably get stuck if GCP / AWS stopped their services for some godforsaken reason.
@@HallaBhalla If the question is to design a queue service, then you implement it using the basics. If the question involves using a queue service, then you don't reimplement the queue service. I said a senior engineer should know when to implement something and when not to.
First of all, thanks for making these available Clem. I've seen a lot of videos about the systems design interview in preparation for my interviews and I'm a SystemsExpert customer, but I'm wondering if you could explain the difference between novice, intermediate, and expert level performance. What criteria can be used to determine what level someone is at and how capable they are of designing good systems? Thanks!
I might have missed something, admittedly the video was pretty long, but you asked for feedback in the comments so here we go: Given how in-depth a lot of the solution was, it seemed that how the binaries get into the regional peer-to-peer networks was missing. You covered that GCS will magically replicate them for you, but then you jumped right into those binaries going from host to host. How does a binary get into the network? I am assuming it would just be the first host to poll KVS and see a new state, which then polls the network, sees nobody else has it, and goes to fetch it from regional GCS. Why not expand the enum in the jobs table so that you can have a single source of truth with regard to deployment? Something like: Queued, Building, Replicating, Deployable, Complete, Failed. Also, with that DB, why not just use the last_h column as a phase_started time? Since you are still benchmarking against the expected worst runtime, you can keep your query basically the same but also track failures across the whole process (and not just building), without needing the heartbeat system for builders/workers. I probably also would have asked a little more about the code that we are deploying. Is it only one codebase, or should this work for multiple applications? Do we only need to support one deployed version at a time? Especially if the answer to the second question is yes, the whole goal-state solution would likely need to be changed. Overall great interview; I think the most important part was watching how you took your time and thought through everything slowly and deliberately while asking for clarification at basically every step
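For the expanded enum, a sketch of the jobs table (my column names, PostgreSQL-style DDL):
CREATE TABLE jobs (
    id            BIGSERIAL PRIMARY KEY,
    created_at    TIMESTAMP NOT NULL DEFAULT NOW(),
    -- phase_started replaces last_h: reset it on every status transition
    phase_started TIMESTAMP,
    status        TEXT NOT NULL CHECK (status IN
        ('QUEUED', 'BUILDING', 'REPLICATING', 'DEPLOYABLE', 'COMPLETE', 'FAILED'))
);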
I'm not getting how the 10gb binary gets to all the regional GCS either. Or how the p2p networks work. I would think a 10gb file going from country to country would take a while, especially if it's done ad-hoc at the time of the app release. What if it's a 5TB binary? My guess from what was meant in this video is that regional host 1 (rh1) sends to host 2 (rh2), then rh1 and rh2 send to rh3 and rh4, incrementally/exponentially being able to send/replicate more of the binary at a time. Alternatively, with an entirely different take: I heard that making incremental patches to source code is very quick. So why can't the version control system live in each region, with all regions guaranteed to have the latest commit? Then each regional host just builds the binary prior to the scheduled release time of the app, and there wouldn't be a need to download a huge binary across countries.
There is a pretty glaring issue with this discussion. The discussion/solution proceeded as though there was only a single application, which was deployed to every server. Maybe that’s not absurd in its own right, but the problem states that there were around 5,000 new builds per day. That would mean this one application was updated every 15-20 seconds! Clearly, a system that can deploy 5,000 different builds per day only makes sense if the system is supporting multiple different applications. And, providing such support would require a much more involved design than what was discussed.
Hi! Just to clarify, did he forget to add FOR UPDATE to the SELECT and lock the job row before updating its status, so that some other worker doing the query at the same time will not take it as well?
that's what BEGIN TRANSACTION/COMMIT do. It enables the code inside that block to run as a unit, so nothing can happen between the SELECT and UPDATE queries
@@luisfguadagnin Yes, but another worker can still get this row, and during the update we can have an error, or the row can be updated by another worker anyway (depends on the transaction type and database used). Writing FOR UPDATE will lock the row and make it invisible to other transactions.
Thanks for making this great video. While watching the video these two points came to my mind 1. Do we need an event system that generates an event whenever a new commit is made instead of the worker checking for new jobs? 2. Having a time limit for a job to run can solve the issue when the system goes down.
About the update on the RUNNING job when the node fails... if there's a network problem on the node and the heartbeat period expired, you update the job back to QUEUED, but that node might come back alive after another node picks up the re-queued job, finish its work, and update the job to SUCCESS; and after a while, the new node that started re-processing the job would also update the job. Not a huge problem, but it might interfere with the build history.
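One possible guard, as a sketch (assumes a worker_id column, so the finishing worker has to prove it still owns the job; 42 is its made-up id):
UPDATE jobs
SET status = 'SUCCEEDED'
WHERE id = our_id
  AND worker_id = 42      -- this worker's own id
  AND status = 'RUNNING';
-- 0 affected rows -> the job was requeued and reassigned; discard this worker's result.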
Thank you so much for this mock interview. This is one of the best videos about system design interviews for starters to learn what to expect in a design interview and how to respond to make the interview a success....
Fantastic video. Full of knowledge and very engaging. Thanks for this Clement! One different approach for managing the workers could be to have just one service poll the SQL table (queue) and spawn new instances on demand (Spot instances, maybe) to run the jobs. The rest of the architecture remains the same, i.e. the spawned instances still send heartbeat signals. This could prevent over-provisioning of workers, especially during times when no builds are run (code freeze) or very few builds are run. But it comes with an additional waiting time, equivalent to that taken by a new instance to boot up, whenever no existing instances are free to take up a new job.
Instead of using a SQL database to back the queue and store worker servers' health, couldn't we use an off-the-shelf solution like SQS, which is durable and handles the queue dying? Then use Lambda to run the jobs, which will handle the server dying as well as the concurrency problem.
I think "begin transaction" at ~27:39 does not really mean that select is concurent safe. I would rather use something like "select ... for update" statement (depends on db). I know he assume "serial sql job service" but it's not really prepared for pooling or horizontal scaling from the sql point of view.
So I felt like you had an overly complicated solve for the monitoring of the workers. The problem parameters said that the deployment needs to be completed in 30 minutes. Figure out how long it takes to move built code to servers, let's say 10 minutes, and allow 20 minutes for the build. Record the last time the status changed, and when a row is RUNNING and its status was set more than 20 minutes ago, update the status to FAILED:
UPDATE StatusTable SET Status = 'FAILED' WHERE Status = 'RUNNING' AND LastStatusUpdate < NOW() - INTERVAL '20 minutes';
You're building queue functionality into a database, presumably to achieve multi-region failover. You'd have to use a serverless DB to avoid massive replication overhead for a single table. Use a running-TTL column, updated by the worker that picked up the job, setting the TTL to 20 minutes later; the worker-pull query selects the latest job with a TTL after now(). You need a worker-finished query to delete the record when the worker process COMPLETES.
I love your algo interviews but found the systems interview unsatisfying in terms of the high-level architecture and the details of the approaches used. Great scoping of the question tho!
We don't know if that solves the problem because the problem itself is not well defined. The time constraint depends more on the builder than on the deploy system lol
thank you, i learned a lot and you really refreshed my SQL knowledge. 1) in the building-code sub-system i think (38:30) you can check the heartbeat when you dequeue a job, so you don't need extra aux workers, e.g.:
SELECT * FROM jobs WHERE (heartbeat < NOW() - INTERVAL '10 minutes' AND status = 'RUNNING') OR (status = 'QUEUED' AND created_at ...);
2) the problem in the deployment system is: what if multiple engineers click on deploying a build within a small period? (each machine will stop and start downloading the build again). i would love to hear your opinion
About engineers trying to deploy when there is already a deploy going on. Well the Build Subsystem could build the code and the Deploy Subsystem could distribute the build as a new release to all regions as usual. It's up to the engineers to decide if the ongoing deploy should be cancelled or not. New releases are asynchronously made, only the deploy to the production machine, in this scenario, is synchronous (containers could be used to isolate the application environment from its hosting machine, this way multiple releases could be deployed asynchronously)
For this interview I would use something I learnt during my masters: parallel scatter. Each node takes the code, pushes to production AND sends it to the nearest next node. So node 1 does the job and sends it to node 2. Now node 1 is free and node 2 is processing. That means node 1 can send it to node 3 and node 2 to node 4. Repeat this recursively, so step 1: 1 sends to 2. Step 2: 1 sends to 3, 2 sends to 4. Step 3: 1 sends to 5, 2 to 6, 3 to 7, 4 to 8. As we can see, we have 2^n exponential speed, which is really fast. You REAALLY want to avoid first in first out :D, that is the slowest way
My solution: In the Build Subsystem each Worker could have its own Work Queue, and a new component, the Work Scheduler, comes in, responsible for taking new commits from the source code repository, making a new build Work, and pushing it to the most unloaded Work Queue. Periodically the Work Scheduler would check the Work Queues to balance the load and prevent Workers from being idle, moving Works between Work Queues. The Worker itself should be able to detect failures and recover build processes by implementing a Build State Machine for each Work and recording which Build State the Work is actually in (sub-states of the main RUNNING state). When the Worker is idle, it will try to begin working on the oldest Work in the main RUNNING state; if there is none (there would be only one or zero, because the Worker works synchronously, but the Work Scheduler may push RUNNING Works coming from dead Workers), it begins working on the oldest Work that was not RUNNING. For RUNNING Works, it starts from the recorded Build State for that Work. This way the Worker recovers naturally from failures because it is agnostic to failures. But the Build State may be corrupted (lacking input data to begin working on); when this happens the Worker would try to play back the State Machine until it finds a Build State that can be run and begin working from there. It requires that each Build State be idempotent. The Work ends when the Release (Version + Application Binary + Configuration) is stored in the Master Release Storage (a blob storage). Now begins the Deploy Subsystem. The Deploy Manager would take new Releases from Master Release Storage and sync them to the Release Storage of each Deploy Region. Each Target Machine is attached to one Deploy Region; it may have a client daemon running that listens to commands from its Deploy Region, or the Deploy Region connects to it and runs commands. When the sync ends, all Deploy Regions receive a "deploy" command from the Deploy Manager telling them to deploy a given Release, and the Deploy Regions would run Deploy Routines on their Target Machines to set up the environment based on the Release's Configuration and install the Release's Application Binary.
@@heraldo623 The servers. They have basically 2 states: building code (a local action) and deploying code (sending the built binary to the target system). Usually the send part is the slowest, depending on build size. So... we maintain a list of closest neighbours, like a graph (based on DNS), because that would be the fastest to send to. There is a master server that receives the Queue and distributes the workload to each node. But each node will receive the entire Queue (slow at the first step, but it speeds up by a 2^node_number factor) using my previously described method. Recursively, a node does the following: receive the workload and its assigned commit, resolve the commit build, send the workload to the neighbour and the binary to the designated target
@@nagyzoli So a node receives the entire Work Queue, pops one Work, does the build, sends the remaining Work Queue to the next nodes, and deploys its build to known Target Machines. How does each node know which Work to pop from the Work Queue? Because one node sends the remaining Work Queue to N >= 1 next nodes.
@@heraldo623 You divide the original queue into batches; you do not make it continuous. Let us assume we have 100 nodes and 1000 commits to build in an arbitrary timeframe, say 1 day. The master node sends the Q and the ID / sha of the commit to each node AND DOES NOT increase the Q for that day. If the build "slots" are filled, you refuse new jobs till the batch is processed.
It is odd that Clément worked at Google, because the deployment system he created has nothing to do with what is used at Google. (Contrary to what the interviewer suggested; I worked there.) In fact, it would not be used anywhere. Simply put, it is very rarely acceptable to deploy to all of the machines at the same time. Any non-trivial application has a non-zero startup time (often measured in minutes), and you don't want your service to go dark while the fleet is ramping up. You would use a gradual rollout, often with multiple stages, and have a dedicated system which manages the whole process. I was going to buy SystemsExpert, but this and a couple of smaller misses gave me pause. The video does give me some interesting ideas on how to communicate in a system design interview, though. Start with the bird's-eye view, identify the blocks and then address them one by one. Be very methodical, very pedantic, very thorough. Go into an obscene amount of detail. Check back with the interviewer regularly. Clarify the constraints in the beginning, even the ones that seem self-evident. I tend to jump quickly to the trickiest areas and bottlenecks, so this is quite helpful for me.
Do you have to design everything from scratch, like the queue? Why not pick something off the shelf directly? In the regional key-value store discussion, which you did, pick etcd or ZooKeeper, right?
Question: why didn't you try using a queue service or a Kafka service in the cloud, instead of writing jobs to a SQL database and then having workers poll the DB to start the build? Wouldn't it be more efficient and simple to use AWS SQS or its equivalent in GCP?
Isn't it better to first find the numbers before jumping into the design? Without knowing how many workers you have, you cannot say if MySQL is a good option or not. Also, instead of SQL, I think persistent queues would serve this problem better.
Thank you for sharing insight into the process of system design. I think an important question relating to the regional swarms sharing new builds may have gone unanswered. Using this design, how do you determine which node(s) download the builds initially from GCS in order for them to be shared via the p2p network, while avoiding hundreds or even thousands of nodes simultaneously attempting that initial download?
I had the same thoughts. I think you could implement an election process to select 1 or more leaders who would be allowed to download from GCS. If the leader dies, the P2P network would then elect a new one.
I think the overall solution is good although I don't love the polling all over the place. I think using webhooks or as another comment mentioned a message queue would be a great improvement.
Awesome. Just like the interviewer said, this architecture is very close to the ones used in the real industry, just like in my company. I was so amazed you just thought of the design on the spot. Again, great content!
The last part, the build-trigger P2P network, can be handled by a consensus protocol like Raft; most distributed consensus systems today (Consul, Kubernetes, NoSQL DBs, etc.) handle state that way, sending commands to the rest of the workers to set the desired state.
Question: Clement says at 41:40 that this design is super horizontally scalable. How is it super scalable when we use a SQL database? I thought SQL databases scale badly horizontally.
SQL databases are not horizontally scalable in the true sense. However, you are not really talking about a huge volume of data here; a simple SQL DB can easily meet his requirements, as the data volume is pretty low.
Just a side note on the SQL: if multiple workers execute at the exact same second, they may pick up the same queued job ID, because the transaction will not lock the record until commit. Need to do it with pessimistic locking.
Is the queue + polling approach necessary for a company with infinite compute resources like google? Couldn't you just monitor utilization of the worker pool and dynamically scale it up to meet surge demand? This way, your "queue" microservice could just log the job state and propagate the build call on to the next available worker node. Would your score get docked for answering this way?
For the queue of jobs, isn't the SQL database of jobs the single point of failure here? Perhaps I missed something in this part, but this is the part where it seems like there would be a solution like a messaging queue with clustering or whatever way of replication.
I have two questions: 1) How many times do we need to retry building a push? 2) What if multiple engineers push the button to deploy their code one after the other, rewriting the previous state of the KVS before it was complete?
1) I would say we should discuss the retry mechanism with the interviewer; that would be another point to chat about. 2) I would use a total ordering algorithm.
1. How to cancel a deployment. 2. Could you talk about idempotency, if the user clicks the deploy button multiple times. 3. Seems like the second sub system was hurried up. Could have covered deployment lifecycle management in more details.
Instead of polling for the last bit we can potentially have kafka broadcast the signal out and then write to a table once all the systems have been successfully deployed and this can update the UI via SSE or something similar.
Who is the guy taking the interview? Where does he work?
Clement, I have some questions.
What is the fastest way to become a junior developer in 2020???
What is the best way to learn programming to build advanced projects???
It would be very useful to make another systems design interview!
Dude grow some beard. Would look good🔥
make sure you buy other domains with the same name
🤔✌
I'd like to hear the feedback from the interviewer, I think that's the important part on distinguishing what went well and what didn't
The feedback is probably that he's missed a lot of corner cases (interviewer mentions at least 2 of them in the video), and I don't know why he's not using a messaging queue instead. In short, this is probably a weak interview.
Normally, the interviewers are not sure about the solution themselves, or they have a template solution which they feel would fit every problem. So they would not be able to comment on our design.
@@bluespeckcf5949 I thought he could have used a message queue too, but it wouldn't allow monitoring workers which have gone AWOL.
@@bluespeckcf5949 can you elaborate a bit more? What exact corner cases did he miss?
@@cloud15487 For example, one that I caught was the worker heartbeat: what if the job completed successfully but didn't send a heartbeat due to a network error? You wouldn't want to run the build again. Another is that he never mentioned storage for the build artifacts, and how to resume a build from the artifacts, or use them. If you are building a build system, you definitely need to cover some sort of distributed storage for build artifacts; if you are building 100k times a day, you don't want to have to download JS or .NET dependencies 100k times. Even if you are pulling from your own repository, you want to eliminate as much bandwidth usage as possible. BUT overall this is a good interview, maybe not perfect, but that is typical for most interviews. I think it was more important to show how he interacted with the interviewer, asking questions and clarifying requirements. Most companies want to see how well you handle overall project communication so as to best deliver according to the stated requirements.
wow...simply amazing.....this is like a real world experience for the system design interviews
most of the videos on youtube are just about explaining the system design about some popular applications like messenger, tinyurl, dropbox etc...
all of them have one thing in common....all of them explain the design very well in only one aspect....they are all well structured, there are no hiccups, there is not much thinking to do.....they design the system without any flaws...why? because they know the design and there is no interviewer who is asking for in depth details or looking for more clarifying answers...they just explain the system in only one direction
This is totally different and the one which we actually need.
This is by far the best for multiple reasons
1. It shows what a real-world interview looks like, with all the whiteboarding sessions
2. the question given by the interviewer is very abstract....the interviewee may know the design in one particular aspect, but the interviewer is looking for some different aspect and asking for more in-depth details at every stage
3. the interviewee actually asks a series of questions at the start and also keeps asking different questions from time to time
4. at times the interviewer stops the interviewee because he missed explaining a few details or details about fault tolerance, and keeps asking for more in-depth details about how the interviewee is approaching his design
5. it's more of a collaborative brainstorming, which is what actually happens in the real world
6. it also showed how the interviewee picked some technologies like SQL, GCS, KVS etc. and validated them with his explanation, getting the interviewer's approval and proceeding to the next stage at every point in the interview.
Totally worth watching the whole video.....thank you Clement and kudos for creating such valuable content
hope to see more such videos
Yeah this is exactly how this goes in real interviews. When I interviewed at Amazon for engineering manager, I was asked to write an analytics system with 1 million writes per second. It was a bit more challenging than this example in terms of scale, but still similar in concept.
I used to have zero understanding of the importance of clarifying requirements and used to ask typical dimension-clarifying questions without knowing how they could actually help me solve the problem at hand. You really showed how you can incorporate the information gained by asking clarifying questions through this mock interview - appreciate it
What I loved most is that it actually went on like a real interview, where the candidate doesn't know all the answers, and kept building on the assumptions, kept extending the designs. Thank you for making it look realistic!
The SQL transaction would ensure that all of the SQL statements run together or not at all (rollback), but it doesn't ensure isolation between concurrent workers. In this design, to prevent multiple workers from running the same job concurrently, wouldn't you need to specify the isolation level on the transaction in order to prevent a non-repeatable read? Example: worker 1 selects the oldest "queued" job. Worker 2 selects the oldest "queued" job. Worker 1 updates the job status to "running".
Or use a row level lock?
"Ex-Googler" means that you now use Bing or what? /s
hahahahh underrated joke
Yep, now he uses Bing to see StackOverflow answers
maybe duckduckgo
He still uses YouTube though 🙂
It means he's not evil anymore lol
This is VERY helpful. This is literally better than any video that just tells you general tips that are hard to follow in an interview setting.
Every time Clement was the interviewer, but now, for a change, Clement is the interviewee.
Gotta flex my own skills _sometimes_ ! 😎
Lol Clement does sound like an interviewee
It seems like the interviewer is getting educated by Clement lol
Chinglin Sheng An interviewer has no motivation to speak a lot, while an interviewee needs to think out loud
This was really good. I often paused the video to offer my 2 cents and think out loud. Really liked the collaboration between Clement and the interviewer. Also, one subtle thing is Clement's fearlessness in asking even the simplest questions and repeating things in his own words. This is what allowed him to really build an understanding of the problem, as the interviewer's words may not click the first time, and they are meant to be ambiguous so that they foster clarifying questions.
His interviewing skills really showed here. So much to learn!
So many questions :)
1. How do you handle jobs which require building in multiple architectures (x64, arm, etc)
1.5 How would you handle scenarios involving multiple branches?
2. Why use SQL while there are so many distributed queue alternatives (SQS, for example) which let you handle scenarios like a builder failed elegantly?
3. Why store heartbeats in the same table? Why not have the builders register with a configuration service like Zookeeper which can take care of heartbeating?
4. How are build logs stored? How often are they garbage collected?
5. With this approach the SQL table will continue to grow infinitely, which will make queries more and more expensive over time. What would be the table cleanup strategy?
6. How do you ensure that all machines have received a certain build?
7. How do you ensure that a malicious system doesn't pose to be a 'seed' and try to deploy malicious binaries?
No one:
Clément: Thinking back to my experience with code deployment systems.
I will leverage that :D
Great video Clement. Thank you for sharing an insight into how these interviews are conducted.
2 things I would have prodded more for as the interviewer are:
1) How do you deal with downtime while switching between the versions?
2) How do you manage DB changes as well as the code changes discussed here.
Most of the time was spent on the build system; not many details were covered about the deployment side, like safety of deployment, health checks, configurable deployment speeds, rollbacks, global service health / availability checks to pause a rollout, and incident management. These things are the highest-order bit for any large-scale company, as no one wants a global or regional outage.
Best way for the SQL statement at 27:00 would be:
BEGIN TRANSACTION;
SELECT *
FROM jobs
WHERE status = 'QUEUED'
ORDER BY created_at
LIMIT 1;
UPDATE jobs SET status = 'RUNNING' WHERE id = our_id;
COMMIT;
I loved this. Thanks so much for taking the time to put together this high-quality video! I have a small comment that I can't get out of my head: the SQL transaction that workers use to grab jobs seems to have a race condition that might make 2 or more workers grab the same job. I don't think ACID here protects from that. Locking needs to be done somehow to prevent it, and the first option I'd explore, as it's the simplest, is to change the SELECT to SELECT FOR UPDATE, assuming the underlying SQL engine supports it; this guarantees the resulting row stays locked for a single worker until it has had a chance to perform the update on the status. If SELECT FOR UPDATE is not supported, we can explore other options such as manual row-level or table-level locking.
Great stuff, thanks again!
Was thinking the same :)
Same thought. The transaction's atomicity does not mean only one of the concurrent update requests will succeed.
Exactly my thought. Plus I was thinking why he wouldn’t use a tool built for queues instead of making sql fit into this use case. Especially when the interviewer asked him do you think this will serve the purpose and he doubled down on it lol. Note to myself: listen carefully when such a question comes up haha
Came here to say this
You can actually build a queue in SQL without SELECT FOR UPDATE, or any implicit locks. You do an update on the row, setting its state to RUNNING, where the id matches and the row is in the QUEUED state. If the affected rows is 0, the task has already been picked up. If it is 1, you have picked up the task. Note that autocommits need to be enabled on the UPDATE, or there is no difference from SELECT FOR UPDATE.
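On engines with RETURNING (e.g. PostgreSQL), the two approaches can even be combined into a single self-contained claim statement (a sketch, not what the video showed):
UPDATE jobs
SET status = 'RUNNING'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'QUEUED'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED  -- concurrent workers skip to the next queued row
)
RETURNING id;  -- no row returned means the queue was empty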
This was so good. Thank you @Clement for giving us an insight on what goes on in a real system design interview. Kudos to the interviewer as well. He asked brilliant questions and put up some interesting extra questions to make it difficult yet understandable.
Subscribed ✅
Great interview video; what I liked most is the interaction between the interviewer and the interviewee. Some thoughts on the design: 1) The SQL database could be a queue system such as Kafka or GCP Pub/Sub, which handles this scale very well with partitions. Given the large scale, the frequent queries and updates to the SQL database will hurt performance, especially updating the workers' heartbeats in the table every second, so a message broker would be a better solution. Worker status / heartbeats should be monitored and reported to a monitoring system such as Prometheus; each worker can push its health-check status there. 2) The interviewer tried to steer the system design in a non-abstract direction: how many servers are needed, an analysis of bandwidth, timing, and storage in a datacenter, non-uniform load (x1.25), then a 3-datacenter deployment in a region, to derive the footprint per datacenter and the global footprint. I wish the interviewee had elaborated more on the non-abstract design. Overall, a great system design interview. Thank you very much.
I guess the problem with Kafka is that it stores the messages in a topic only for some retention period, and also how would you update a record inside the topic? But there could be a SQL replica of the Kafka data though
I had imagined a distributed queue and a separate process to store the jobs for retaining history if needed. A DB as a queue will be expensive. Maybe even a DB + Redis cache, where the server pulls off a queue structure, if access to cloud services is a constraint.
100% right. I can add two more things on top of this.
1. Why is writing SQL queries important for a system design interview? We should deal with the more abstract design, and maybe later it's possible to talk about SQL in depth, but they wasted so much time on it.
2. The presented SQL query will not work properly under load, which is why a queue system is by far superior. I couldn't understand why he wouldn't just use one; we can retain the data for as long as we want and slap extra metadata on top.
I think SQL is indeed a better choice than a message queue (e.g. Kafka) here. MQ-based stream processing might be appropriate for super-high-QPS workflows, but assuming a giant tech company with 40k devs each making 2 commits per day, that's only ~0.9 builds per second coming into this system, which is well within the abilities of a SQL DB and complete overkill for an MQ. Additional reasons why an MQ is not an ideal option:
1) MQs tend to have low retention; you'll end up storing your jobs in some persistent store like a SQL or NoSQL DB anyway to preserve full history.
2) It's difficult to parallelize an MQ unless you have multiple partitions/topics matching the number of workers, because coordinating multiple workers consuming the same MQ partition is difficult. And the low QPS of input to this system does not warrant partitioning, in my view.
3) As the presenter explained, a SQL DB has built-in consistency: build workers acquire a read/write lock to obtain the right to build a job. An MQ does not provide this out of the box, so it's up to the client to coordinate the worker pool. See point 2 on why this is difficult.
4) An MQ does not preserve the order of jobs between partitions, and it was a requirement that jobs be started in roughly the same order they come in.
I think jumping to the conclusion that we need an MQ for *any* kind of queue can be a detrimental mindset in system design, and it could hurt a candidate in a system design interview. It's worth understanding the tradeoffs and explaining them to the interviewer.
@@Core433 How does one enter the career path of system design? What prerequisite knowledge, certifications, skills would you think they would need? Are there particular programming languages or tools for home situations one can practice in by designing their own simple small systems using free tools?
Great to see this sort of thing. So many people trying to get in to tech are obsessed just with software engineering. There's a whole gamut of careers and skill sets in tech, and hopefully this will open people's eyes to the opportunities aside from software engineering (nothing wrong with software engineering, btw).
This kind of interview *is* for software engineering. This is what senior SWEs at major tech companies go through in the interview process.
Clement you nailed it man. Fabulously awesome content. I’ve learned a lot from this. Please make more videos like this. 👏👏👏
Your help and free content are appreciated, and I am rooting for the success of your entire business. There is nothing brighter than an honest and smart effort. Keep shining.
man they are so calm and composed. Indian interviewers get irritated when I ask them questions to understand their questions
Thank you for showing your prejudice for all to see. Btw this is a mock interview, not a real one. The Indians you speak so “highly” of are under pressure of a real interview.
Such an AMAZING video, Clement. Before this video I went through a few others, but this one literally beat them all. The quality of the questions asked and the way you responded gave it a Google feel. Thanks A LOT for bringing such high quality content. Absolutely terrific.
Brilliant setup. Great cross questioning and really makes the viewer think. Good going Clement!
This is really beneficial, thank you for taking the time to go through a full mock interview.
Clement, I really like the way you asked questions to dissect every aspect of the problem and collect as much information as possible; it's super motivating for all developers.
Clement, thanks for the video. It would be nice to have a summary at the end: what the interviewer thinks you did well and what could have been better. Or is the assumption that you did everything perfectly?
One specific question/concern I have: the interviewer asked if you could have concurrency problems when querying your queue/log table from 100 workers. You said 'no' and wrote the query you would use. What transaction isolation level did you have in mind when you began a transaction in your query?
If you use MySQL, for instance, with REPEATABLE READ (the default), two or more workers can read the same record as the 'oldest job' at the same time (reads are non-locking by default), and all such workers will then update that same 'QUEUED' record and pick up the same job.
If you use SERIALIZABLE, you will run into a bunch of deadlocks: two or more workers can read the same record and then wait on each other to release it so it can be updated.
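For what it's worth, on MySQL 8+ or Postgres a locking read with SKIP LOCKED sidesteps both problems, since workers skip rows that are already locked instead of waiting on them. A rough sketch (our_id is the id returned by the SELECT):
BEGIN;
SELECT id
FROM jobs
WHERE status = 'QUEUED'
ORDER BY created_at ASC
LIMIT 1
FOR UPDATE SKIP LOCKED; -- locked rows are skipped, not waited on
UPDATE jobs SET status = 'RUNNING' WHERE id = our_id;
COMMIT;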
Each Worker should own a private Job Queue. Have a Scheduler push new Jobs and balance the load between Queues. At any point in time the max number of concurrent queriers over the same queue is 2: the Scheduler and the Worker. In most cases the Scheduler will query for non-running jobs to move them around and balance the load, so I think queries that change rows should lock those rows, because the Worker may be querying the same rows at the same time. This way just one Worker is locked down while the rest keep working on their own queues. I made a comment explaining my solution.
@@heraldo623 Thanks for the reply. I was not asking for an alternative implementation description (I can come up with several myself). I was specifically mentioning that
1) there was no feedback at the end
2) there seems to be an apparent concurrency problem with the existing approach. Unless I am missing something and it might still work. That is why I was asking in the first place.
@@sergiiolshanetskyi5488 Clement made a very high-level design of that system. There was a constraint of time to do the entire process, 30 min, but they did not show that the system does its job in 30 min. They could have described the constraints more (like the available network bandwidth of workers and target machines) and details about the builder itself. How much time does the builder take to convert the code into a binary? With such a vague description of the system's responsibilities and constraints, one can't tell whether it works or not.
Another solution to the healthcheck problem of the workers could be:
We can have a column called 'worker_id' in the jobs table, containing the id of the worker a running job is assigned to.
The health-check service will track worker_ids as well.
Once it finds that one of the workers is dead, it will update the jobs table based on that worker_id.
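A minimal sketch of that requeue step, assuming the worker_id column described above (dead_worker_id is whatever id the health check flagged):
-- put the dead worker's in-flight jobs back on the queue
UPDATE jobs
SET status = 'QUEUED', worker_id = NULL
WHERE status = 'RUNNING' AND worker_id = dead_worker_id;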
Great video. I totally recommend pausing the video after each feedback from interviewer and try to continue the design based on his comments. And great job, Clement. You basically recaptured Gitlab CI’s structure :)
I don't get why you'd have a heartbeat write to the DB that frequently. You could just set a timestamp at assignment time and have your health check look at the timestamp delta: a read query executed every 5 minutes that finds RUNNING operations last updated over 15 minutes ago would surface the problem operations. You would also benefit from a worker identifier in the table, and your health check could consult both the table and the workers themselves, because maybe the operation is just taking longer than expected; you wouldn't want to reassign the operation to another worker and have the code deployed by two different workers.
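A minimal sketch of that staleness check, assuming a last_status_update column and MySQL-style interval syntax:
-- surface RUNNING jobs nobody has touched in 15 minutes
SELECT id, worker_id
FROM jobs
WHERE status = 'RUNNING'
  AND last_status_update < NOW() - INTERVAL 15 MINUTE;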
This was very useful. Not only did it help me get a better understanding of systems design from a software development perspective, it also gave me some ideas for better designing data science specific projects.
Thanks to both of you. Keep up the great work. 😊
I can recommend “grokking the system design interview” on educative.io and github.com/donnemartin/system-design-primer#index-of-system-design-topics
@@stepanseliuk8042 bra
Thanks Clément! What I really enjoyed about this video is that someone non-technical like myself, can still gain valuable insight/understanding into how to tackle these tricky system design questions. Great video!
This is more like an interview about implementing a scalable queue.
Thank you Clément for the great video. I learned so much about System Design in less than an hour. Keep it up!
I've always found estimating workload to be a challenging part of design, especially when you do not have as much information as you'd like to go on. So in one instance I have gone for determining the nightmare scenario and developing system specs based on that.
I wish I had an interviewer like this guy. Mine was just silently nodding to everything that I've said. And when I asked questions like "Should I care about this or that? What load can I expect? What are the system parameters?" he had one answer to all of them: "You can assume that anything is possible".
Thanks for the great content. Even if someone doesn't get a job at a FAANG company, it will help them become a better programmer. Your courses are definitely a step up from the vague books, tutorials, and blogs that were available sparingly just a few short years ago. While some of those resources were adequate for people who learn well from books, the videos help those who are visual learners by actually seeing the diagrams, and hearing what to say, how to say it, and what questions to ask. Thanks for providing this service.
I didn't understand why you took so long to start designing the actual system (40 minutes to design the queue). Also, the "Binger" (ex-Googler) asked you a couple of times to explain what would back your queue, and when you said it would be a SQL database and he asked how the dequeue operation would work, you spent more time than I expected explaining it.
I would use an existing queue system and a separate database to store the building/deploying data.
Despite that, thank you for the amazing content, Clement! keep it up!
Hi Clement, you're really doing great work for people. I like the way you share this insightful content.
Clement, nice interview, I liked it. Only one thing: I am not sure two servers won't pick up the same job using the code you wrote. You might need more complicated logic to pick up that job; if two workers do the SELECT at the same time, both will get the same job unless you are locking the table.
The interviewee was quite weak, I have to say.
He could have done at least a SELECT FOR UPDATE SKIP LOCKED or equivalent.
You're great. Keep creating quality contents for us !
I'll do my best!
I would also have asked to which platform/OS we deploy: linux, windows, ... etc. Thoughts?
It's probably one OS, most likely Linux, unless it's some .NET or IIS product.
@@muthukumarankothandaraman2371 When you say global, the first thing I would ask is: what do you mean by global? (This question was asked by the candidate in this mock interview.) Then: what kinds of devices or machines are targeted, along with their geo-locations? Also, how are the distributed devices or machines supposed to receive or trigger the deployment build, etc.? Finally, there has to be some very generic interchange format among systems, like XML, HTML, and scripts; basically, the last part of the design will define the technology stack. Very realistic interview, love it. Btw, when I try to get into doing some design it turns out to be a rabbit hole, but if one has clarity of thought then the exit and entry points will always be clear. Having a clear thought process is a challenge. 😔
@clement which tool do you use for white boarding and drawing for system design?
There are literally thousands of drawing apps.
Great video man. Thanks!
About the TRANSACTION part, here is how I'd have done it:
Instead of SELECTing the next JOB in the QUEUE, I'd have the WORKER UPDATE the table with its WORKER ID and then SELECT the only row that was affected. If you SELECT first and then UPDATE, there is a chance of two WORKERs getting the same JOB ID at the SELECT step. Another point: I think there is no need to begin a TRANSACTION just for one UPDATE. A transaction is made of multiple steps (the SELECT does not count as a step).
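A rough sketch of that, assuming MySQL (which allows ORDER BY and LIMIT on a single-table UPDATE) and a worker_id column; 42 is a made-up worker id:
-- claim the oldest queued job in one atomic statement
UPDATE jobs
SET status = 'RUNNING', worker_id = 42
WHERE status = 'QUEUED'
ORDER BY created_at ASC
LIMIT 1;
-- read back the row we just claimed (zero rows means the queue was empty)
SELECT * FROM jobs WHERE worker_id = 42 AND status = 'RUNNING';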
I am not entirely convinced that SQL scales horizontally, especially when you realize how many records need to be updated on this single table. What it does do well, though, is illustrate the point and allow throwing some concrete details into the solution.
Realistically you would move the completed transactions into a permanent storage log through a separate service to maintain SoC. Definitely do not want to keep them in a working table. IMO it's good that Clement makes oversights because that is what makes the interview authentic.
This is more about how to communicate than anything else. It is hard to take such a thing seriously; real-life tasks are usually more complex and you don't start from scratch. Still, it seems useful.
Kudos to you Clement and team. This is one of the best interviews I have seen in a long time. Thanks for sharing!
This is really awesome. This video is an eye-opener. Now I consider the system design question is not just a question, but it is a real task in the company. And the interviewers already know what they want, but they don't speak for themself. So an interviewee should find out what they want using additional questions like real life!
Going for multiple onsite interview all with system design interview, this is the most helpful material I have seen so far
About the health-check part, correct me if I'm wrong, but I think the ideal solution would be using ZooKeeper or a ZooKeeper-like coordinator. Reasons: 1. your surveillance service can also die; 2. timestamps rely on system clocks, and in some cases a clock may pause or even go backward, in which case you'd be killing a healthy job.
I am so inspired and motivated by your videos Clement. Gonna dive into AlgoExpert starting tomorrow, super excited!
Lots of great stuff covered here. I would've personally covered:
- the actual scm deployment platform integration (e.g. webhooks off the back of git commits)
- and build/dependency caching
as well.
@Clément Mihailescu : How do we ensure the goal state is set right? What I mean is: say with commit1 we generate build B1 and want to deploy it, followed by another engineer who makes commit2, generating build B2, and wants to deploy that. When B2 gets deployed, the B1 changes are included in it, so deploying in the order B1 -> B2 is fine. But what happens when the builds are deployed in the order B2 -> B1? How does the system capture that? Or am I missing something here?
Clement, I am curious why you didn't use a 'message queue' instead of the SQL table for task queueing?
I have the same question, why not something similar to kafka.
I don't know what the answer is, but I would imagine that a message queue works for maintaining which requests are available, but by itself it's not enough. For example, what happens when the request fails or times out? How do you make sure that exactly one request is being processed by the workers? How can you represent the status of all jobs, past and present?
My first reaction is to implement this using AWS SQS; by itself, SQS FIFO might handle exactly-once delivery and some basic request error handling, but based on the read/write volume and the use cases, I think SQL might be a great first pass.
Using a database for queuing is a famous anti-pattern.
@@knightwfu I think a message queue would actually be a better option. You could set the prefetch to 1 to make sure a worker only consumes one message, and when a worker starts building you could dispatch an event. You would only acknowledge the message after the build is done; that way, if the worker dies and a new one starts up, it will just retry the message. Whenever something happens you could dispatch an event (for example BuildStarted, BuildFailed, BuildDone and BuildCancelled). These events would be consumed by another service which builds a representation of the current status. Also, when you want to cancel the current build, you could make sure that all messages for a certain job end up at the same consumer (for example with a consistent-hash exchange in RabbitMQ).
I think the thread above is great. A point I want to call out is that messaging systems like Kafka are durable message queues, in the sense that messages are persisted to disk. This actually could have been expanded on rather than going with the SQL table approach. If this system is used by the entire company for every commit of every repository, it won't scale well with a SQL table, as you'd have more writes than reads due to status updates. A messaging system would be more scalable, not just in durability but also in performance, as you can partition on the hash of the repository and preserve the order of the commits.
Thanks a lot Clem, we need more mock SD interviews just like this for different size companies and different apps
Will be interesting to hear the honest feedback of the interviewer after this discussion.
Learning some of the ideas in here helped me land a FAANG job. As a junior engineer I didn't think to use a SQL database to implement a queue for real systems at scale.
When we got to the part where we were trying to figure out a way to find out if a worker died, I was shouting "heartbeat mechanism" sitting in front of the screen, coz that was the first concept we learnt in our software architecture class. It was so satisfying when I heard you said the same haha. Great video and it does feel like a very real interview
There is a race condition in this design: simply selecting the next queued job does not give you a write lock on that row, so 2 workers could select it and both try to update it, and both will succeed, resulting in a duplicate parallel build on a second worker. You could add an AND status = "QUEUED" condition to the UPDATE clause to avoid this.
Tip: don't reinvent the wheel; just use a queue service or job scheduler service. Experienced senior engineers should know when to implement something and when not to. I wouldn't hire this candidate if it were a senior engineer interview.
Why do interviewers ask you to write merge when sort() is available in all the languages? Because the interview is all about knowing whether the candidate understands the basics and can INVENT THE WHEEL if need be. Your senior engineer would probably get stuck if GCP / AWS stopped their services for some godforsaken reason.
@@HallaBhalla If the question is to design a queue service, then you implement it using the basics. If the question involves using a queue service, then you don't re-implement it. I said a senior engineer should know when to implement something and when not to.
First of all, thanks for making these available, Clem. I've seen a lot of videos about the systems design interview in preparation for my interviews and I'm a SystemsExpert customer, but I'm wondering if you could explain the difference between novice, intermediate, and expert level performance. What criteria can be used to determine what level someone is at and how capable they are of designing good systems? Thanks!
I might have missed something, admittedly the video was pretty long, but you asked for feedback in the comments so here we go:
Given how in-depth a lot of the solution was, it seemed that how the binaries get into the regional peer-to-peer networks was missing. You covered that GCS will magically replicate them for you, but then you jumped right into those binaries going from host to host. How does a binary get into the network? I am assuming it would just be the first host to poll KVS and see a new state, then poll the network, see that nobody else has it, and go fetch it from regional GCS.
Why not expand the enum in the jobs table so that you have a single source of truth with regard to deployment? Something like: Queued, Building, Replicating, Deployable, Complete, Failed. Also, with that DB, why not just use the last_h column as a phase_started time? Since you are still benchmarking against an expected worst runtime, you can keep your query basically the same but also track failures across the whole process (not just building), without needing the heartbeat system for builders/workers.
I probably also would have asked a little more about the code we are deploying. Is it only one codebase, or should this work for multiple applications? Do we only need to support one deployed version at a time? Especially if the answer to the second question is yes, the whole goal-state solution would likely need to be changed.
Overall, a great interview. I think the most important part was watching how you took your time and thought through everything slowly and deliberately while asking for clarification at basically every step.
I'm not getting how the 10GB binary gets to all the regional GCS buckets either, or how the P2P networks work. I would think a 10GB file going from country to country would take a while, especially if it's done ad hoc at the time of the app release. What if it's a 5TB binary? My guess from this video is that regional host 1 (rh1) sends to host 2 (rh2), then rh1 and rh2 send to rh3 and rh4, incrementally/exponentially being able to replicate more at a time.
Alternatively, an entirely different take: I've heard that applying incremental patches to source code is very quick. So why can't the version control system live in each region, with all regions guaranteed to have the latest commit? Then each regional host just builds the binary prior to the scheduled release time of the app, and there wouldn't be a need to download a huge binary across countries.
There is a pretty glaring issue with this discussion. The discussion/solution proceeded as though there was only a single application, which was deployed to every server. Maybe that’s not absurd in its own right, but the problem states that there were around 5,000 new builds per day. That would mean this one application was updated every 15-20 seconds!
Clearly, a system that can deploy 5,000 different builds per day only makes sense if the system is supporting multiple different applications. And, providing such support would require a much more involved design than what was discussed.
Hi! Just to clarify: did he forget to add FOR UPDATE to the SELECT to lock the job row before updating its status, so that another worker running the query at the same time won't take the same job?
That's what BEGIN TRANSACTION/COMMIT does. It makes the code inside the block run as a unit, so nothing can happen between the SELECT and UPDATE queries.
@@luisfguadagnin Yes, but another worker can still read this row, and during the update we can get an error, or the row can be updated by another worker anyway (depending on the isolation level and the database used). Writing FOR UPDATE locks the row and hides it from other transactions' locking reads.
Thanks for making this great video. While watching it, these two points came to mind:
1. Do we need an event system that generates an event whenever a new commit is made, instead of the workers polling for new jobs?
2. Having a time limit for a job to run could solve the issue of the system going down.
About the update on the RUNNING job when a node fails: if there's a network problem on the node and the heartbeat period expires, you update the job back to QUEUED, but that node might come back alive after another node picks up the re-queued job, finish its build, and update the job to SUCCESS, and then after a while the new node that started re-processing the job would also update it. Not a huge problem, but it might interfere with the build history.
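One way to guard against that stale write, sketched under the assumption that jobs carry a worker_id (1337 and 42 are placeholder ids):
-- the revived node's write becomes a no-op, because worker_id no longer matches
UPDATE jobs
SET status = 'SUCCESS'
WHERE id = 1337 AND worker_id = 42 AND status = 'RUNNING';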
Great job. I would have liked to see feedback at the end of the interview, but that's just my humble opinion. Thanks.
Thank you so much for this mock interview. This is one of the best videos about system design interviews for beginners to learn what to expect and how to respond to make the interview a success.
Fantastic video. Full of knowledge and very engaging. Thanks for this Clement!
One different approach for managing the workers could be to have just one service poll the SQL table (queue) and spawn new instances on demand (spot instances, maybe) to run the jobs. The rest of the architecture remains the same, i.e. the spawned instances still send heartbeat signals. This could prevent over-provisioning of workers, especially during periods when no builds run (code freeze) or very few do. But it comes with additional waiting time, equivalent to a new instance's boot time, whenever no existing instance is available to take a new job.
Right, and mathematically it may not be suitable.
Very good content as usual. I'm doing all mock interviews available in SystemsExpert and I'm very impressed by how much effort you put on this.
Instead of using a SQL database to back the queue and store worker-server health, couldn't we use an off-the-shelf solution like SQS, which is durable and handles a queue dying? Then use Lambda to consume the queue, which would handle a server dying as well as the concurrency problem.
I think "begin transaction" at ~27:39 does not really mean that select is concurent safe. I would rather use something like "select ... for update" statement (depends on db). I know he assume "serial sql job service" but it's not really prepared for pooling or horizontal scaling from the sql point of view.
yes. you need some kind of locking mechanism.
So I felt you had an overly complicated solution for monitoring the workers. The problem parameters said that the deployment needs to be completed in 30 minutes. Figure out how long it takes to move built code to the servers, let's say 10 minutes, and allow 20 minutes for the build. Record the last time the status changed, and when a row is RUNNING and its status was set more than 20 minutes ago, update the status to FAILED:
UPDATE StatusTable
SET Status = 'FAILED'
WHERE Status = 'RUNNING'
  AND LastStatusUpdate < NOW() - INTERVAL 20 MINUTE;
You're building queue functionality into a database, presumably to achieve multi-region failover. You'd have to use a serverless DB to avoid massive replication overhead for a single table. Use a running-TTL column, updated by the worker that picked up the job, to set the TTL to 20 minutes later; the worker-pull query selects the latest job with a TTL after now(). You also need a worker-finished query to delete the record when the worker process COMPLETES.
I love your algo interviews but found this systems interview unsatisfying in terms of the high-level architecture and the details of the approaches used.
Great scoping of the question though!
Please explain which system can have 1000 deployments/day, keeping in mind that one complete deployment takes 30 mins.
We don't know if that solves the problem, because the problem itself is not well defined. The time constraint depends more on the builder than on the deploy system lol
thank you, i learned a lot and you really refreshed my SQL knowledge
1) In the build-code subsystem, I think (38:30) you can check the heartbeat when you dequeue a job, so you don't need separate aux workers, e.g. select * from jobs where (heartbeat < current_time - 10m and status = 'RUNNING') or (status = 'QUEUED' and created_at ...)
2) The problem in the deployment system is: what if multiple engineers click to deploy different builds within a short period? (Each machine will stop and start downloading a new build again.)
I would love to hear your opinion.
About engineers trying to deploy when there is already a deploy going on: the Build Subsystem could build the code and the Deploy Subsystem could distribute the build as a new release to all regions as usual. It's up to the engineers to decide whether the ongoing deploy should be cancelled. New releases are made asynchronously; only the deploy to the production machines, in this scenario, is synchronous (containers could be used to isolate the application environment from its hosting machine, so multiple releases could be deployed asynchronously).
For this interview I would use something I learnt during my masters: parallel scatter. Each node takes the code, pushes it to production AND sends it to the nearest next node. So node 1 does the job and sends it to node 2; now node 1 is free and node 2 is processing, which means node 1 can send to node 3 and node 2 to node 4. Repeat this recursively: step 1, 1 sends to 2; step 2, 1 sends to 3 and 2 sends to 4; step 3, 1 sends to 5, 2 to 6, 3 to 7, 4 to 8. As you can see, we get 2^n exponential speed, which is really fast. You REALLY want to avoid first-in-first-out :D, that is the slowest way.
What are those nodes? The deploy regions? The target machines?
My solution:
In the Build Subsystem, each Worker could have its own Work Queue, and a new component, the Work Scheduler, comes in, responsible for taking new commits from the source code repository, making a new build Work, and pushing it to the most unloaded Work Queue. Periodically the Work Scheduler would check the Work Queues to balance the load and prevent Workers from going idle, moving Works between Work Queues. The Worker itself should be able to detect failures and recover build processes by implementing a Build State Machine for each Work and recording which Build State the Work is currently in (sub-states of the main RUNNING state). When the Worker is idle, it will try to begin working on the oldest Work in the RUNNING main state; if there is none (there would be only one or zero, because the Worker works synchronously, but the Work Scheduler may push in RUNNING Works coming from dead Workers), it begins working on the oldest Work that was not RUNNING. For RUNNING Works, it starts from the recorded Build State. This way the Worker recovers naturally from failures, because it is agnostic to them. But the Build State may be corrupted (lacking the input data needed to begin working); when this happens, the Worker would play the State Machine back until it finds a Build State that can be run and begin working from there. This requires each Build State to be idempotent.
The Work ends when the Release (Version + Application Binary + Configuration) is stored in the Master Release Storage (a blob storage).
Now the Deploy Subsystem begins. The Deploy Manager takes new Releases from the Master Release Storage and syncs them to the Release Storage of each Deploy Region. Each Target Machine is attached to one Deploy Region; it may have a client daemon running that listens for commands from its Deploy Region, or the Deploy Region connects to it and runs commands. When the sync ends, all Deploy Regions receive a "deploy" command from the Deploy Manager telling them to deploy a given Release, and the Deploy Regions run Deploy Routines on their Target Machines to set up the environment based on the Release's Configuration and install the Release's Application Binary.
@@heraldo623 The servers. They have basically 2 states: building code (a local action) and deploying code (sending the built binary to the target system). Usually the send part is the slowest, depending on build size. So we maintain a list of closest neighbours, like a graph (based on DNS), because that is the fastest to send to. There is a master server that receives the Queue and distributes the workload to each node, but each node will receive the entire Queue (slow at the first step, but speeding up by a 2^node_number factor) using my previously described method. Recursively, a node does the following: receive the workload and its assigned commit, build the commit, then send the workload to its neighbour and the binary to the designated target.
@@nagyzoli So a node receives the entire Work Queue, pops one Work, does the build, sends the remaining Work Queue to the next nodes, and deploys its build to the known Target Machines. How does each node know which Work to pop from the Work Queue, given that one node sends the remaining Work Queue to N >= 1 next nodes?
@@heraldo623 You divide the original queue into batches; you don't make it continuous. Let's assume we have 100 nodes and 1000 commits to build within an arbitrary timeframe, say 1 day. The master node sends the queue and the ID / SHA of the assigned commit to each node AND DOES NOT grow the queue for that day. If all the build "slots" are filled, you refuse new jobs until the batch is processed.
It is odd that Clément worked at Google, because the deployment system he created has nothing to do with what is used at Google (contrary to what the interviewer suggested; I worked there). In fact, it would not be used anywhere. Simply put, it is very rarely acceptable to deploy to all of the machines at the same time. Any non-trivial application has a non-zero startup time (often measured in minutes), and you don't want your service to go dark while the fleet is ramping up. You would use a gradual rollout, often with multiple stages, and have a dedicated system that manages the whole process.
I was going to buy SystemsExpert, but this and a couple of smaller misses gave me a pause. The video does give me some interesting ideas on how to communicate in a system design interview, though. Start with the birds-eye view, identify the blocks and then address them one by one. Be very methodical, very pedantic, very thorough. Go into obscene amount of detail. Check back with the interviewer regularly. Clarify the constraints in the beginning, even the ones that seem self-evident. I tend to jump quickly to the trickiest areas and bottlenecks, so this is quite helpful for me.
Great presentation. Looks like a totally real interview; the guy on the left even seems a little bit annoyed. Very real.
Do you have to design everything from scratch, like the queue? Why not pick something off the shelf directly?
For the regional key-value store discussion, would you pick etcd or ZooKeeper?
Question: why didn't you try using a queue service or a Kafka-type service in the cloud, instead of writing jobs to a SQL database and having workers poll the DB to start builds? Wouldn't it be more efficient and simpler to use AWS SQS or its equivalent in GCP?
Isn't it better to pin down the numbers before jumping into the design? Without knowing how many workers you have, you cannot say whether MySQL is a good option or not. Also, instead of SQL, I think persistent queues would serve this problem better.
Thank you for sharing insight into the process of system design. I think an important question relating to the regional swarms sharing new builds may have gone unanswered: using this design, how do you determine which node(s) initially download a build from GCS so it can be shared via the P2P network, while avoiding hundreds or even thousands of them simultaneously attempting that initial download?
I had the same thoughts. I think you could implement an election process to select 1 or more leaders who would be allowed to download from GCS. If the leader dies, the P2P network would then elect a new one.
Probably the only realistic System Design interview on whole of youtube
What application you are using for the drawing?
I think the overall solution is good, although I don't love the polling all over the place. I think using webhooks or, as another comment mentioned, a message queue would be a great improvement.
Awesome. Just like the interviewer said, this architecture is very close to the one's being used in the real industry - just like in my company. I was so amazed you just thought of the design on the spot. Again, great content!
lol i mean this is definitely based on experience.
The last part, triggering the build across the P2P network, can be handled by a consensus protocol like Raft; most distributed consensus systems today (Consul, Kubernetes, NoSQL DBs, etc.) handle state that way, sending commands to the rest of the workers to set the desired state.
Question: Clement says at 41:40 that this design is super horizontally scalable. How is it super scalable when we use a SQL database? I thought SQL databases scale badly horizontally.
SQL databases are not horizontally scalable in the true sense; however, you are not really dealing with a large volume of data here. A simple SQL DB can easily meet his requirements, as the data volume is pretty low.
Just a side note on the SQL: if multiple workers execute at the exact same second, they may pick up the same "QUEUED" job ID, because the transaction will not lock the record until commit.
You need to do it with pessimistic locking.
Is the queue + polling approach necessary for a company with near-infinite compute resources like Google? Couldn't you just monitor utilization of the worker pool and dynamically scale it up to meet surge demand? That way, your "queue" microservice could just log the job state and propagate the build call to the next available worker node. Would your score get docked for answering this way?
For the queue of jobs, isn't the SQL database of jobs the single point of failure here? Perhaps I missed something in this part, but this seems like the place for something like a messaging queue with clustering or some other form of replication.
Feels like you completely ignored the "under 30 minutes" requirement. Am I missing something?
52:15
I have two questions:
1) How many times do we need to try to build a push?
2) What if multiple engineers push the button to deploy their code one after the other, rewriting the previous state of the KVS before it completes?
1) I would say we should discuss a retry mechanism with the interviewer; that would be another point to chat about.
2) I would use a total ordering algorithm.
1. How to cancel a deployment.
2. Could you talk about idempotency, if the user clicks the deploy button multiple times.
3. Seems like the second sub system was hurried up. Could have covered deployment lifecycle management in more details.
In what universe does a 10 GB binary build within 30 minutes, leave alone deploy across multiple locations.
google universe lol
A great example of a system design interview; really looking forward to more videos like this.
Wouldn't a message queue and a log file make more sense? Log files are cheaper and provide fast writes.
Instead of polling for the last bit, we could have Kafka broadcast the signal and then write to a table once all the systems have been successfully deployed; this could update the UI via SSE or something similar.