Facebook System Design Interview: Design an Analytics Platform (Metrics & Logging)

  • Published: Jan 23, 2025

Comments • 79

  • @tryexponent
    @tryexponent  3 years ago +2

    Don't leave your system design interview to chance. Sign up for Exponent's system design interview course today: bit.ly/3K0lTtS

  • @yuanzhang3393
    @yuanzhang3393 3 years ago +31

    Whether the system needs to be near real time is a clarifying question to ask the interviewer rather than something to assume.

  • @rituraj889
    @rituraj889 2 years ago +33

    I have a suggestion for all the videos on this platform.
    First, they are super helpful for learning how to carry out the whole design, like how to estimate or where to begin.
    But all of them lack cross-questions from the interviewer's side. I mean, in real life we can be bombarded with questions at a minute level of detail.
    Like in this video: how does data enrichment work, or how can data collection be made configurable without knowing all the business use cases?
    Bottom line: make it tougher :)

    • @cricket4671
      @cricket4671 1 year ago +2

      A candidate doing what they showed in the video would get downleveled in the best-case scenario 😅

  • @heterodyned
    @heterodyned 3 years ago +36

    Metrics and logging sound like two “separate” design questions 🤷🏻‍♂️

  • @mrsiddhu2012
    @mrsiddhu2012 2 years ago +31

    Great video - Kudos to the interviewer for making the environment so comfortable. A few things:
    1. I could not understand the choice of database. Ideally this should be a combination of a time-series DB and a data warehouse?
    2. Certain key components, like a rule engine (for acting on events) and a notification system (for notifying interested subscribers), were missing?
    3. A 90-day retention period is very short. Down-sampling the data to lower the volume and storing it long term could have been discussed.
    4. I thought the interviewer wanted to go beyond just visualization, to automated actions (alarms etc.) and analytics too.
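
A minimal sketch of the down-sampling idea from point 3, assuming metrics arrive as `(timestamp_seconds, value)` pairs and that averaging within fixed-width buckets is an acceptable summary. The function name `downsample` and the bucket width are illustrative assumptions, not something shown in the video.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets.

    Hypothetical helper: returns one (bucket_start, mean_value) pair per
    bucket, sorted by bucket start.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Four raw points at 30-second resolution collapse to two 60-second points.
raw = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0)]
coarse = downsample(raw, 60)
```

Older data could be rewritten at a coarser resolution like this before long-term storage, trading per-point detail for volume.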

    • @pavananantharama3762
      @pavananantharama3762 2 years ago +6

      Good observation. It felt like this interview was going in the wrong direction the moment a NoSQL DB was chosen.
      I would say a time-series DB or an Elasticsearch system would have been a good choice.
      The key takeaway for me was how well Hozefa communicated his thoughts and solutions. Very good communicator.

    • @schan263
      @schan263 2 years ago +3

      This interview is simply too short. It needs at least another 10 minutes for the design discussion.

  • @sivaranjansahu7427
    @sivaranjansahu7427 2 years ago +5

    Great video. Very natural and realistic, not like the rehearsed and phony ones like many other videos on YT.

  • @seenu007
    @seenu007 2 years ago +21

    Not satisfied with the discussion. The scope of the question is not clear. Are we building a system for analytics computed from logging data, or are we building a system that has logging and analytics as separate components?
    The interviewee could have discussed:
    1. The grain of the data sent by the logging system. Is it individual events or aggregated counts?
    2. A database design that is optimized for analyzing time-series data.
    3. Machine-generated events versus user-generated events, and treating those datasets differently down the line.

  • @dicksonchibuzor7625
    @dicksonchibuzor7625 3 years ago +20

    I heard the interviewer mention, as part of the requirements, that the system could scale up to a billion users, meaning events could be at least double that (depending on the metrics to be tracked). In that case, NoSQL might not be the best choice for persistent storage. Maybe an OLAP database (like ClickHouse) should be used. That would drastically improve query time for both visualization (retrieval) and insertion of events, and would also help create aggregates much faster.
    Another improvement to the design: maybe the queue could come after the validation/scrubbing service rather than before it. That would save space in the queues and keep them from being overwhelmed, because only validated data gets into the queue and invalid data is discarded. The only case where I see the queue coming first is when we validate very large batches of payloads at a time; then we could stick with this design, because validation might take some time for extremely large batches.
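
A rough sketch of the "validate first, then enqueue" ordering suggested above, using Python's in-process `queue.Queue` as a stand-in for a real message queue. The required field names and the `ingest` function are assumptions made up for illustration.

```python
import queue

# Hypothetical minimal schema; a real platform would define its own.
REQUIRED_FIELDS = {"event_type", "timestamp", "user_id"}


def is_valid(event):
    """Accept only dict events that carry every required field."""
    return isinstance(event, dict) and REQUIRED_FIELDS.issubset(event.keys())


def ingest(events, q):
    """Enqueue valid events, discard the rest, and return the drop count."""
    dropped = 0
    for event in events:
        if is_valid(event):
            q.put(event)
        else:
            dropped += 1
    return dropped


q = queue.Queue()
batch = [
    {"event_type": "click", "timestamp": 1, "user_id": "u1"},
    {"event_type": "click"},  # missing fields, so it never reaches the queue
]
dropped = ingest(batch, q)
```

The trade-off is that validation now sits on the request path, so a slow validator delays the client instead of the queue consumer.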

    • @implemented2
      @implemented2 3 years ago

      It is beneficial to use a queue when the clients are trusted and validation is not required or is minimal. This applies to internal services, when you are building an infrastructure solution for use within the company.

    • @opencompare
      @opencompare 2 years ago +2

      I would probably write all non-real-time events straight to a data lake with high throughput, and later ingest them using distributed data processing platforms like Spark or Hadoop. Only gold- or silver-state data should be stored in a DB for analytical purposes.

    • @jasonwakeman
      @jasonwakeman 1 year ago

      @opencompare This exactly. A distributed schemaless DB would require hundreds of instances just to handle the writes. Querying this much data would take a lot of CPU and time. So query the whole thing once per hour/day/etc. and aggregate it into many tables of a relational DB, where the aggregates can be queried quickly or cached.
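
A small sketch of the periodic roll-up described above, assuming raw events are `(timestamp_seconds, event_type)` pairs and using an in-memory SQLite database as a stand-in for the relational store. The table name `hourly_counts` and both function names are hypothetical.

```python
import sqlite3
from collections import Counter


def rollup_hourly(events):
    """Count events per (hour_start, event_type) from raw (ts, type) pairs."""
    counts = Counter()
    for ts, event_type in events:
        hour_start = ts - (ts % 3600)
        counts[(hour_start, event_type)] += 1
    return counts


def store_rollup(conn, counts):
    """Upsert the aggregated counts into a small relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hourly_counts ("
        "hour_start INTEGER, event_type TEXT, n INTEGER, "
        "PRIMARY KEY (hour_start, event_type))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO hourly_counts VALUES (?, ?, ?)",
        [(hour, etype, n) for (hour, etype), n in counts.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_rollup(conn, rollup_hourly([(10, "click"), (20, "click"), (3700, "view")]))
rows = conn.execute(
    "SELECT hour_start, event_type, n FROM hourly_counts ORDER BY hour_start"
).fetchall()
```

Dashboards would then read the small aggregate table instead of scanning raw events.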

  • @RM-bg5cd
    @RM-bg5cd 1 year ago +3

    Event Sourcing and projections for visualization would have been amazing here

  • @bentchow
    @bentchow 2 years ago +3

    Introducing a Data Catalog would really help with managing PII and auditing where and how sensitive data is being used through data lineage.

  • @saip009
    @saip009 2 years ago +16

    Question to people proficient in designing backend systems - is this a good example of an interview or of a design? I personally found this to be ticking off checkboxes in an interview. There aren't enough trade-off discussions or building toward a solution. This seems like an inconsistent brain dump of a known solution.

    • @jasonwakeman
      @jasonwakeman 1 year ago +1

      I agree. The interviewer did a great job (real interviews wouldn't ask this broad a question, tbh), but I doubt this candidate would make it to the next level. If the question was specifically about real-time user events, the answer might pass. But this is not a valid solution for big data. An actual solution requires many services and multiple databases to aggregate the data for various use cases, not one giant DB that handles all writes and all queries. Storage is cheap, so a solution with a single DB doesn't really make sense for big data/analytics.

    • @konradte
      @konradte 1 year ago +6

      From my experience, this is not a real-world interview. This was only drawing circles and rectangles without talking about the data model, time-series databases, database schema, failure detection, or monitoring. The candidate would be bombarded with questions right away. This is a "nice" design to draw, but it won't take you through an onsite with any serious company.

  • @schan263
    @schan263 2 years ago +8

    This interview is too short, so there is not enough time to talk about some of the details. The design interview should be at least 40 minutes. The candidate only had 21 minutes. There's not enough time for a deep dive and it seemed rushed. There is no time to talk about scaling the individual components. Sampling is not scaling. The interview should be longer so that the candidate can talk about how many servers are needed, how much disk space is required for X number of years/months, how many requests can be served per second, etc.
    The requirements list is too short. I feel we didn't spend enough time on the requirements.
    How would you determine the level of the candidate based on this interview performance?

  • @RituRaj-r5w
    @RituRaj-r5w 11 months ago

    It's surprising that there was no discussion of OLAP storage solutions, since analyzing these metrics is the end product.

  • @rohitparthasarathy6671
    @rohitparthasarathy6671 1 year ago

    I think for time-series data we should use an RDBMS with sharding or, even better, generate the graphs from an in-memory DB.

  • @amazingabhay
    @amazingabhay 1 year ago

    What does the archiving process look like? How, and by whom, is data moved from the main DB to the archive DB, and what happens to the precomputed visualization data?

  • @OneMillionDollars-tu9ur
    @OneMillionDollars-tu9ur 8 months ago +4

    I am a system design interviewer and a hiring manager and I will probably give him a NO.

    • @OneMillionDollars-tu9ur
      @OneMillionDollars-tu9ur 8 months ago

      Too many application-level assumptions, too few hardcore technical details.

    • @OneMillionDollars-tu9ur
      @OneMillionDollars-tu9ur 8 months ago +2

      And why does he talk about cache at all? No cache needed in this system

  • @riit1564
    @riit1564 2 years ago +3

    Feedback:
    1. Should have gone into more detail about data storage and how the storage would support faster queries. Some sample queries should have been shown, along with how those queries are served.
    2. No mention of how logs are stored and indexed for faster search.
    3. Didn't justify the use of the queue.

    • @tryexponent
      @tryexponent  1 year ago

      Hey Rahul, thanks for watching and leaving your feedback! Appreciate it!

  • @BuyCarsTVPakistan
    @BuyCarsTVPakistan 2 years ago +2

    Ok good interview with Imran Hashmi :p

  • @gufengmsa
    @gufengmsa 1 year ago +1

    The design is a little superficial. In the context of monitoring systems, the crucial "dive deep" question pertains to data aggregation and the trade-offs between storage capacity and performance.
    Real-world monitoring systems like CloudWatch and Prometheus (push vs. pull) should have been mentioned during the interview as well.

  • @sagarchoudhury56
    @sagarchoudhury56 2 years ago +9

    I think this interview will not fly. Lots of flaws.

  • @tapanparida3176
    @tapanparida3176 3 years ago +7

    Very good... both interviewer and interviewee did an excellent job... a lot to learn from this video... I have an interview tomorrow with Amazon, hope this helps...

    • @designpathy
      @designpathy 3 years ago

      How did it go? I have mine in the second week of Jan.

    • @eaf207
      @eaf207 3 years ago

      @designpathy Good luck y'all. Can I chat with you? I have one coming up soon.

    • @sachinmalik9574
      @sachinmalik9574 2 years ago

      @designpathy How did yours go?

  • @AyoubMAZA
    @AyoubMAZA 6 months ago +3

    This guy interviewed himself

  • @KrishnaG0902
    @KrishnaG0902 2 years ago +2

    This could be a case for Kafka for message processing queue with event driven API in mind..

  • @CommanderShepard05
    @CommanderShepard05 2 years ago +3

    dear team, please provide the name of the tool that the user is using for drawing the architecture

  • @AnushkaVijay-cv7tk
    @AnushkaVijay-cv7tk 1 year ago

    Does Meta ask system design questions for the SDE1 role?

    • @tryexponent
      @tryexponent  1 year ago

      Hey AnushkaVijay-cv7tk! Typically SDE1 candidates will not be asked system design questions

  • @downshiftturbo8974
    @downshiftturbo8974 8 months ago

    Low latency as an NFR didn't make sense to me. Nothing high-priority or transactional, like money, is involved. This is passive data that will be used later to make business decisions.

  • @psychoprincess8920
    @psychoprincess8920 9 months ago

    Small correction at 6:00:
    For a money/banking system, consistency should be prioritized over availability.

  • @neerajkhanna3024
    @neerajkhanna3024 2 years ago +4

    Why was visualization such a big piece of the discussion? The design was for metrics and logging, which lacked depth. It's a whole blob of logging data coming in; it could be stored in a time-series DB or even an object store like S3, then moved to a DW like Redshift. Why is a NoSQL DB needed in this case?

    • @jasonwakeman
      @jasonwakeman 1 year ago +1

      Not sure where you get the idea that it is a whole blob. Imagine a web app: you would want to log individual events so that you don't lose any if the browser is closed. S3 would work but is not the best choice: imagine having a Lambda for every single user event that wrote to S3.

  • @amitkumarsrivastava9261
    @amitkumarsrivastava9261 1 year ago +3

    A NoSQL DB for time-series data. What a joke!!! Can't believe an FB EM is giving this sort of design.

  • @arunsatyarth9097
    @arunsatyarth9097 3 years ago +10

    Not in depth at all

  • @Lays416TL
    @Lays416TL 3 years ago +5

    Load balancer is redundant if you're using a queue. Events should be published to the queue right away and available consumers (validation service) will handle events as they become available.

    • @gsb22
      @gsb22 3 years ago +11

      I believe you don't expose the queue directly; it has to sit behind a service that actually pushes the data onto the queue. And since this service needs to scale up and down, we need LBs in front of the front-end servers.

    • @KevindraSingh
      @KevindraSingh 3 years ago +1

      Yup, exposing an implementation detail like the queue directly to the client will hurt the system in the long term when a requirement to modify the design comes along.

    • @dicksonchibuzor7625
      @dicksonchibuzor7625 3 years ago +1

      You shouldn't expose the queue directly to the event payload. Also, it will help with "load balancing" 😃, especially at the scale the interviewer mentioned (there should definitely be multiple queues).

    • @japanboy31415
      @japanboy31415 2 years ago

      wrong.

    • @jcaliz
      @jcaliz 2 years ago +1

      Queues usually benefit from fast protocols like TCP and UDP (if you don't care about data loss); exposing these protocols to the end user is not safe.

  • @kartech11
    @kartech11 2 years ago

    Can a load balancer directly insert to a queue?

    • @rituraj889
      @rituraj889 2 years ago

      Yeah, good point. Isn't an LB part of an MQ by default?
      I mean, the number of partitions or consumers can do the same thing.

  • @t3ntube357
    @t3ntube357 3 years ago

    May I know the name of the tool they used?

  • @ZealousSwede
    @ZealousSwede 3 years ago +3

    Hozefa is a beast!!!!

  • @karthikr5884
    @karthikr5884 3 years ago

    Nice:)

  • @jjc5258
    @jjc5258 2 years ago +15

    This interview gonna fail, bad example

    • @ramgamery
      @ramgamery 11 months ago

      Why? Can you please explain?

    • @ashwin81088
      @ashwin81088 7 months ago +2

      This interview went well imo. The system he described is what we use in my org. They took 3 years to develop but our boss designed it in 20 mins.

  • @theja63
    @theja63 1 month ago +1

    Every single one of these Exponent interviews is super shallow.

  • @iamworstgamer
    @iamworstgamer 2 years ago

    18:32 this question had no clear answer given

  • @fevicoI
    @fevicoI 4 months ago

    This is not a great solution. It leaves out a lot of technical bits and unnecessarily assumes a lot of things. Not the right way.

  • @HEKTO3
    @HEKTO3 2 years ago +1

    Not very successful interview

  • @wuaaron662
    @wuaaron662 3 years ago +8

    uhm...uhm...uhm...uhm...uhm...
    ....................................................is it how it works in real system???????

    • @tsjoshi
      @tsjoshi 3 years ago

      "What if..." that's what happens all the time in reality.

  • @jantrollan3358
    @jantrollan3358 4 months ago +1

    How can you stay focused during the interview when the interviewer is so attractive?

  • @gsb22
    @gsb22 3 years ago +4

    17:10
    WOW. I mean, there were nitpicks before this point, but this is a big NO. The analytics platform HAS to save each and every event no matter what. It doesn't matter if this is being used by 1 user or trillions of users, you have to store each and every event. The response to the scale problem would be to scale out the queue and ingestion service as the number of events increases.

    • @gsriram7
      @gsriram7 3 years ago +3

      @joed5714 Actually it's not. We heavily use sampling to keep up with upstream, and it's standard practice when volume is exorbitantly high (like 10000+ B events per second). We have a data pipeline to ingest packet headers from all the routers. A router can process 5 Gbps and there are 1000+ routers, and it is impossible to ingest all those events without sampling. Unless, of course, you provision 10000+ 32-core instances.

    • @nagoorshaik8025
      @nagoorshaik8025 3 years ago +4

      I think whether it is a BIG NO or an absolute YES has to be decided by the use case. Depending on the metrics and the purpose for which we collect the data, it might not be necessary to collect metrics from every user. Sampling is an important statistical method that gives the expected results without going through each and every input. I know we have tools, methods, and frameworks to collect every input without a miss, but whether we need to do that has to be decided first; otherwise you are jumping in to solve a problem that doesn't exist.
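
For readers unfamiliar with the sampling being debated here, a minimal sketch of one common variant, deterministic hash-based sampling keyed on a user ID. It assumes string user IDs; the function name `keep_event` and the rate are illustrative, and this is not necessarily the scheme the candidate or the commenters had in mind.

```python
import hashlib


def keep_event(user_id, sample_rate):
    """Decide deterministically whether to keep events for a given user.

    Hashes the user ID to a 64-bit integer and keeps the user when the
    integer falls below sample_rate * 2**64, so the same user is always
    either fully kept or fully dropped.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < sample_rate * 2**64


# Roughly 10% of distinct users are expected to be kept at a 0.1 rate.
kept = sum(keep_event(f"user-{i}", 0.1) for i in range(10_000))
```

Because the decision depends only on the ID, per-user metrics stay internally consistent, but totals have to be scaled back up by the sample rate, which is why this does not fit use cases that need every event counted.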

    • @andrew3
      @andrew3 2 years ago +2

      Sampling is done by all the major tech players for large applications. It is completely valid to suggest.

    • @daryaarbuzova3315
      @daryaarbuzova3315 1 year ago

      Stopped the video at the same timestamp to process what was said :\ Agree that it's a big NO. For example, sampling user conversions for ads analytics is not acceptable.

  • @craigslist1323
    @craigslist1323 2 years ago +5

    How is this guy a manager. Probably will fail intern interviews.

  • @yashmishra3900
    @yashmishra3900 3 years ago +3

    Who is the interviewer? Please include her LinkedIn ID.