How

  • Published: Aug 15, 2024
  • System Design for SDE-2 and above: arpitbhayani.m...
    System Design for Beginners: arpitbhayani.m...
    Redis Internals: arpitbhayani.m...
    Build Your Own Redis / DNS / BitTorrent / SQLite - with CodeCrafters.
    Sign up and get 40% off - app.codecrafte...
    In the video, I discuss how Twitter maintains search at scale using Elasticsearch. Twitter built tooling around Elasticsearch to handle surges in search traffic, real-time ingestion, and backfill. The course on system design focuses on building intuition and covers real-world system design scenarios. Twitter's tooling includes an Elasticsearch proxy for standardization and a backfill service for staggered data ingestion. By deferring writes and keeping reads synchronous, Twitter maintains stability and scalability in its Elasticsearch clusters.
    Recommended videos and playlists
    If you liked this video, you will find the following videos and playlists helpful
    System Design: • PostgreSQL connection ...
    Designing Microservices: • Advantages of adopting...
    Database Engineering: • How nested loop, hash,...
    Concurrency In-depth: • How to write efficient...
    Research paper dissections: • The Google File System...
    Outage Dissections: • Dissecting GitHub Outa...
    Hash Table Internals: • Internal Structure of ...
    Bittorrent Internals: • Introduction to BitTor...
    Things you will find amusing
    Knowledge Base: arpitbhayani.m...
    Bookshelf: arpitbhayani.m...
    Papershelf: arpitbhayani.m...
    Other socials
    I keep writing and sharing my practical experience and learnings every day, so if you resonate then follow along. I keep it no fluff.
    LinkedIn: / arpitbhayani
    Twitter: / arpit_bhayani
    Weekly Newsletter: arpit.substack...
    Thank you for watching and supporting! It means a ton.
    I am on a mission to bring out the best engineering stories from around the world and make you all fall in
    love with engineering. If you resonate with this then follow along, I always keep it no-fluff.

Comments • 45

  • @baibhabmondal1740
    @baibhabmondal1740 1 year ago +6

    Thanks Arpit, this helps in drawing parallels to other systems as well. And it's so nice to see that the fundamentals are quite the same in handling large-scale infra.

  • @abhishekvishwakarma9045
    @abhishekvishwakarma9045 1 year ago +4

    such a great explanation, learned a lot from you arpit sir 😎, keep going 🔥

  • @ShivamKumar-bt9nn
    @ShivamKumar-bt9nn 1 year ago +1

    Love these stories of great engineering. Request to please bring these more often. Thanks a lot 🙂

  • @navneetkalra3934
    @navneetkalra3934 4 months ago

    Similar to what we built at Oracle... Oracle Knowledge AI search has a similar kind of architecture. We have also introduced vector searches in Elasticsearch.

  • @architshukla8076
    @architshukla8076 1 year ago +1

    Thanks Arpit for such informative content.

  • @guitar-nation-gautam
    @guitar-nation-gautam 1 year ago +2

    But I guess if the read operation is an I/O-intensive one, like fetching a yearly orders report from ES, it shouldn't be a synchronous operation. Rather, it should follow the write flow you described, i.e., send the report metadata as an event to Kafka topics, and later workers can mail the reports asynchronously.

    • @sharemomentsindian
      @sharemomentsindian 1 year ago

      Here too, if the report is big, how we fetch it can be discussed.

    • @meditationdanny701
      @meditationdanny701 1 year ago

      But why use ES for analytical queries?

    • @AbhishekTiwari-uy7of
      @AbhishekTiwari-uy7of 1 year ago

      Better to run a Spark job directly on S3/HDFS and refrain from using Elasticsearch for such use cases.

  • @biswajit-k
    @biswajit-k 9 months ago

    Very helpful! Thanks a lot sir.

  • @manjunathyaji7316
    @manjunathyaji7316 1 year ago

    Thanks Arpit! This was a great video! I had a question.
    In the backfill process, how does the orchestrator know how many workers to spawn? How do you monitor and calculate the amount of data yet to be processed in HDFS?
    If Kafka was used instead of HDFS, I know there's a way to calculate the consumer lag, which can be used to trigger orchestrator's rules.
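
The consumer-lag idea in the comment above can be sketched in a few lines. This is only an illustration, not Twitter's orchestrator: the function names, the per-worker capacity, and the worker cap are hypothetical, and the offsets are passed in as plain dictionaries rather than read from a real Kafka client.

```python
import math

def consumer_lag(end_offsets, committed_offsets):
    """Total lag: for each partition, log end offset minus committed offset."""
    return sum(
        max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    )

def workers_needed(lag, per_worker_capacity, max_workers):
    """Hypothetical scaling rule: one worker per chunk of lag, capped."""
    if lag <= 0:
        return 0
    return min(math.ceil(lag / per_worker_capacity), max_workers)

# Made-up offsets for three partitions.
end_offsets = {0: 1200, 1: 900, 2: 1500}
committed = {0: 1000, 1: 900, 2: 700}

lag = consumer_lag(end_offsets, committed)  # 200 + 0 + 800 = 1000
workers = workers_needed(lag, per_worker_capacity=300, max_workers=10)
```

For files staged in HDFS, a comparable signal would have to come from something like the count or size of unprocessed files; the video does not say which metric is actually used.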

  • @abhiyanker
    @abhiyanker 1 year ago +1

    Why was HDFS used here? A simple queue (like SQS), or Kafka if Twitter wanted a retry mechanism, would have achieved the same.

    • @AsliEngineering
      @AsliEngineering 1 year ago

      Staging storage for subsequent consumption.

    • @abhiyanker
      @abhiyanker 1 year ago

      @AsliEngineering Thanks for the reply! I was not expecting to get a reply here.
      When backfill is not required, Twitter puts the data into Elasticsearch directly, and for backfill they put it in HDFS. I think the reason would be the memory constraint in Kafka or SQS. S3 or HDFS do not have that.

  • @shishirchaurasiya7374
    @shishirchaurasiya7374 1 year ago

    This was really a crispy one

  • @random4573
    @random4573 1 year ago

    In database systems we can segregate writes and reads across different DBs and eventually make the read node consistent with the write node's data.
    I don't know much about ES, but was that not an option in ES?

  • @chetan_bommu
    @chetan_bommu 1 year ago

    Thanks for the great explanation. I have a basic question. What is backfill and its job here?
    Is it about parsing each tweet and doing analysis?

    • @uditchaudhary9117
      @uditchaudhary9117 1 year ago

      Backfilling updates the index with the latest data crawled from various sources.

  • @baibhabmondal1740
    @baibhabmondal1740 1 year ago

    Any HighLevel folks watching this,
    It would be very similar to our eventing (& mongo-indexing) service, and the backfill is basically our snapshot service.

  • @dharins1636
    @dharins1636 1 year ago

    Hey Arpit, I am confused. Initially you said every team had its own cluster. Is the proxy a common service for all the clients of different clusters, or will each cluster have its own proxy service?

    • @AsliEngineering
      @AsliEngineering 1 year ago +1

      A hybrid setup is a possibility.
      Some services may have an isolated proxy, while a few others may share one.

  • @sureshchaudhari4465
    @sureshchaudhari4465 1 year ago

    Very good

  • @atulyadav21j
    @atulyadav21j 1 year ago

    Thanks Arpit for making this video!
    I had some follow-up questions out of curiosity:
    - What happens when there is too much data on Kafka during backpressure while indexing?
    - Can MapReduce create an Elasticsearch-understandable file that can be used for bulk insertion? In the current architecture, the workers will again be making 1:1 calls.
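
On the bulk-insertion question: Elasticsearch's bulk endpoint accepts newline-delimited JSON, where each document is preceded by an action line and the body ends with a newline. A minimal sketch of building such a payload follows; the index name, document IDs, and the `build_bulk_body` helper are hypothetical, and nothing here is confirmed as what Twitter's workers do.

```python
import json

def build_bulk_body(index, docs):
    """Build an NDJSON payload in the shape the bulk endpoint expects.

    docs is an iterable of (doc_id, source_dict) pairs. Each document
    becomes two lines: an "index" action line, then the document itself.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)

# Made-up documents for illustration.
body = build_bulk_body(
    "tweets",
    [("1", {"text": "hello"}), ("2", {"text": "world"})],
)
```

A batch job could write payloads like this to a file and a worker could send each one in a single bulk request, instead of one request per document.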

  • @kpicsoffice4246
    @kpicsoffice4246 1 year ago

    Great video dude.
    I wanted to ask you about your recording setup. Are you using obs & screen mirroring your iPad or something? Please mention any hardware/software you need for these videos

    • @AsliEngineering
      @AsliEngineering 1 year ago +1

      OBS plus iPad. Nothing more.

    • @kpicsoffice4246
      @kpicsoffice4246 1 year ago

      @AsliEngineering I see. So is it the iPad that you screen mirror, plus OBS on a MacBook? And is the app Notability? Btw, your handwriting is awesome!!

  • @gomathivigneshmurugan5409
    @gomathivigneshmurugan5409 1 year ago

    Hi Arpit, thanks for your videos. Sorry if my question is stupid. I have seen this video and your BookMyShow video; in both, scaling is discussed only for write operations. What about a huge volume of read traffic hitting a particular API? How is API stability ensured in that case? Kindly reply, please...

  • @user-ot3ro8zc6x
    @user-ot3ro8zc6x 1 year ago

    Since the write happens asynchronously, that particular tweet wouldn't show up in the user's tweets immediately, right? So how will the user see their tweet immediately?

    • @AsliEngineering
      @AsliEngineering 1 year ago

      How likely is the user going to search his/her own tweet immediately after posting it?

    • @user-ot3ro8zc6x
      @user-ot3ro8zc6x 1 year ago

      @AsliEngineering How would you handle such a use case, if there is one?

    • @AsliEngineering
      @AsliEngineering 1 year ago +1

      @user-ot3ro8zc6x Search systems are never designed to be strongly consistent.
      But if you want strong consistency, then your API will have to write synchronously to the DB and to the search engine. A massive overkill, tbh.

    • @user-ot3ro8zc6x
      @user-ot3ro8zc6x 1 year ago

      @AsliEngineering Yeah, got it.
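
The trade-off in this thread (deferred writes, synchronous reads, and a freshly written tweet that is not searchable right away) can be illustrated with a toy model. Everything here is an assumption made for illustration: the `SearchProxy` class is hypothetical, a `deque` stands in for the Kafka topic, and a dictionary stands in for the Elasticsearch index.

```python
from collections import deque

class SearchProxy:
    """Toy proxy: writes are queued, reads hit the index directly."""

    def __init__(self):
        self.queue = deque()  # stand-in for a Kafka topic
        self.index = {}       # stand-in for an Elasticsearch index

    def write(self, doc_id, text):
        # Deferred write: acknowledge once the event is queued.
        self.queue.append((doc_id, text))
        return "accepted"

    def search(self, term):
        # Synchronous read: only sees what has already been indexed.
        return sorted(d for d, text in self.index.items() if term in text)

    def drain(self, batch_size):
        # Stand-in for a worker consuming the queue and indexing a batch.
        indexed = 0
        while self.queue and indexed < batch_size:
            doc_id, text = self.queue.popleft()
            self.index[doc_id] = text
            indexed += 1
        return indexed

proxy = SearchProxy()
proxy.write("t1", "hello world")
before = proxy.search("hello")  # the write is still sitting in the queue
proxy.drain(batch_size=10)
after = proxy.search("hello")   # visible once a worker has indexed it
```

The gap between `before` and `after` is the eventual consistency the reply above accepts in exchange for a stable cluster.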

  • @rohit_starker34
    @rohit_starker34 1 year ago

    Hi sir, I'm a 1st-year student. Should I buy your system design course?

    • @AsliEngineering
      @AsliEngineering 1 year ago +2

      Not at all. Meant for more than 2 years of work experience.

  • @tesla1772
    @tesla1772 1 year ago

    Why doesn't the API server write directly to Kafka instead of going through the proxy?

    • @AsliEngineering
      @AsliEngineering 1 year ago +3

      Because it was a system rewrite and they did not want to change any upstream.

    • @baibhabmondal1740
      @baibhabmondal1740 1 year ago +2

      Also, to add to Arpit's answer, I would assume the proxy would still have authority over the rate of requests, plus some kind of auth. In case of a strange burst, we could avoid pushing a lot of unwanted data to Kafka.

  • @AadiManchekar
    @AadiManchekar 1 year ago

    What if Kafka gets too many messages? Will it drop some messages?

    • @AsliEngineering
      @AsliEngineering 1 year ago +1

      Back pressure.

    • @dharins1636
      @dharins1636 1 year ago +2

      No, the beauty of Kafka is its append-only log: it will add the message and you just have to consume it. You can then configure the topics to delete the "older" data based on the configuration (bytes, time, or both). Of course there are compacted topics, but that's another way of "reducing" the data space (and it has its own problems :) )

    • @AadiManchekar
      @AadiManchekar 1 year ago

      @dharins1636 Thank you
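
The retention behaviour described in this thread can be modelled with a toy append-only log. This is a simplified sketch, not Kafka's implementation: the `RetentionLog` class is hypothetical, and it trims individual records, whereas Kafka applies its `retention.ms` and `retention.bytes` topic settings to whole log segments.

```python
class RetentionLog:
    """Toy append-only log with time-based and size-based retention."""

    def __init__(self, retention_ms=None, retention_bytes=None):
        self.retention_ms = retention_ms
        self.retention_bytes = retention_bytes
        self.records = []  # (offset, timestamp_ms, payload), oldest first
        self.next_offset = 0

    def append(self, payload, now_ms):
        # Appends never fail because the log is "full"; cleanup is separate.
        offset = self.next_offset
        self.records.append((offset, now_ms, payload))
        self.next_offset += 1
        return offset

    def enforce_retention(self, now_ms):
        # Time-based retention: drop records older than retention_ms.
        if self.retention_ms is not None:
            self.records = [
                r for r in self.records if now_ms - r[1] <= self.retention_ms
            ]
        # Size-based retention: drop oldest records until under the limit.
        if self.retention_bytes is not None:
            total = sum(len(r[2]) for r in self.records)
            while self.records and total > self.retention_bytes:
                total -= len(self.records[0][2])
                self.records.pop(0)

    def offsets(self):
        return [r[0] for r in self.records]

log = RetentionLog(retention_ms=1000, retention_bytes=10)
log.append(b"aaaa", now_ms=0)
log.append(b"bbbb", now_ms=500)
log.append(b"cccc", now_ms=900)
log.enforce_retention(now_ms=1200)  # offset 0 is older than 1000 ms
after_time = log.offsets()
log.append(b"dddd", now_ms=1300)
log.enforce_retention(now_ms=1300)  # 12 bytes exceeds the 10-byte limit
after_size = log.offsets()
```

The point of the sketch is that older data is removed by the retention policy, not by producers being refused, so a slow consumer that falls behind the retention window can miss records.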

  • @sharemomentsindian
    @sharemomentsindian 1 year ago

    Are the worker nodes Spark jobs that stream from Kafka and write to Elasticsearch at a particular window or interval? @arpit @asliengineering

    • @AsliEngineering
      @AsliEngineering 1 year ago

      Could be. Implementation can be anything. Raw consumers, or Spark jobs.