CRDTs - Stop Worrying About Write Conflicts | Systems Design 0 to 1 with Ex-Google SWE

  • Published: 24 Nov 2024

Comments • 64

  • @rcw7381
    @rcw7381 1 year ago +13

    best CRDT video so far IMO

  • @MrPaulo8394
    @MrPaulo8394 2 months ago +1

    Thanks for all these videos. I work as an SA at MongoDB and I find your videos quite helpful for simplifying the concepts found in the book “Designing Data-Intensive Applications”.

  • @0xhhhhff
    @0xhhhhff 11 months ago +11

    You're funny as hell and you explain quite well. Hope your channel grows big.

  • @AlexBlack-xz8hp
    @AlexBlack-xz8hp 7 months ago +3

    Super good video! Thank you so much. Really enjoying the whole series.

  • @jporritt
    @jporritt 4 months ago +3

    For the counter it may have been worth explicitly covering the merge function, even though it seems to be something like: merge((x1, y1), (x2, y2)) = (max(x1, x2), max(y1, y2))
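The merge function suggested in the comment above can be sketched as an element-wise maximum. This is a minimal illustration, assuming each position in the tuple holds the number of increments recorded by one node; the names `merge` and `value` are hypothetical and not taken from the video.

```python
def merge(a, b):
    """Merge two counter states by taking the per-node maximum."""
    return tuple(max(x, y) for x, y in zip(a, b))


def value(state):
    """The counter's value is the sum of every node's count."""
    return sum(state)


# Two replicas that have each seen a different subset of increments.
replica_1 = (3, 1)
replica_2 = (2, 4)

merged = merge(replica_1, replica_2)
print(merged, value(merged))
```

Because `max` is commutative, associative, and idempotent, replicas can merge in any order and any number of times and still agree on the result.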

  • @benshapiro9731
    @benshapiro9731 1 year ago +20

    “I only last a minute”

  • @themichaelw
    @themichaelw 5 months ago +3

    Jordan with the heat in the first 30 seconds

  • @hoangnhatpham8076
    @hoangnhatpham8076 1 year ago +4

    Thanks for the video! Very clearly explained.

  • @user-sx4wm5ls5q
    @user-sx4wm5ls5q 1 month ago +1

    thanks for the super informative video. One quick question, how can a counter be idempotent??

    • @jordanhasnolife5163
      @jordanhasnolife5163 1 month ago

      Well I guess an idempotent counter would be a set of "increment" operations each with its own unique ID. That being said, what I moreso meant here is that the CRDT itself is state based, as opposed to operation based, in which you send the entirety of a CRDT to other nodes, rather than just an individual operation on it. Doing that is idempotent.

    • @user-sx4wm5ls5q
      @user-sx4wm5ls5q 1 month ago

      @@jordanhasnolife5163 Ahh got it thanks!
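The "set of increment operations each with its own unique ID" idea from the reply above can be sketched as follows. This is only an illustration under stated assumptions: the class name `IdempotentCounter` and its methods are hypothetical, and the operation IDs are generated with `uuid4` for convenience.

```python
import uuid


class IdempotentCounter:
    """Counter modeled as a set of uniquely identified increment operations."""

    def __init__(self):
        self.seen_ops = set()

    def apply(self, op_id):
        # Re-applying an operation that was already seen changes nothing.
        self.seen_ops.add(op_id)

    def merge(self, other):
        # Set union is commutative, associative, and idempotent.
        self.seen_ops |= other.seen_ops

    @property
    def value(self):
        return len(self.seen_ops)


counter = IdempotentCounter()
op = uuid.uuid4().hex          # one logical increment
counter.apply(op)
counter.apply(op)              # duplicate delivery of the same increment
counter.apply(uuid.uuid4().hex)
print(counter.value)
```

The trade-off is that the set of operation IDs grows with every increment, which is one reason a per-node count vector is usually preferred for counters.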

  • @idobleicher
    @idobleicher 1 year ago +3

    I just love your videos man

  • @soumik76
    @soumik76 10 months ago +3

    2 questions:
    1. From what I understood, CRDTs are handy for eventually consistent systems. So, for example, if I am dealing with WhatsApp group joins/leaves, the set of groups I am in will only be eventually consistent, so a CRDT won't be a good way to propagate this info (and thus leaderless replication may not be a good choice).
    2. Is this all implemented by leaderless DBs (like Cassandra) under the hood? Or does someone have to define the merge functions for the tables? Or is the merge function auto-detected by the DB engine based on the type of data (counters, carts, etc.)?

    • @jordanhasnolife5163
      @jordanhasnolife5163 10 months ago

      1) Yep!
      2) Riak does implement these for example, but I think each DB supports them to varying degrees. Generally, the point is that you don't have to write any merge function, unless you're making your own custom database of course :)

    • @soumik76
      @soumik76 10 months ago

      @@jordanhasnolife5163 thank you

  • @dibll
    @dibll 11 months ago +4

    Jordan, I'm not sure I got the distinction between vector clocks and CRDTs. CRDTs use vector clocks underneath anyway. Is the difference that CRDTs do the merge for us, whereas with vector clocks we ask the client to do it? Could you please help me understand? Thanks

    • @jordanhasnolife5163
      @jordanhasnolife5163 11 months ago

      Grow-only counters are basically a vector clock.
      Set CRDTs are more or less the same idea, but don't use an underlying version vector. Basically, it just avoids having to implement the logic yourself.

  • @mohittheanand
    @mohittheanand 6 months ago +1

    awesome video Jordan. btw why do we need to have 2 lists (inc, dec) can't we just decrement in the inc list itself?

    • @jordanhasnolife5163
      @jordanhasnolife5163 6 months ago +3

      Let's imagine I have three nodes, A, B, C.
      Each gets incremented once, and they all sync up, so now each has [1, 1, 1].
      Now let's say I decrement A and that makes it to B, but A goes down before we synchronize to C.
      So B now has [0, 1, 1]
      C has [1, 1, 1]
      What's the correct count? We can't reference the number of increments/decrements on A since it's not up anymore.
      How do we merge the state of B and C? If we have separate increment and decrement counts this is easy.
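The A/B/C scenario above can be worked through in code. This is a minimal sketch, assuming an element-wise max merge and a state that keeps separate "inc" and "dec" vectors; the helper names `merge_max`, `merge_pn`, and `value` are hypothetical.

```python
def merge_max(a, b):
    """Element-wise maximum of two per-node vectors."""
    return [max(x, y) for x, y in zip(a, b)]


# Single list: A's decrement is stored by lowering A's slot.
b_single = [0, 1, 1]   # B has seen A's decrement
c_single = [1, 1, 1]   # C has not
merged_single = merge_max(b_single, c_single)  # the decrement is lost


# Separate lists: decrements only ever grow, just like increments.
def merge_pn(a, b):
    return {
        "inc": merge_max(a["inc"], b["inc"]),
        "dec": merge_max(a["dec"], b["dec"]),
    }


def value(state):
    return sum(state["inc"]) - sum(state["dec"])


b_state = {"inc": [1, 1, 1], "dec": [1, 0, 0]}  # B has seen A's decrement
c_state = {"inc": [1, 1, 1], "dec": [0, 0, 0]}  # C has not
merged_pn = merge_pn(b_state, c_state)
print(merged_single, value(merged_pn))
```

With a single list there is no way to tell whether the lower number or the higher number is the newer one, so any fixed merge rule loses either increments or decrements. With two grow-only vectors, the larger entry is always the more recent one.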

  • @atanumondal8078
    @atanumondal8078 1 year ago +1

    Hi Jordan,
    Thanks a ton for the awesome videos. One request: is there any way to share the notes/slides?

    • @jordanhasnolife5163
      @jordanhasnolife5163 1 year ago

      Yeah fwiw pretty much all of this content is in my old slides, so just check the channel description

  • @SambitMallickiitg
    @SambitMallickiitg 1 month ago +1

    How can we leverage CRDT for a flash sale campaign of a product with defined stock count?

    • @jordanhasnolife5163
      @jordanhasnolife5163 24 days ago +1

      You can use a counter CRDT and stop selling when it reaches the count. You're going to oversell though since things are eventually consistent here.

    • @SambitMallickiitg
      @SambitMallickiitg 24 days ago +1

      @@jordanhasnolife5163 Thanks for clarifying. I see that you have already explained this in Operational CRDT. I noticed a few HLDs where people use a counter CRDT without much detail on consistency.
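The overselling mentioned in this thread can be illustrated with a small sketch. It assumes a grow-only "units sold" counter with one slot per replica and a stock of a single unit; `STOCK`, `try_sell`, and `merge` are hypothetical names chosen for this example.

```python
STOCK = 1  # units available in the flash sale (illustrative value)


def merge(a, b):
    """Element-wise maximum of two per-replica sold counts."""
    return [max(x, y) for x, y in zip(a, b)]


def try_sell(state, node):
    """Sell one unit if this replica's local view says stock remains."""
    if sum(state) >= STOCK:
        return state, False
    updated = list(state)
    updated[node] += 1
    return updated, True


# Both replicas start in sync and accept a sale before exchanging state.
replica_a, sold_a = try_sell([0, 0], node=0)
replica_b, sold_b = try_sell([0, 0], node=1)

merged = merge(replica_a, replica_b)
oversold = sum(merged) - STOCK
print(merged, oversold)
```

Each replica made a locally valid decision, yet the merged count exceeds the stock. Enforcing a hard limit needs coordination that a CRDT alone does not provide.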

  • @msebrahim-007
    @msebrahim-007 5 months ago +1

    Question about adding elements after they have been removed (14:04):
    If a user adds "ham" 5 times to the set on the same node, what prevents the set from containing 5 different instances of "ham" with unique IDs?

    • @jordanhasnolife5163
      @jordanhasnolife5163 5 months ago

      Nothing. You have multiple instances of ham now.
      On the front end though, we just tell the user that we have one instance of ham.

    • @twin392
      @twin392 5 months ago +1

      If we were to remove "ham" where there's 5 add instances, we would publish 5 tombstones, one for each instance, right?
      And I guess this begs the question, in an eventually consistent system, is it possible to miss publishing a tombstone in a state-based CRDT set because a leader doesn't have one of the adds yet?
      BTW, first time commenting on the channel, absolutely love the content and thanks for making it!

    • @jordanhasnolife5163
      @jordanhasnolife5163 5 months ago

      @@twin392 Yeah absolutely possible your db leader wouldn't have all of those available yet to delete all instances - but in some senses that's a feature because then it means that someone else probably did an independent add operation of all of the ones that you just attempted to delete.
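The behavior discussed in this thread (several uniquely tagged "ham" entries, one tombstone per observed entry, and an unseen add surviving a remove) can be sketched as an observed-remove set. This is a simplified illustration, not the exact structure from the video; the class `ORSet` and its method names are assumptions.

```python
import uuid


class ORSet:
    """Observed-remove set: every add gets a unique tag, removes tombstone tags."""

    def __init__(self):
        self.adds = set()        # (element, tag) pairs
        self.tombstones = set()  # tags that have been removed

    def add(self, element):
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Only tags this replica has already observed can be tombstoned.
        self.tombstones |= {tag for (e, tag) in self.adds if e == element}

    def merge(self, other):
        self.adds |= other.adds
        self.tombstones |= other.tombstones

    def elements(self):
        return {e for (e, tag) in self.adds if tag not in self.tombstones}


a, b = ORSet(), ORSet()
for _ in range(5):
    a.add("ham")              # five tagged instances, shown to the user as one
view_before = a.elements()

b.add("ham")                  # independent add that `a` has not seen yet
a.remove("ham")               # publishes five tombstones
view_after_remove = a.elements()

a.merge(b)
view_after_merge = a.elements()
print(view_before, view_after_remove, view_after_merge)
```

After the merge, "ham" is back in the set because the add on `b` was never observed by the replica that issued the remove, which matches the "that's a feature" point in the reply above.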

  • @tranquilitybase293
    @tranquilitybase293 7 months ago +1

    Excellent video. One question: what hardware are you using to write the notes?

  • @LegitGamer2345
    @LegitGamer2345 3 months ago +2

    Hey Jordan, I get how state based CRDTs are idempotent, but I'm having a hard time understanding how they solve causal writes, do they or do they not?

    • @jordanhasnolife5163
      @jordanhasnolife5163 3 months ago

      A database violates causality when it displays one write without the earlier write that it relies on. If I'm making a write to a state-based CRDT and it depends on another state of that CRDT, the resulting state from my write will clearly include the causal state that I based my write on. Hence, when I send that state around, causality is preserved.

    • @LegitGamer2345
      @LegitGamer2345 3 months ago +1

      @@jordanhasnolife5163 Will the resulting state from your write eventually include the causal state, or will it instantly include the causal state? My understanding is that it will be eventual, and if it's eventual, there's nothing stopping a reader from reading a state that does not have the causal state yet, so causal writes still exist? Hope I made sense.

    • @jordanhasnolife5163
      @jordanhasnolife5163 3 months ago +1

      @@LegitGamer2345 In a state based CRDT it should be instant. I need your state to make my write, and the resulting state from my write contains your write.

  • @Kuma117
    @Kuma117 5 months ago +2

    Holy crap that intro, first 4 seconds I burst out laughing XD

  • @shobhitarya1637
    @shobhitarya1637 6 months ago +1

    How does the operational CRDT have the downside shown in the video (in DB2, the remove propagates before the add of ham)? Replication is done through logical replication logs, which are ordered, so how can operations arrive out of order during replication?

    • @jordanhasnolife5163
      @jordanhasnolife5163 6 months ago

      If you don't even want to think about ordering for a second, think about idempotence. If I send a message which I think didn't go through but actually did, and then send it again, now I've sent that update twice. This isn't a problem with state based crdts.
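The retry problem described in the reply above can be shown side by side. This is a minimal sketch, assuming a two-node counter; `apply_increment` stands in for an operation-based update and `merge` for a state-based one, and both names are hypothetical.

```python
def apply_increment(total):
    """Operation-based update: each delivered message adds one."""
    return total + 1


def merge(a, b):
    """State-based update: element-wise maximum of per-node counts."""
    return [max(x, y) for x, y in zip(a, b)]


# One logical increment, but the message is delivered twice after a retry.
op_total = 0
for _ in range(2):
    op_total = apply_increment(op_total)

# The same retry with a state-based CRDT: the full state is sent twice.
sender_state = [1, 0]
receiver_state = [0, 0]
for _ in range(2):
    receiver_state = merge(receiver_state, sender_state)

print(op_total, sum(receiver_state))
```

The operation-based counter ends up at 2 for a single increment, while the state-based counter stays at 1 because merging the same state again is a no-op.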

  • @zuowang5185
    @zuowang5185 10 months ago +1

    What is the technical difficulty that makes git unable to automatically merge two conflicting commits? Why would the same difficulty not apply to a CRDT-based merge?

    • @zuowang5185
      @zuowang5185 10 months ago +1

      for example, if I edit a google doc during a flight without wifi, and someone else also made a large conflicting change during that time

    • @zuowang5185
      @zuowang5185 10 months ago +1

      And could you talk about the downsides of using CRDTs besides the complexity of setting them up?

    • @jordanhasnolife5163
      @jordanhasnolife5163 10 months ago +2

      There are many types of CRDTs. Merging counters is simple. That being said, if you want a text merging CRDT, complications arise. We can talk about that one in a few videos :)

  • @forrestallison1879
    @forrestallison1879 3 months ago +1

    which video is the sequel to this?

  • @HSBTechYT
    @HSBTechYT 8 months ago +1

    okay I am a fan

  • @hackerandpainter
    @hackerandpainter 8 months ago +1

    you are a genius.

  • @John-nhoJ
    @John-nhoJ 1 year ago +1

    @jordanhasnolife5163 does every record have an update-vector?

  • @SurajSharma-ur6rm
    @SurajSharma-ur6rm 7 months ago +1

    Can anybody please help me understand the difference between version vectors and state-based CRDTs?

    • @jordanhasnolife5163
      @jordanhasnolife5163 7 months ago +2

      A version vector helps us order writes. A state-based CRDT is some data that lives on each leader and can easily be merged so that it is eventually consistent. It just so happens that the implementation of a version vector and a state-based counter CRDT are the same.

    • @SurajSharma-ur6rm
      @SurajSharma-ur6rm 6 months ago

      @@jordanhasnolife5163 thanks.

    • @sahilguleria6976
      @sahilguleria6976 3 months ago +2

      Version Vectors are used to keep track of the version of data in distributed systems. They help in detecting conflicts and determining the causality of updates. Here’s how they work:
      Vector Clock Structure: Each replica in a distributed system maintains a vector of counters, with one counter for each replica.
      Update Tracking: When a replica updates its data, it increments its own counter in the vector. This updated vector is then propagated with the data.
      Conflict Detection: When replicas exchange data, they compare their version vectors. If one version vector has all counters greater than or equal to another, it means it is a more recent version (or causally after). If the vectors are not comparable (some counters are higher and others are lower), it indicates a conflict.
      Resolution: Conflicts are usually resolved by application-specific logic, often requiring manual or semi-automatic intervention.
      State-Based CRDTs are a type of CRDT designed for achieving strong eventual consistency in distributed systems without requiring complex conflict resolution. Here’s how they work:
      State Representation: Each replica maintains a state representing the data structure.
      Merging States: States are merged using a deterministic function that is commutative, associative, and idempotent, ensuring that merging the states in any order, and any number of times, will produce the same result.
      Propagation: Each replica periodically sends its state to other replicas. When a replica receives a state from another replica, it merges it with its own state.
      Convergence: Because of the properties of the merge function, all replicas will eventually converge to the same state, given enough time and the assumption of reliable communication.
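The "Conflict Detection" step above can be sketched as a comparison between two version vectors. This is a minimal illustration, assuming both vectors list replicas in the same order; the function name `compare` and its return labels are hypothetical.

```python
def compare(a, b):
    """Classify how version vector `a` relates to version vector `b`."""
    a_covers_b = all(x >= y for x, y in zip(a, b))
    b_covers_a = all(y >= x for x, y in zip(a, b))
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "after"        # a has seen everything b has, and more
    if b_covers_a:
        return "before"       # b has seen everything a has, and more
    return "concurrent"       # neither dominates: a conflict


print(compare([2, 1], [1, 1]))
print(compare([1, 1], [2, 1]))
print(compare([2, 0], [1, 1]))
```

With a plain version vector, the "concurrent" case is handed to application logic. A state-based counter CRDT instead resolves it automatically by taking the element-wise maximum of the two vectors.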

  • @unsaturated8482
    @unsaturated8482 4 months ago +1

    damn fire

  • @innazhogova3621
    @innazhogova3621 3 months ago +2

    proud to be one of the 3%

  • @vikramsaurabh8240
    @vikramsaurabh8240 11 months ago +1

    gosh...these intros😂

  • @zozoTravels
    @zozoTravels 3 months ago +1

    huh huh, pretty funny, you last only one minute, won't really satisfy the 3 percent female watchers. Just kidding :} Great stuff

  • @davidarcoleo6033
    @davidarcoleo6033 7 months ago +1

    lol