Distributed Systems 7.1: Two-phase commit

  • Published: 16 Aug 2024
  • Accompanying lecture notes: www.cl.cam.ac....
    Full lecture series: • Distributed Systems le...
    This video is part of an 8-lecture series on distributed systems, given as part of the undergraduate computer science course at the University of Cambridge. It is preceded by an 8-lecture course on concurrent systems for which videos are not publicly available, but slides can be found on the course web page: www.cl.cam.ac....

Comments • 55

  • @Alvaro-hm9vu
    @Alvaro-hm9vu 2 years ago +18

    Got a job because of you... you changed my life... thank you

  •  3 years ago +35

    As soon as I find enough time I'm going to go through all the series. Thank you for making the effort.

  • @krizh289
    @krizh289 3 months ago +1

    Thanks for putting these lectures on youtube--education should be accessible to all

  • @ahmetb
    @ahmetb 3 years ago +6

    I was reading your book and got tired at the beginning of chapter 8, then I found your YouTube channel while trying to watch some videos before I dig into the chapter! Thanks for all your work in making this field more understandable.

  • @IrvinHerreraGarza
    @IrvinHerreraGarza 1 year ago +3

    Mr. Kleppmann, I love your book and the way you explain things in your videos. Thank you so much for creating this material.

  • @zhou7yuan
    @zhou7yuan 2 years ago +6

    "Consistency" [0:11]
    ACID
    Read-after-write-consistency (lecture 5)
    Replication
    Consistency model
    Distributed transactions [2:26]
    Atomic commit versus consensus [4:47]
    >1 propose | all votes
    any 1 proposed value decided | must all commit/abort
    crash tolerated | abort if 1 node crash
    Two-phase commit (2PC) [6:33]
    (key moment) [9:45]
    The coordinator in two-phase commit [10:25]
    Fault-tolerant two-phase commit (1/2) [12:58]
    Fault-tolerant two-phase commit (2/2) [16:43]
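
    The outline above tracks the lecture's 2PC walkthrough. As a rough illustration of the protocol's two phases, here is a minimal Python sketch; the `Participant` class and its names are hypothetical, and a real implementation needs durable logs, timeouts, and RPCs:

    ```python
    class Participant:
        """A hypothetical 2PC participant (one database node)."""

        def __init__(self, name, will_vote_yes=True):
            self.name = name
            self.will_vote_yes = will_vote_yes
            self.state = "active"

        def prepare(self):
            # Phase 1: before voting yes, a real participant writes the
            # transaction to durable storage so it can still commit
            # after a crash.
            if self.will_vote_yes:
                self.state = "prepared"
                return "yes"
            return "no"

        def commit(self):
            self.state = "committed"

        def abort(self):
            self.state = "aborted"


    def two_phase_commit(participants):
        # Phase 1: the coordinator collects votes; a single "no"
        # (or, in practice, a timeout) forces an abort.
        votes = [p.prepare() for p in participants]
        decision = "commit" if all(v == "yes" for v in votes) else "abort"
        # Phase 2: the coordinator broadcasts the decision to everyone.
        for p in participants:
            p.commit() if decision == "commit" else p.abort()
        return decision
    ```

    For example, `two_phase_commit([Participant("A"), Participant("B", will_vote_yes=False)])` aborts on both nodes, because unanimity is required to commit.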

  • @nakonachev1407
    @nakonachev1407 2 years ago +3

    Great lecture, straight to the point. Thanks for the effort put into it and the adequate way of explaining it.

  • @sachin_getsgoin
    @sachin_getsgoin 3 years ago +2

    Delighted to watch the series. Thanks for creating this. I am already grateful to you because of "DDIA"

  • @thewolfer2281
    @thewolfer2281 2 years ago +2

    Legend!! Im passing this course cuz of this playlist, the whole distributed systems in 1 day thanks to you

    • @iyadelwy1500
      @iyadelwy1500 2 years ago

      Come on then, Abdo

    • @thewolfer2281
      @thewolfer2281 2 years ago

      @@iyadelwy1500 😂😂😂 I swear I left it so you could see it

  • @mikedelta658
    @mikedelta658 1 year ago

    Crystal clear explanation. Hats off to you, Martin!

  • @timurlanrahimberdiev6096
    @timurlanrahimberdiev6096 2 years ago +2

    Great lectures, Great book, Great author 👍

  • @lifeirao7605
    @lifeirao7605 3 years ago +5

    super illustrative. Thank you!

  • @2tce
    @2tce 2 years ago +1

    @martin Kleppmann, thanks for the interesting presentation all the way from Cambridge. I'd like to suggest that we could update the linearizable CAS to:
    IF old = new THEN
    success := true
    There is no point comparing old and new if they are the same. :)
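
    For reference, the textbook compare-and-swap compares the register's *current* value against `old` and, only if they match, installs `new`. A minimal single-process sketch (the dict-based register is an illustration, not the lecture's code; a distributed version would run this atomically via a lock or a consensus round):

    ```python
    def compare_and_swap(register, old, new):
        """Linearizable CAS: atomically replace the register's value
        with `new` only if its current value equals `old`."""
        # In a real system the comparison and the assignment happen
        # as one atomic step; this sketch assumes a single thread.
        if register["value"] == old:
            register["value"] = new
            return True   # success
        return False      # value changed concurrently; caller may retry
    ```

    So comparing `old` with `new` would change the semantics: the comparison against the current value is what detects a concurrent modification.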

  • @paulchicos1872
    @paulchicos1872 2 years ago

    you my guy are a gem of humanity

  • @user-vu5dl8mj2m
    @user-vu5dl8mj2m 2 years ago

    Grateful for the amazing lecture! Finally, I get some impression of how Raft works.

  • @kobew1351
    @kobew1351 4 months ago

    Hope you can make a video explaining three-phase commit and how it improves fault tolerance.

  • @veerajbhokre1847
    @veerajbhokre1847 7 months ago

    Amazing lectures. Thank you so much. You are a god.

  • @manishsakariya4595
    @manishsakariya4595 2 years ago

    Very nice and detailed video. I would love to see your three-phase commit explanation.

  • @zaixrx
    @zaixrx 29 days ago

    Big thanks

  • @martinkunev9911
    @martinkunev9911 3 years ago +3

    What does the failed replica do when it comes up?

  • @jeniamtl6950
    @jeniamtl6950 10 months ago

    Atomic commitment is completely different from atomicity in ACID. For example, if students and classes are handled on different nodes, then after all components have voted yes and the coordinator sends the commit messages, there will be a moment when the student has enrolled in a class but the class does not yet exist, or vice versa. This is completely different from "atomic" in ACID.

  • @rastaeule7482
    @rastaeule7482 3 years ago

    Very clear explanation!

  • @andreip9378
    @andreip9378 27 days ago

    Wow, I didn't know Martin has a YT channel. Instant subscribe.

  • @abcdef-fo1tf
    @abcdef-fo1tf 1 year ago

    Am I right in understanding that we can use Raft to send total order broadcasts and elect new coordinators for node communication, and two-phase commit for committing data?

  • @OffAndGo
    @OffAndGo 1 year ago

    Hello, the video is very helpful, but I hope my question can be clarified:
    Does the coordinator node care whether the other nodes have committed successfully? If it does, and a node failed to commit, does the coordinator make a second decision and send an abort to all the nodes?

  • @tanmaymehrotra86
    @tanmaymehrotra86 2 years ago

    What if the nodes reply to the coordinator that they can perform the transaction and send out the ok message (in response to prepare), but crash after sending it? I assume these nodes will replicate the data (via consensus), so even in the face of failure another leader will get elected. I do understand how total order broadcast works via Raft, but I am unable to see how the data is locked.

  • @ivan.p
    @ivan.p 3 years ago

    Good explanation! Thank you!

  • @BHARATKUMAR-le6eq
    @BHARATKUMAR-le6eq 2 years ago

    Hi Martin, you said the failure detector can run on any node. My doubt is: what happens if the specific node running the failure detector is down or has crashed? And then how will we detect how many other nodes have also crashed?

  • @danish6192
    @danish6192 1 year ago

    Why does the client open the transaction simultaneously on 2 nodes in 2PC? Shouldn't the transaction be opened on the master node only?

  • @za406
    @za406 1 year ago

    Question: Why is the "prepare" message necessary if replicas "ack" on the original transaction message?

  • @zuggrr
    @zuggrr 2 years ago

    This is fantastic ! thank you so much :)

  • @complicated2359
    @complicated2359 1 year ago

    If the database went down after it had agreed to commit, what would you do?

  • @yihanwu3823
    @yihanwu3823 2 years ago

    Does fault-tolerant 2PC mean the coordinator is redundant and can be removed?

  • @yuchen6630
    @yuchen6630 1 year ago

    thank you

  • @jainamm5307
    @jainamm5307 4 months ago

    What happens if one of the nodes has sent ok for prepare, but it crashes while waiting for all the oks? The transaction will go forward on all the other nodes.

    • @jainamm5307
      @jainamm5307 4 months ago

      One potential solution to this problem is to have a recovery mechanism for the node when it comes back up.
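
      The recovery idea suggested above can be sketched as follows; the log format and the `ask_for_outcome` callback are hypothetical. The key point is that a participant that voted yes is "in doubt" after a crash: it may not unilaterally abort, and must learn the outcome from the coordinator or its peers.

      ```python
      def recover(log, ask_for_outcome):
          """Replay a participant's durable log after a restart.

          log: list of (txn_id, event) tuples read from durable storage,
               in the order they were written.
          ask_for_outcome: callback that queries the coordinator or the
               other participants for a transaction's final decision.
          """
          state = {}
          for txn_id, event in log:
              state[txn_id] = event  # later entries override earlier ones

          decisions = {}
          for txn_id, event in state.items():
              if event == "prepared":
                  # In doubt: we voted yes but never learned the outcome,
                  # so we must ask rather than decide on our own.
                  decisions[txn_id] = ask_for_outcome(txn_id)
              else:
                  decisions[txn_id] = event  # already committed/aborted
          return decisions
      ```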

  • @albumlist1
    @albumlist1 2 years ago

    Hi Martin, thanks for this amazing series. I have a question: if for any replica there are conflicting answers around the same time (one sent by the replica itself, another sent by a different node on its behalf, suspecting the replica is down), shouldn't we take the later decision instead of the first? If some other node said "No" (on this replica's behalf) and then the actual replica recovers and says "Yes", taking the later decision looks more logical. The same is true in the opposite case.

    • @m-ld3832
      @m-ld3832 1 year ago

      At first glance that approach is appealing, since it appears to be the safest: it avoids any confusion by taking the most conservative default position. However, it isn't actually necessary, by virtue of the way total order broadcast works. It comes down to the relative timing of the slow/recovered replica's "Yes" vote and the consensus decision by all the nodes. If the "Yes" vote from the slow node is delivered before the "No" votes cast on its behalf, the "No" votes are ignored, since the "Yes" was the first vote each node saw for that replica.
      What's not entirely clear from the video is precisely when a consensus is considered to have been reached, and if/how this is consequently communicated among the nodes. Presumably, if all the other nodes have already settled on the decision against proceeding before the "Yes" vote is delivered from the slow one, then that decision is not invalidated. The previous video in this series may expand on this.

  • @arthursimeon2620
    @arthursimeon2620 2 years ago +1

    So is the coordinator used for decision making on commits, and the total order broadcast system just a backup in case the coordinator crashes?

  • @austecon6818
    @austecon6818 1 year ago

    I still don't get how with geographically distributed nodes (with different ping/latency to each other)... total order broadcast can prevent a (very rare and unlikely) race condition where you have 5/10 nodes that get the failure detector message to abort fractions of a second before the sluggish node sends a vote to go ahead and commit... and the other 5/10 nodes would have the opposite ordering
    If it happens at exactly the same time... due to network latency effects... you could have a split of the network (5 nodes with low ping to the failure detector and 5 nodes with low ping to the sluggish node but high ping to the failure detector)... so in that case do you just go with majority rules and always have an odd total number of nodes to decide which is the true(er) version of history? But now we are into 3 phases not 2 phases...
    So is this like a shitty version of the raft protocol or something where it assumes 0 network latency?

    • @Ynno2
      @Ynno2 4 months ago

      Total order broadcast requires consensus and if only 5/10 nodes have agreed then there's no quorum and no consensus. Neither event will be actionable until n/2+1 nodes have received it. If there is a 50%/50% split, neither side of the split will make any decisions (nothing will be committed and everything will grind to a halt) until the partition is resolved.
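
      As a small illustration of the majority rule this reply describes (a sketch, not part of the lecture):

      ```python
      def has_quorum(acks: int, n: int) -> bool:
          """True if `acks` nodes out of `n` form a majority quorum.

          Total order broadcast (and hence fault-tolerant 2PC built on
          top of it) can only make progress once a strict majority of
          nodes, i.e. floor(n/2) + 1, have acknowledged a message.
          """
          return acks >= n // 2 + 1
      ```

      With n = 10 and a 5/5 split, `has_quorum(5, 10)` is false on both sides, so neither partition can decide until the split heals.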

  • @QDem19
    @QDem19 3 years ago

    Thank you for going over this.
    I have a question regarding slide 2 of the fault-tolerant 2PC. Which node takes the decision on the fate of the transaction: is it the current-term leader of the total order broadcast, or can it be any node participating in the transaction?
    It seems like it should be the former, i.e. the current-term leader, but I just wanted to be sure.

    • @giorgiobuttiglieri5876
      @giorgiobuttiglieri5876 9 months ago

      Each node can independently work out whether the distributed transaction failed: every node receives the same sequence of messages, and the algorithm used to determine whether the transaction failed is deterministic.
      So all the nodes will reach the same conclusion without the need for a coordinator.

    • @jainamm5307
      @jainamm5307 4 months ago

      @@giorgiobuttiglieri5876 When you say each node receives the same sequence of messages — how is the "sequence" guaranteed to be the same on every node?

    • @giorgiobuttiglieri5876
      @giorgiobuttiglieri5876 4 months ago

      ​@@jainamm5307 For the proposed fault-tolerant version of 2PC, we use total order broadcast as the communication primitive.
      So by definition all nodes receive the same messages in the same order.
      If you are interested in how to achieve this, there are other videos on this channel explaining it very well.
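
      The deterministic rule described in this thread can be sketched like this. The message format and names are assumptions; the sketch follows the convention that only the first vote delivered for each participant counts, with later, conflicting votes ignored:

      ```python
      def decide(delivered, participants):
          """Compute the transaction outcome from a totally ordered
          vote log.

          delivered: list of (participant, vote) pairs, in the order
              total order broadcast delivered them. Votes may be cast
              by a replica itself or by a failure detector on its
              behalf.
          participants: the set of nodes that must all vote "yes".
          """
          first_vote = {}
          for participant, vote in delivered:
              # Only the first delivered vote per participant counts.
              first_vote.setdefault(participant, vote)

          if all(first_vote.get(p) == "yes" for p in participants):
              return "commit"
          return "abort"
      ```

      Because total order broadcast delivers the same sequence to every node, every node feeds `decide` the same input and reaches the same conclusion, with no coordinator needed.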

  • @murali1790able
    @murali1790able 2 years ago

    I thought consensus was used in databases, but it looks like consensus alone can't solve the atomic commit problem. Can anyone explain the real application of consensus?

    • @vhscampos1
      @vhscampos1 2 years ago

      Consensus achieves total order broadcast, i.e. all nodes deliver messages/operations in the same order.

  • @tarunstv796
    @tarunstv796 2 years ago

    Discourse from "distributed systems" God himself.
    Very nice 👌

  • @HasanAmmori
    @HasanAmmori 2 years ago

    "Reasonably simple way"... Yeah. That's what I thought

  • @BHARATKUMAR-le6eq
    @BHARATKUMAR-le6eq 2 years ago

    I have one more doubt. Do we wait for an "OK" message from all the replicas, or do we commit on a specific replica as soon as it sends "OK"? If we wait for all the replicas, that makes sense, but if we commit after receiving a single "OK", it may lead to inconsistency. For example, if one replica sends "OK" and we commit the change on it, but the other replica crashes and never sends "OK", the two replicas will be inconsistent.

  • @trozzonick77
    @trozzonick77 3 years ago

    Wouldn't it make more sense to use a queue as a helper for the coordinator?

  • @cristianokwiatkovsk9059
    @cristianokwiatkovsk9059 3 years ago

    First cut the fucking hair xD next recording...