Perfect presentation!! The way in which he delivers the complex design choices in an easy and fun way is amazing!!
Enjoyed the way he presents: proper slides, great transitions, no breaks, engaging with the audience, making jokes, and all of that without losing the context.
Brilliant articulation and to the point. Also for the resources in between. Thanks to infoq for such amazing real world scalable stories from the industry
Insightful information on live video interactions
Totally loved this evolutionary styled presentation, brilliant story telling!
Very engaging still calm and neat presentation. Insightful.
Amazing Explanation. This is clearly one of the best videos on qcon.
Thank you so much I watched the complete video and it is very helpful 🙌
Amazing presentation, learned a lot from it!
Wonderful talk. Very engaging and beautifully presented
The best presentation i’ve seen!
What is the raw computing power of a single frontend machine? How many cores and how much RAM?
Awesome many thanks 😍😍☺️👍😊🤗😱
Amazing presentation. Loved the way he has presented.
Great video Akhilesh! It was very clear, easy to follow and educational.
Great presentation. He missed answering one question about how likes at different times are stored and sent to viewers. For real-time watching, we don't need to store any times. But if we support replaying the video and want to show the likes as they happened during the live broadcast, we'll have to store the time, for example the number of likes in each second.
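A minimal sketch of that replay idea, assuming likes carry a timestamp and we only keep an aggregated count per second of the broadcast (all names here are hypothetical, not how LinkedIn does it):

```python
from collections import defaultdict

class LikeTimeline:
    """Aggregates like counts per second of a live broadcast so that a replay
    can show likes at the same offsets at which they originally happened."""

    def __init__(self, broadcast_start_epoch: float):
        self.start = broadcast_start_epoch
        self.counts = defaultdict(int)  # second offset from start -> like count

    def record_like(self, like_epoch: float) -> None:
        # Called for each like during the live broadcast.
        self.counts[int(like_epoch - self.start)] += 1

    def likes_at(self, replay_offset_seconds: int) -> int:
        # Called by the replay player at its current playback offset.
        return self.counts.get(replay_offset_seconds, 0)
```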
Pretty interesting talk! I would really like to know how they manage idempotence and duplication for likes and messages.
Just have an ID for each like I suppose
@@deformercr6680 You don't want to store each like as a separate record; it should just be an aggregation or counter in the database. I can think of a map-reduce operation on the stream.
@@javisartdesign Yes, you don't, but you assign the ID and store it on the client.
@@deformercr6680 that means you can use different devices and sessions for multiple likes... that is cheating.
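On the idempotence/duplication question, a minimal sketch of server-side dedup, assuming each like arrives keyed by user and video (or by a client-generated ID); this is an illustration, not the approach from the talk:

```python
class LikeDeduplicator:
    """Counts a like only once per (user, video), so client retries or
    repeated taps don't inflate the counter."""

    def __init__(self):
        self.seen = set()

    def accept(self, user_id: str, video_id: str) -> bool:
        key = (user_id, video_id)  # a client-generated UUID would work the same way
        if key in self.seen:
            return False  # duplicate -- drop it
        self.seen.add(key)
        return True
```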
Thanks for the video... With regards to Kafka: 1. When a client starts watching a video, the server establishes a connection and stores it in a key-value store. 2. Front-end nodes subscribe to the specific topics (videos) they are going to handle. 3. Whenever a like happens, we push it into Kafka, and the front-end nodes responsible for that topic send the like to all the subscribed connections. 4. They can look the connections up in the key-value store and send the like to the clients through Akka actors. What is the issue with that? Please clarify.
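A rough sketch of steps 2-3 of this proposal, assuming kafka-python and one topic per live video; the topic naming, the connection registry, and conn.send are made up for illustration:

```python
import json
from kafka import KafkaConsumer  # kafka-python, assumed available

# Connections this frontend node currently holds, keyed by video id (hypothetical registry).
connections_by_video = {"video-123": [], "video-456": []}

consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
# Step 2: subscribe only to the topics (videos) that this node's viewers are watching.
consumer.subscribe(["likes-" + vid for vid in connections_by_video])

# Step 3: whenever a like arrives, push it out on every open connection for that video.
for message in consumer:
    video_id = message.topic.removeprefix("likes-")
    for conn in connections_by_video.get(video_id, []):
        conn.send(message.value)  # e.g. write to the SSE / long-poll connection
```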
+1, this is the part I'm confused about. A given frontend server does NOT need to subscribe to the topics of every live video, only the ones its own connections are watching.
I was thinking the same thing. Just two years later. ;)
Very well explained. Great Talk. Thanks
One of the best talks I have seen. Hats off to you. I have a doubt here: in the more-than-10k-connections case, why do we need an additional layer of frontend nodes? Instead, why can't we store the mapping of video to Real-Time Dispatcher node in the distributed key-value store? The Likes backend can query the key-value store to get the dispatcher nodes and send them the like object. Once it reaches a dispatcher node, that node has the in-memory subscription table to send to the corresponding end users.
How does the key-value store scale? Is it replicated? Presumably the publication of "likes" doesn't have to be guaranteed, so inconsistency between replicas is acceptable (i.e., some likes are not published to active subscribers)? (Edit: OK, the second part of this question is covered in the Q&A, verifying the assumption, but is that also the answer to how the key-value store scales?)
And is the same premise applicable to other elements in the chain? E.g., if a front end times out in response to a publication request, or a client times out in response to a publication request, is it simply dropped or does it need to be retried? Presumably the former?
It's amazing how everything in IT boils down to layers of abstraction. Whether you're talking about local device I/O or about large distributed systems. It's abstractions all the way down. Once you realize that it gets so much easier to comprehend.
Great video + presentation. Thanks
Very useful and clear presentation.
1. How does the KV store get updated when a front end dies?
2. How do the dispatchers find each other? What happens when one dies?
3. How does dispatcher broadcast prevent infinite loops? (1 publishes to 2 and 3, 2 publishes to 1 and 3, and so on.) Why not use a gossip protocol instead?
I'm still learning distributed systems but here's what I think.
1. We need a time-to-live on the KV entry (I think of the KV store more as a cache than as persistent storage, because the data changes so much that I would not want to write it to disk).
2. In the single-datacenter model, dispatchers don't talk to other dispatchers. In the multi-datacenter model, I would probably use Kafka to fan out instead of letting dispatchers talk to each other.
3. If I understand your question correctly, I do not think that dispatching an event will trigger another dispatch. I believe we only dispatch when a client sends a message.
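For point 1, a minimal sketch of the TTL idea, treating the KV store as an in-memory cache whose subscription entries expire unless the frontend node refreshes them (all names hypothetical):

```python
import time

class TtlSubscriptions:
    """Subscription entries (video -> frontend node) that expire unless refreshed,
    so entries for dead frontend nodes age out on their own."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self.entries = {}  # (video_id, node_id) -> expiry time

    def subscribe(self, video_id: str, node_id: str) -> None:
        # Frontend nodes call this periodically (a heartbeat) to keep their entry alive.
        self.entries[(video_id, node_id)] = time.monotonic() + self.ttl

    def nodes_for(self, video_id: str) -> list:
        # The dispatcher only sees nodes whose entries have not expired.
        now = time.monotonic()
        return [node for (vid, node), expiry in self.entries.items()
                if vid == video_id and expiry > now]
```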
For 1: when a frontend server dies, the client loses its connection, so the KV store doesn't need to be updated immediately. Clients reconnect to a different frontend server, and at that point a new subscription call happens, which updates the KV store. Until then the KV store may list a node that is no longer active and the dispatcher may fail to reach it, but it can just ignore that node and move on to the other active nodes.
Really nice, but what about DB scaling, or the key-value store mapping?
Really professional and easy to follow presentation. Loved it!
What's the benefit of using long polling vs. WebSockets?
curious too
You don't really need WebSockets here, because sending likes and comments doesn't need to be real time.
@@gxbambu Yea, but they are calling it the "realtime" platform. Also, the platform is used for multiple use cases and not just for sending likes.
@@gxbambu The whole point of this presentation was distributing likes and comments in real time, which is required for live streaming.
He kind of explained that in the first question from the audience. The reason: long polling with Server-Sent Events is basically plain HTTP requests, so it works even on very old devices and doesn't get blocked by firewalls, etc. Both of those problems exist with WebSockets.
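To make that concrete: Server-Sent Events are just a long-lived HTTP response with text/event-stream framing, which is why old devices and firewalls handle them fine. A minimal sketch using only the Python standard library (endpoint and payload invented for illustration):

```python
import json, time
from http.server import BaseHTTPRequestHandler, HTTPServer

class LikeStream(BaseHTTPRequestHandler):
    def do_GET(self):
        # An ordinary HTTP response that simply never ends; unlike a WebSocket,
        # there is no protocol upgrade for a firewall or old client to choke on.
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        for i in range(5):  # in reality, events are written as likes arrive
            self.wfile.write(f"data: {json.dumps({'likes': i + 1})}\n\n".encode())
            self.wfile.flush()
            time.sleep(1)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), LikeStream).serve_forever()
```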
I think we get 10x the viewers on 11/11, when thousands of sellers are selling their products online through live video.
Great talk! So informative. Thank you!
Really great presentation.
Does the dispatcher publish data using WebSockets or SSE?
Awesome 👏
Thanks
Great talk dude
How does a client find a frontend server?
In the multi-datacenter scenario, how do you ensure that a node is subscribed to only one dispatcher?
It doesn't have to; a node can subscribe to any number of dispatchers. The dispatcher in DC1 simply broadcasts likes to all peer dispatchers in the other DCs, and then they get handled as usual.
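A sketch of that cross-DC fan-out; the from_peer flag is my assumption for how a dispatcher could avoid re-broadcasting a like it received from another datacenter:

```python
def handle_like(like, local_frontends, peer_dispatchers, forward, from_peer=False):
    """Deliver a like to this DC's subscribed frontend nodes and, if the like
    came from a local client, forward it once to each peer dispatcher.
    Peers process it with from_peer=True, so it is never forwarded again."""
    for frontend in local_frontends.get(like["video_id"], []):
        forward(frontend, like)
    if not from_peer:
        for peer in peer_dispatchers:
            forward(peer, like)  # the receiving DC handles it with from_peer=True
```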
Why do not include "likes" and "comments" into the stream itself letting client not only show video, but also handle other type of messages in the stream? In this way it's possible to reuse existing video-stream infrastracture fully
I don't understand whether they use Kafka.
How do you build this?
Why introduce the dispatcher? Why not just broadcast to all frontends?
At 47:53 he explains that it's a matter of scaling.
Yeah, we could broadcast all likes to all frontend nodes, but that wouldn't be efficient: some frontend nodes might only hold connections for viewers of the red live video, so sending them likes for the green live video doesn't make any sense.
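In other words, the dispatcher keeps a per-video subscription table of frontend nodes and fans out only to the ones that asked for that video. A minimal sketch (hypothetical names):

```python
from collections import defaultdict

class Dispatcher:
    """Routes each like only to the frontend nodes that hold at least one
    viewer of that live video, instead of broadcasting to every node."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # video_id -> set of frontend node ids

    def subscribe(self, video_id: str, frontend_node: str) -> None:
        # Called by a frontend node when its first viewer of this video connects.
        self.subscribers[video_id].add(frontend_node)

    def publish(self, video_id: str, like: dict, send) -> None:
        # Only "green" likes go to nodes that actually have "green" viewers.
        for node in self.subscribers.get(video_id, set()):
            send(node, like)
```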
lmao, man do they ever feel goofy being like "we have dedicated infrastructure and engineering for a fucking thumbs up button"
"I get paid 600k USD a year to let people hit a like button" idk seems goofy to me
More like solving the problem "because it's a nice engineering problem" than "because it will help the world". TBH, I don't care how many people are reacting to a live video.
Why not add a dispatcher between the client and the frontend server in the Kafka approach?
The problem would remain. The dispatcher would then become the bottleneck and couldn't be scaled, since each dispatcher would have to consume all the events coming from all the Kafka streams. Letting each dispatcher decide which events to pull narrows the chunk of data it has to process, which provides the parallelism we see at the end.
God and Jesus always