This is experience talking. Now I've been a coding for quite a while, this is one of the channels that promotes good development thinking rather than pointless ninja tachtics in code. Amazing content, thank you for the great channel! When I watch these vids I realize how much thinking the process takes to create good software 🙂
So true. You can tell inexperienced developers by how unnecessarily clever they try to be (ninja tactics, as you say). Code should always be plainly obvious about what it's trying to do, and do it in the simplest way possible the achieve the outcomes. It should also not make excessive use of shortcuts or abbreviations.
I’m a big fan of this style of architecture and have been using it for 30 years. I think a key point that needs emphasizing is that you must get the partitioning right. It is not always obvious how a problem should be partitioned. As software evolves unanticipated dependencies can emerge. This can lead to the desire to couple or more tightly control the sequence of various message processing entities if you are not strict about sticking to the architecture.
Yes, where you draw the boundaries matters a lot. I think that DDD is a good guide, with its bounded contexts, but you still need to think about that part carefully. But then that is true of any complex system.
I agree, there are some interesting ideas being talked about as “Stateful Serverless” systems that I think could be the basis for a more approachable dev experience. At the moment it is a bit of a minority sport, but I think it has MUCH broader applicability.
It's so crazy that this is what appeared in my feed today. I was in a meeting Friday where an application went down because what should have been an asynchronous event ended up needing a synchronous response to be listed as good. it is amazing to me how more systems aren't message-driven given how scalable they are.
I've been thinking a lot about state management for front end applications. The end result of my considerations was that Observable pattern really is the absolute ideal way to handle state management. You're describing reactive programming, aka functional reactive programming aka Observable Pattern which is already in use with RxJs, Vue.js and Angular. The only downside I can see with this pattern however is that our modern programming languages seem to be not well adapted for it yet. For example, the setup code for a message pipeline / system should be compile time generated and typed, but instead it is still being executed at runtime and untyped. The performance overhead here is usually insignificant, but the ergonomics (readability and also writing the code) could definitely still be improved.
Two questions about the approach you describe on 14:00, where the order state is only kept in memory and the service replays the messages on restart to recreate this state: 1) How do you keep startup times down if you have millions of old messages to run through? 2) How do you deal with breaking changes to the public contract or the service. Do you have to migrate all the old messages, do you write som special legacy handlers in the code, or something different?
In event-sourced systems, question 1 is usually addressed with snapshots of the system taken at a certain interval, which themselves can be stored as a special type of event. The complete history of events is persisted as a backup. Problem 2 is more complex and a notorious downside of that type of system. Greg Young has a great talk about the topic if you're interested.
Please consider adding more info or thoughts about these. Consider advantages, disadvantages, use cases where they would be better than other paradigms etc.
This can be useful within a single process too. I recently worked with a product that was split up in several components (domains) that each had their own main thread. They subscribed to each other's events and called into each other at will. With a subscription you could choose what thread you wanted to be called on. It was a total mess: side effects could call back into the calling component without you being aware of it, causing hard to track down deadlocks. Unfortunately no one recognized this as a problem worth addressing. I no longer work there and with some distance to the matter am now creating a framework/library that fixes the issues I had to deal with when I was working there. The concepts/principles presented in the video are identical to what I've come up with so far.
My foray into Angular resulted in apathy towards the framework but a love for RXJS and reactive principles. While organizing a project with reactive principles in mind is my favorite, debugging can be quite the challenge with standard techniques. It would be great if libraries had a debug mode that tracked messages in the system.
I've very happily used this architecture approach a few times but it should be pointed that there are dragons to be aware of: - deep message/event chains will produce a system that is hard to understand/reason about the flow of logic - event chains where a failure (expected or not) requires actions based on previous events in the chain to be undone are hard to achieve and can get messy ** - teams tend to treat messages as a contract which leads to coupling. It can also lead to domain leaking as teams tend to send events that are actually commands for a specific service to react to, rather than emit events that say "this happened" ** Saga pattern or orchestrated events over choreographed can help with this... Just like any architecture approach this is not a one size fits all
Excellent Presentation! I have been building reactive systems in Akka/Scala for years, and you have really captured much of the essence of what I have experienced better than I could imagine. You have also alluded to several other important aspects, and I hope you will expand on some of this in future presentations. Because of this presentation, I have now subscribed to the channel.
Thanks for the video! Do you think you could make a video about the trade-offs of this approach? It looks to me like the robustness of this design doesn't come for free? Or does it?
Excellent presentation. I’ve been designing software like this for at least 20 years. Thanks for giving me a label for the approach, that I can put in a CV.
Brilliant ! So without naming them, I believe the concepts touched are reactive microservices, Event Sourcing ( and CQRS ?) ? Why do the services themselves need to be single threaded? A reactive multithreaded asynchronous & non-blocking model (Rx, Reactor, Akka) would work, in fact would be great ? Is is correct to say that Reactive Programming and Reactive Systems are analogous with scale being the factor ? Both are based on async non-blocking message passing ?
Yeah in other video's he is more clear about being opiniated. And I have seen this combination being used in multiple systems: really powerfully but a framework on its own (hard to understand/learn). That single threaded thing reminds me of the 'Actor' model. It is for consistency purpose so you don't need locking in theory, but in practice the partitioning will make you still need some consistency control but then eventual consistency will be enough. And the actor is managed, like he said, that will allow multiple messages to be processed at the same time. Also it is a combination of data with logic so you are fast, another thing he hinted at.
A great explanation! Would it be possible to delve into things like: 1. What happens when we need different versions of the same service to exist, and how we go about building them? For eg. in the context of a pub-sub (reactive) v/s a REST system. 2. Can we fit all use cases in the reactive systems, or are certain cases better handled by non reactive designs?
Explanation is super clear, though one topic is still a bit vague to me. Who is responsible for message streams? Should we pick classic OO style where the one who emits message tells to whom it goes? Or we pick Rx way where emitter sends message in some medium from where it can be retrieved by anyone who wants? Alternatively we can create some intermediate "telephone switchboard" entity which tells who gets which messages. I've tried a bit first two methods and i have not found any serious flaws in any of them(so i am very suspicious). But i feel like this choice has great influence on general design and dependencies in particular. I would've applied rule "high level modules should not depend on low level modules" but with messages it's not clear how to define dependency
Thanks for this video. Can we have a detailed video wherein a reactive application is built using available frameworks/libs/toolkits like akka, rx etc?
I am not sure that this kind of more detailed tutorial fits my channel very well. It is probably only of interest to a smaller group of people and most of my videos are under 20 minutes, and you can't get very far with anything other than simples examples in 20 minutes. It would be fun though.
@@ContinuousDelivery one possibility you could consider is providing a sample by github, and in a video, cover the sample on a high level, covering the system structure and using the sample as a launching point for a more conceptual discussion.
So reactive systems are basically event driven design? What kind of messages are you thinking of? If the responding messages are async they probably need to make a connection, like sockets or xmpp or what have you. Do I understand that correctly? I’m intrigued and would love to know more about how the messaging parts actually work.
You need to define your messages carefully for this to work. It's should be signals about state transitions, not a command system with a request/response pattern.
Near the end, you gave the example of having multiple stores if there were too many messages coming in. I'm confused about how that clustering would work. If you send all of the messages to both stores then the clustering doesn't gain you anything because both stores will be overwhelmed. However, if you send some of the orders to the first store and the rest of the orders to the second store, how do you subsequently know which store to send a message to to retrieve details about a specific order?
In the last example it is not "Clustering" it is "Sharding" there is some external process that allocates orders to an instance of a store to process it. There are a variety of strategies for allocating to a shard, but it is complex and does add overhead. You are really getting back to the SEDA example earlier in the video, except you are doing it at the level of messages rather than threads, which is a bit more cost-effective.
I went down the Erlang/Elixir rabbit hole after your OO vs. Functional video, and I think it's pretty cool stuff. Glad to see more from you about these ideas.
Thank you. Really like the way explain it. I understand this is an introduction, it would be useful to have some implementation examples. This looks like message queue architecture what's the difference? How is scalability and reactivity implemented?
There is some level of 'supervision' that I didn't talk about in this video. Each services generates 'back-pressure' when it is under stress, up-stream parts of the system notice the back pressure and can then react to it and add more, or reduce if the load is low, resources.
This is such a goldmine, all of your videos. Thanks for producing this content and making it available. Ps, an message about an order for a particular book is now circling around there somewhere ;)
Hi Dave, that was great! In trying to understand general concepts of software, are reactive systems the foundation for "low code" or no code platforms being created today?
No, mostly "Low-Code" is a reboot of an idea that crops up every few years. "Visual Programming" there have been lots of attempts to raise the level of abstraction through visual programming. These approaches are usually very good at simple problems for a narrow set of problems, but quickly become problematic as the problem gets more difficult. I confess that I haven't paid much attention to the current crop of "Low-code" systems, they don't really feature in the kind of systems that I am involved with, and given previous false-starts, I am something of a skeptic.
On a system that gets high levels of traffic does the startup time of a service not eventually become prohibitively time consuming with the replay method for getting from blank slate to current state at startup?
Check-out my friend Martin talking about Aeron ruclips.net/video/tM4YskS94b0/видео.html or me and Martin talking about some aspects of our version of this a few years ago: www.infoq.com/presentations/LMAX-Disruptor-100K-TPS-at-Less-than-1ms-Latency/
I have a few questions: just how complex are the actors allowed to be? How does the messaging system work at scale across the network and how does it know where to go to resolve addresses of actors? I've worked with event based systems in the past and I've struggled to debug them as the bugs tended to be on the big picture integration level, how do you combat and debug this?
this is like OO, where each microservice or component is like an object, and they send messages. They also have a buffer of those messages, and message sends are transactions, much like database transactions, making sure the other side received it, or undoing the message send in order to put it back up in the queue of messages to send once the receiver can properly receive it. Terms change when talking about different levels of computers and software, but the concepts remain the same.
I'm just learning about this now. It's sublime how decades old simple ideas resurface in an (r)evolutionary way, making software development faster and easier. From Smalltalk to distributed reactive systems, we've come a very long way. 😊
Yes, the self-similarity of SW at different resolutions of detail is interesting. When we were doing the initial thinking about building the LMAX Disruptor, some tech that was used to implement the infrastructure for our reactive system (we open sourced it, so you can see it here: lmax-exchange.github.io/disruptor/) we were at the stage of discussing ideas around a white board. This was at the level of lock-free programming and optimising on-processor cache usage, when one of our colleagues, a very good DBA, cam into the room. We said, what do you think about this Andy, and explained what we were thinking of, and he said "Oh yeah, we do the same thing to manage caching in relational DBs" 🤣🤣
@@ContinuousDelivery Cool! 😊 I think I know how we can fix the knowledge sharing problem here and how we might stop reinventing the wheel. We need to look for a "software engineering theory of everything!" Just like physicist!
@@ContinuousDelivery haha, interesting! I haven't finished it yet. But I'll be back to reading it soon. 😁 just went quite busy at work soon after I got it.
Hello, I have never worked on a reactive system, but I am wondering what to do in a case when you want to change a message (sent from service A to service B) to a new version of the message? Do you migrate old messages to the new format in that case (probably not), or do you keep supporting the first version of the message forever for the sake of replayabiliy?
It is one of the more difficult parts. There are largely 2 choices, you support old versions of the messages, or you snapshot the state of the service, and start recording new messages from the new baseline of the snap-shotted state.
This kind of system is incredibly complex to debug especially if it is multi threaded. If for whatever reason a message is not dispatched and there is lots of messaging between many different message boundaries this architecture can be a nightmare to troubleshoot. Process correlation in the logging would help of course. Another problem with this type of design is finding developers with enough proficiency to work with it. That being said, I love how this architecture scales and encapsulates boundaries of knowledge.
What about event ordering - if I try to compare it a bit to TCP packets and UDP packets, it makes me wonder how to avoid, detect and/or correct - events which fell out of order?!
When you store (and later replay) the messages for a service to get it in an up-to-date state, do you record and replay the messages in the order by which they were received? Since you potentially might be dealing with a distributed system, messages might be reordered if they originate from multiple sources. Replaying the messages in the wrong order might yield an incorrect state.
Yes, not necessarily globally, there are some techniques to limit what you store, and keep determinism. "Command Sourcing" instead of "Event Sourcing" for example. I had a goo chat with Frank Yu, who is building these sorts of systems recently, you can see it here: ruclips.net/video/KG6bPVWBl5g/видео.html
Wait. This is just microservices but within the code itself. It also makes a lot of sense that knowing where to draw the boundaries matters a lot, because it is that way with microservices as well.
This sounds similar to the Tandem Non-Stop architecture? I am new to learning programming, and I have found your videos to be extremely useful in my progression. Thank you for making high quality content like this.
In this example, how do we add more warehouses? More specifically how can we route the messages to different warehouses such that only one of them respond? What if for some reason we need to create unique id messages in response to orders, how can we maintain consensus?
PlaceOrder is a, usually synchronous, Command. OrderPlaced is an asynchronous Domain Event. I view Domain Events as immutable facts that occurred in the past. They are often data. So for OrderPlaced, it would include data, such as the order number, the item ordered, the price, the buyer's information, etc. Two nice follow up videos to this video could be Event Storming and Event Sourcing.
One of the things that these kinds of systems have made me think, is ghat there is no real synchrony. It us a fake that is layered on the more fundamental async transmission of information. I think this idea actually goes quite deep. Why does “PlaceOrder” have to be sync, when it can be faster, more resilient, more scalable and more robust if it isn’t. Actually at the level of Physics sync is a very questionable idea - funky stuff, I know 😁😎
I hope this channel has videos that shows more code than talking. First half of the video talks about the concept and the second half shows code. Regardless this is a good video.
I actually like that this channel focuses on the conceptual side. The conceptual is the foundation of implementation; however, not many people seem to give the conceptual side adequate attention or respect. It is easy to find code examples and tutorials on RUclips, but not as easy to find deeper conceptual videos that are genuinely helpful. I think perhaps a video in which a link to a sample is provided along with a high level overview might fit into the domain of this channel better than a coding video.
How does the replaying of message to get the "current" state work when the project is running for a long time? I can image there ware already multiple versions of both the messages and the service? Isn't such a replay a slow operation? You need to read out all the messages (or best case only the messages handled by the service) from somewhere. Also could be that a service started handling an extra message at some point in time, would such replay yield a different result than?
Most of these systems usually take periodic 'snapshots' to reset a new start-point and then replay from there. Most of my experience of systems like this is from financial trading systems, so VERY VERY FAST, so replaying was mostly a non-event. I seem to remember that in the excahnge that we built at LMAX on startup the 'big services' would take a couple of minutes to replay a week's messages. This despite the fact that we occasionally hit message rates, for very short periods, of the equivalent of around 1 million per second. You are right the tricky bit is migrating to new versions of messages, but that is only the same problem as any other data migration problem really.
@@ContinuousDelivery Thank you for the answer, especially for the real world use case. Indeed from the very nature of the reactive services the replay is fast. Snapshot indeed can speed it up, only it adds extra complexity, because you need to snapshot each service entity separately (it has own memory) and you need to restore restore same amount of services back up. I find the data migration of messages harder than normal migration of databases, because there is extra code executing dimension to it (when replay) where in database migration there is the data changing dimension. One issue which I was still wondering about. When the service goes down unexpectedly, how to we know how far the replay should be run and from where the service should start processing the messages normally and sent it's own to network? Are there some good patterns for it?
@@szymonlukaszczyk I agree with you that it seems trickier, in reality though there is really no difference, I think that the reason that it seems harder is just that we are more familiar with DBs. If I have an Address in a DB and I want to add a Postcode field, I have the same set of problems as if I have an Address in a message that I want to enhance. I need to populate old Address that pre-dated this change with some default value for the Address. So all we are talking about is the mechanics of how we do it really, the problem itself is the same. The restart process itself is pretty clear, you need to load the snapshot, then replay all post-snapshot messages to re-establish the state. The only mlidly tricky part is that you want to allow new messages that arrive during the restart to be added to the message log while you are replaying so that you don't miss anything. This is all problems for the infrastructure though. At LMAX we built this stuff on top of some software that we released as OSS "The Disruptor" lmax-exchange.github.io/disruptor/
I totally agree that these systems is much more easy to develop, maintain and to extend (yet bit technically complicated). I remember The Fear of pealing off layers of onion just to get to its core to implement some simple new functionality in some very old monolitic system. The very next day you started your monolith... the business requirements will change!
I seriously cannot imagine why would anyone deliberately click on the dislike button. Perhaps 4 people were in a rush and their hands slipped and missed the like button?
Yes, more please! Would you say this is the way that microservices would work? Each MS is an actor with its set of messages in progress? Good separation of concerns appears to be critical.
Another fantastic presentation, I'm working on a chatbot, where everything seems like it needs to be synchronous, so this helps with unpicking those assumptions. These videos are also great for starting conversations with team members, giving us all a shared understanding. Often, I think I know enough about a topic that you cover, but even if I do (and I don't always) I wouldn't be able to articulate it so well.
I would like to hear more details about "entrails" of such systems. For, example if its better to use Queues like Kafka or Rabbit, or there is exist more interesting ways
The capability of software or more generally, computers to respond gracefully has been widely available for decades. For example, someone writes a function to calculate the sine of an input argument. After writing the function to handle all numbers, real and complex. Next, the developer considers how the function should respond to non-numbers such as NaN, Inf and text. There are many options on how to handle these non-numbers. My experience is that sound software development includes consideration of these issues and it has since at least 60 years.
Regarding forgoing a normal state-based database (with different backup strategies) in favour of a message log that records all messages in perpetuity, have there been any studies in spacial and computational resources required by these two strategies? From an enviromental standpoint I could potentially see an issue in having a system that cannot take a stateful db as starting point after a reboot, but instead have to rely on replaying every single event that's ever been recorded - a list that will continue to grow and only ever grow, requiring ever more storage and computation to replay.
Actually, for most modern relational databases, this is how they work internally anyway. So there is no difference in these models at that level, This approach inverts some of the relationships between the steps. In these sorts of event-based models the ideas to replay and how you manage them over time are well understood. The commonest pattern is regular 'snapshots' where, effectively you re-base the point from which you need to replay things. We used to rebase our event streams on a weekly basis. On the whole, my experience with this is that this approach requires quite dramatically fewer resources than others. Our financial exchange that was built this way processed 1.3 x the daily volume of Twitter on basically two computers. Storage did grow fast though, but it was stored in a more compact form that it would have been if it had been broken out into DB records. So overall this is a more reliable, more efficient approach and does ask us to generate more CO2 to achieve it.
@@ContinuousDelivery How do you "rebase" the event stream? Would that not require the system to be able to start from a stateful snapshot, rather than from the beginning of the complete event timeline, which is how I understood this point in your video.
Not sure was it discussed there in this channel... but it would be interesting to hear about such thing like... "Errors is features!". If one of the downstream services fails... it's not an error. It's a unimplemented feature.
I don't have a lot of experience with this but this is my interpretation. "Reactiveness" is not boolean, you can have a system more or less reactive. In practice, it seems to mean using techniques that allow systems to scale themselves to demand. Message queues paired with automatic horizontal scaling make the system more reactive. The technique that is most usually associated with reactiveness is reactive streams. These are described in the video.
Hi nice video. One question to the corner scenario at 9:03. How the store know that the book is not available and reject order? How to do that without any wait for response or periodic msg from warehouse?
Depends very much on the bussiness logic. 1.Do you want to automatically reject orders if the warehouse doesn't have it? Send a message from warehouse that the store listen to: item(id#223) out of stock and the store react. 2. Do you want to wait for an amount of time before you cancel it? Do you expect that the werehouse will dispatch the book in 10 days, listen for a specific message if not. Just change it to rejected.
How do you replay events through an evolving system? For example, some subset of messages are processed by api v1, others by v2, etc. How do you handle the bookkeeping to 'recreate' reality? Is there a system that tracks this kind of meta data?
There are a few different strategies. In reality, for this kind of system, you don’t keep all of the events forever, instead you “snapshot” some state, and then replay events after the last snapshot. So, for an orderly shut-down, you can snapshot, and then migrate the snapshot as part of the deployment of the new version, that is what we did for our exchange.
@@thewiirocks Yes, I never used it but I think it was based on very simlar ideas. We don't claim that any of the work in the Reactive Manifesto, or the LMAX architecture, that my version is based on, was original, but we added some nice features that I think took it further than Prevayler (but I could be wrong I don't know it well).
@@ContinuousDelivery I’m mostly teasing. Prevayler was the last gasp for Object Systems as databases from the early 2000s. We really loved OOP back then and kept trying to eliminate all the yucky SQL. I weep when I realize my colleagues and I unwittingly unleashed the evils of Object Relational Mapping (ORM) in our mad quest to eliminate relational databases in favor of Object Systems. Prevayler solved all the problems and brought such a system come to life. And it was awful. Rather than manipulating the objects directly, we had to create serializable “command” objects that were responsible for updating the object graph. These commands are what get stored in the log structure and replayed on startup. Like any good log-structured database, it snapshots the object graph and flushes the log at regular intervals. It all works great. But after writing the fifth or sixth transactional command object, it gets pretty clear that the time investment isn’t worth it. It’s faster to use regular database commands where the data elements are easily identifiable and comparable.
Cute. I mean, there's nothing wrong with anything said here. It's just that it completely ignores all the complexity and problems introduced by this type of architecture. Telling developers that this is "actually simpler" is quite misleading - a lot of devs might venture into this after watching, and all the ugly questions will start to arise. This type of architecture may be simpler for some systems that require extreme scalability, and no doubt Netflix, Amazon and Google need this, and it works for them. But in reality, you trade off one type of problems for another - infrastructure, deployments and versioning, refactoring etc. are all vastly more complex at scale... A video like this needs to prepare people honestly for the new complexity they're facing. It's not for everyone. And it's certainly not appropriate for every project.
Well, there is only so much that you can say in a 17 minute video. So yes it doesn't talk about everything. I am interested in where you think the extra complexity lies? It is certainly a different way of thinking about systems, but I genuinely don't think that it adds any complexity that isn't already there, it may surface it and make it more obvious to people, but I don't think it adds any. In addition to that, it does allow you to ignore stuff that otherwise infects nearly every aspect of normal system design, for systems at any scale. As I said in the video, I recognise that it is a subjective, contentious, statement to say that this is easier it is a balance of trade-offs, but I think that overall for systems of any reasonable complexity, this is easier.
@@ContinuousDelivery I don't think the choice has much to do with the complexity of the system you're building. It's more a matter of scale. If you need to scale to extreme workloads, or extremely big teams, it is probably worthwhile. But a system can be very large or complex without facing any of the problems of scale that this type of architecture was designed to address. Frankly, I don't even know how you can ask where the extra complexity lies? When you choose patterns that require infrastructure, of course you add operational overhead. Service topologies, on their own, is a huge subject. When you add network dependency to all your systems, of course that adds concerns about reliability, error handling, timing, delays, resiliency, etc. etc... You draw these neat little diagrams, but unlike a simple function-call, every one of those arrows implies all manner of complexity and concerns about all of these things. Choosing this type of architecture should be a really well informed decision about a deeper investment in high availability and system/team scalability, because it comes at a premium compared to a simple monolith. Like, for one, you need manpower, and not many people have enough (successful) experience, or the expertise to build something like this that's also maintainable and stable. Far too many teams have ventured into microservice architecture cluelessly already, one disaster after another - there are countless articles about the failures in this area. Look. I'm not knocking it. It has it's purpose. But to sell this like it's going to be easy is extremely misleading. If you anticipate problems of scale, then yes, this will be relatively easier than trying to optimize a traditional monolith. But if you don't have those problems, it's just layers of meaningless complexity. Message based architecture of any sort is not a cure-all for anything. "Right tool for the job."
@@ContinuousDelivery ... with all that said, I'm following with some excitement projects like Ballarina and Motoko and a few others who are attempting to abstract away most of that complexity at the language level. If they succeed, the complexity argument might start to fade. Until then, these systems have to be built in languages that really (really) weren't built for anything this network dependent - which currently have to solve problems with libraries and frameworks that substantially complicate, well, everything, while adding the burden of infrastructure management. This could change! I would love to see that happen. But we're some years away from this type of architecture being something you reasonably can or should choose for projects that don't anticipate big problems of scale. In my opinion.
@@RasmusSchultz Fair enough, I began the video by talking about larger scale systems. Sure, if you are writing a CRUD web-app for you Mum's cake shop this is probably the wrong choice. However, as soon as you add anything beyond the most trivial need for distribution, I think that this approach shines. My background is in building distributed systems, for many years, and so that is what I think of. I would argue that any form of distribution adds significant complexity. This approach manages that, and the very complex failure modes inherent in distributed systems, better than anything else that I know. I don't think that this is only for "highly available" systems either. I think that one of the major benefits, is that you can create a local, simple, application from a small collection of modules/services/actors (whatever you call them) and it will be easy to program and easy to test. This same app can then be distributed without needing to change the code of the services themselves. Sure, you will need to do more work, thinking about sharding and clustering and so on, but the code itself doesn't need to change. The exchange that my team built this way ran on a network of over 100 nodes in two data-centers, but the whole system could also run on my laptop.
@@RasmusSchultz AKKA is pretty mature and used in lots of real world projects. Aeron is built on ideas and technical approaches that we began at LMAX, Hydra that I talked about in a video about BDD (ruclips.net/video/9P5WG8CkPrQ/видео.html), is used for building trading systems very quickly and very simply. I think that you are right that this isn't yet at the level of "Out of the box" development (though AKKA does that) but the reason I did this video was to 1) celebrate 30k signatiries for the Reactive Manifesto, and 2) because I think that this approach, more than any other that I know, has a chance to make dev of complex systems easier. It can allow less skilled/experienced programmers to focus on the problem in front of them without getting lost in the computer science and technical nerdery of complex distributed systems. This is a big deal I think, the industry needs something to make building more complex systems a little easier. Microservices is designed to solve a different problem, but many inexperienced teams and developers mistake Microservices for (Services) and so think that Micorservices is all that you need for distributed systems, and it is nowhere near enough.
In the design of the architecture, was at some point pull-based messaging considered? This would have the advantage of not needing pushback when downstream modules become overwhelmed with messages.
I'm a big fan of the concept of reactive programming. However, once you are looking for nails for this new hammer, you are usually directed straight to already-built systems like Kafka et al, but there seems to be a lack of "passing on the skill of building these from scratch" out there. I'd love to see a simple example of the Me, the Store, and the Warehouse in JavaScript (or other) using just vanilla features. I'm particularly interesting in seeing how the Store would rebuild it's state by a full replay, and what happens when that replay is millions of transactions long. Thanks in advance for any advice. This would be a pretty cool example to "pair" on if you were so inclined ;)
Some people say the system can have the latest state cached somewhere to serve requests after the list of events gets too big. Then the events are used in case you need a replay; otherwise, the cached state is used.
When you speak of reactive systems, do the services have to use single threaded architecture in order to be "truly reactive", even if they can scale horizontally on a platform like Kubernetes?
The architecture at large is very much not single threaded, but instead a large collection of mannnnnnny different single threads, each responsible for their own unique message queue. A large importance of this design is to ensure that the messages can be processed in the correct order (generally FIFO). The thread processing a single message queue is free to spin off / delegate any work to one or more async threads (or even other "actors" handling their own separate message queue) so long as any response to that async delegated work comes back into the same single threaded message queue. This is key for any bit of your system that needs to maintain some changing state that may impact how you respond to / process each new message that comes through. If your service isn't tracking any internal state then there's probably not any need for a single message queue, your risk of race conditions is almost nil without shared state. What this does is help limit things like race conditions and other problems that occur when dealing with shared state across multiple threads.
A reactive systems is a big state machine but in architectural level. A problem I have with this type of system is the traceability. It is really hard to reason about it or design it. Although, you can choose an orchestration approach over a choreography one. Nevertheless, they are hard to build and design. You need a very good team that knows what they are doing to choose this approach. The same can be said about microservice but people choose it anyway hahahaha.
You think that's good, you should try a changelog-based data processing system. i.e. Each node tracks the records that change and downstream nodes update themselves based on the upstream changes. That means that data can be safely removed from a system just by removing a file at the "edge" and letting the giant state machine catch up. All the combinations and aggregations will get redone, leading to a consistent state of safe and provable deletion. Of course, my system kept a full list of change keys from all the upstream records, so I could truly prove a record's deletion. And yeah, this isn't easy for most engineers to wrap their heads around. But I've noticed over the years that they manage to wrap their heads around whatever is popular at the moment. Even if it makes NO BLOODY SENSE. (I'm looking at you, Backbone.js. And all your descendants!)
@@thewiirocks hahahahaha. True! People get used to basically anything. But most of the time, IMHO, I think people can work on something complex even though they do not understand it properly. It takes time and experience to get the full picture of the thing. Wow changelog-based... I've never heard that one. I'll take some time to process what you typed. Do you have any article about it? It seems cool to know about it.
@@thewiirocks hahaha, same here. When people say to me they are working with Microservice I always ask what is their definition of Microservice. Most of the time is not even distributed... OMG!
Let me get this straight: you want to create a book store on top of a differential log for state? (no ACID database). So when the system restarts the book store recreates the order state by processing the billions of lines in the log file storing the current state in memory? (e.g. Amazon). I find this incomprehensible.
That was my thought as well. It seems like that would be much slower than a snapshot style recovery mechanism. Maybe the two could be used hand in hand? Like a snapshot every few hours, and use the logs to fill in the state change that occurred in between the snapshots?
This is essentially Event Sourcing or CQRS. You would store snapshots and play forward from the most recent valid snapshot if your application required this.
How do you keep an overview of messages? As in who produces message X and who consumes X. It is easier when it is a mono code repository but I imagine with a high number of smaller code repositories it will be difficult to keep/get an overview?
Yes, keep it in one repo is the simplest solution. When we built our exchange this way, we kept the whole enterprise system in one repo, and had a single Deployment Pipeline that evaluated everything.
Event-driven doesn’t necessarily mean asynch, it also doesn’t focus on the other ideas of responsiveness, resilience and elasticity. There is more to the Reactive Manifesto than only asynch, but it forms the foundations for the other things.
This is certainly an event-driven approach, but that is rather like saying that it is a code-based approach too. There is more to this than "event-driven" implies. None of it is new, but it is assembled in interesting ways, and has some very nice characteristics as a result.
Thank you for the reply. I suppose my confusion was because I always thought of event driven as containing the things you mentioned as reactive systems.
I do wonder how such a system would handle requests like "give me the average temperature of the last 30 days". In order to have that kind of data, action needs to be taken *before* request comes in. How would this be modeled in Reactive Systems?
Given this video and the BDD videos, I think it would be very interesting to see a practical example of how to use BDD/TDD to test a reactive system. I am curious where one would test with which kind of test (and why). These techniques look like a good way towards scalable high performance systems, but a practical example would help me a lot in getting started.
For me it looks like reinventing the wheel: Unix uses pipelines since ever (i.e. roughly around 50 years!), and that's exactly what this "modern" kind of software does: waiting for messages coming from stdin, processing the data a bit back and forth, and sending the output to stdout where another piece of code is reading from stdin ... Oh yes, and one more thing: does anybody recall the term "SSR" (sub-second response)? That means that a software must show a response in less than a second to be viewed as "fluent". IMO this can only be achieved by using black boxes which are specialised on performing just a single (or at max a few) task(s) which are listening to messages and answering by messages. But one thing is very important: the messaging system itself is the bottleneck. I can remember a situation where the Berlin emergency management system was overloaded with messages at new years eve, because every incoming phone call automatically created a message which couldn't become handled by the back office. So people didn't receive an answer in due time, and hung up just to redial the emergency number again - which then created a new message while the first one still remained in the system waiting to become processed ... Of course this shows bad system design, but shit will happen ;-)
Yes, but not really. It is certainly an analogy for similar approaches, not to say this stuff is new, the Transputer from the 1980's a massively parallel CPU where the nodes communicated by internal asynchronous 'messages' on a chip is a more direct antecedent. Your Unix analogy breaks down at a number of places in comparison, the main one being that the messages aren't async, and you example at the end is cured by adding hysteresis to the propagation of errors, and back-pressure to messages to limit inputs when under-stress, function of the messaging in Reactive Systems. As an analogy it is reasonable though, and in part that is because we are talking about some fairly fundamental ideas, like comms and the separation of the messaging (pipes in Unix) from discrete processing units. Oh, and one more thing, in Reactive Systems we are usually talking about message latencies of micro-seconds. The fastest messaging system in the world Aeron was built for these kinds of system.
Somehow I've found my way here. It's a long story lol. I find the concepts of this video interesting but I wonder, doesn't storing messages like this lead to holding on to more data than necessary? Or are there patterns for clearing fully processed messages from your buffer that no longer have any impact on the state of your store?
Yes, there are other patterns to help. Most systems that work like this have some form of "snapshot" mechanism to update the base-state. You store and process a sequence of events. At some time you establish a new "snapshot" that represents the complete current state of the system, then you can start from there and only need to replay new events that happened after the snapshot was taken.
Great video as usual, thanks! Have you worked with either Elixir or Erlang, the system that you describe here sounds really close to the OTP framework introduced in Erlang!
You keep delivering continuously very valuable content! Thanks for that! I'd love to see you some day upgrading the camera and the green screen lighting, and by doing that gaining more lively image quality. Or maybe even a real background and a TV could work very well for you as well. But by no means I'm trying to imply that with the current setup it would have significantly negative effect on the content. These are just details I - for some reason - tend to usually catch about productions. Anyhow - Great work and I'll sure be watching your upcoming videos!
“Replay all the messages in order to restore the state” sounds simple. But in this, possibly irrelevant example, what does it means? Re do all the book orders messages that ever happened? Who stores all the history? Why not backup/restore the state it self (as done in today systems)?
Yes, you replay all the orders, but during replay, one option is to disconnect the bits of the system that act on the orders, so you get a perfect reproduction of the state of the system, but don't re-do the shipping of orders, for example. The advantage of this strategy over the more conventional "state snapshot" approach of storing a static picture of the state at some moment in time, is that in reactive, event-based, systems like this you don't loose information. Ion a DB, you loose the time dimension, unless you actively design the DB to store it, and relational DBs in particular are pretty poor at keeping time-series data. In an event based system every event that ever changes the system is kept, in-order, so that you can rewind, or fast-forward through time. For example. You can replay all of the events from production until you hit a bug that caused a problem. One final thought, if you are distrustful of this strategy, this is how relational DBs work internally, it is just that they don't keep the events once they have acted on them, and so loose the time dimension that describe HOW the data got to the state that it is in.
@@ContinuousDelivery it sounds intresting but it kinds creates more questions than it provides answers. The messages must be all stored implies ever growing data to be stored (much bigger than a snapshot). The rewind process must be computationaly expensive, might it happen that a rewind of all the years long message history could lead to unacceptable long down times or degradations? And what is halfway through history you change the version of your system making it incompatible with previous messages? I guess i must fork a nice example and do several pocs before i can partially comprehend the implication of using this type of architecture.
@@tigoes There are solutions to all of these. It is common to not replay all messages from the dawn of history, and to periodically "snapshot" so limiting how far you can rewind for example, but the storage can be *instead* not *as well* as conventional DBs. As I said, this is how RDBMS works, Reactive systems just apply this at the app level, rather than at the infrastructure level, and gain lots of interesting benefits from this different approach. It is not applicable to every type of system, but then nothing is.
If you zoom into the details there are differences. Mainly the leeway in consistency levels is much better in a system like this, as is the partitioning (sharing) and replication. This helps with resilience, scalability and performance.
This sounds awfully a lot like Tony Hoare's Communicating Sequential Processes: en.m.wikipedia.org/wiki/Communicating_sequential_processes Which is from the 20th century BTW ;-) It is also the main design inspiration for Go's goroutines and channels. And similar message passing primitives are used in other languages.
I guess the main difference is that, having or not primitives in the language to use the CSP or Reactive approach doesn't mean it will be used properly and design with the right mindset. Also the primitives in Go and mean the message posting is synchronous, as it is waiting for a message. And what is worse, that there must be something listening on every channel or sender's might block. I guess having a bus type channel in which messages live and get replicated for several consumers makes the difference.
@@joseluisvazquez5126 Yes, it is certainly related to Tony Hoare's stuff, also Erlang. The asynchrony matters a lot IMO, because that is one of the things that decouples, and so simplifies, things in distributed systems.
@@ContinuousDelivery Interesting! Yes, you need decoupling there. I wonder what does that mean for the send and receive interfaces exactly. For instance, does the send return anything? even if you do not expect a reply it might make sense to return an error when the message could not even be posted or emitted. What about the receive? Maybe different message systems make different choices there.
@@joseluisvazquez5126 My preference is true async. You do work on reciept of a message, you send messages out with your results. These aren't so much responses, more events in the life of the system. You end up with services as little state machines, I receive an "orderBook" message, record the order and send out a "bookOrdered" message. This is a form of response, but it isn't targeted and any service interested in the book having been ordered can listen to it. There is no specific target destination. The original sender of the "orderBook" message is probably listening for "bookOrdered" and correlating the response, but my service doesn't care about that and doesn't know. There is no difference is routing between "orderBook" and "bookOrdered" they are both just messages and treated everywhere in the same way. It's a very nice, very simple model.
@@ContinuousDelivery Yes, I get it is a message bus with many to many communications, no addressing needed. Messages get enqueued to all subscribers and they just ignore messages that do know recognize. When you say "true async" I guess it means "bus.send(msg)" returns nothing, not even an error. If an error does happen, even just at en-queuing for sending, the message system will log it itself or try to recover from it on its own, the sender is not bothered and thus is simplified. On the other side "msg = subscription.recv()" just returns a message if there is anything on the queue, and I guess it blocks until there is anything, because otherwise the handler does not have anything to do. Each side is single threaded, simple and fast and does not need to care about messaging issues or errors.
Ok Dave Your Store and Warehouse in the example have to deal only with business logic, messages are just coming and being sent. But just see You have done a delegation, You have silently indroduced a reliable message delivery (sub)system aka 'Kafka' nowadays. As usual today developers are putting great efforts into feeding and consuming the kafka. Even talking about 'kafka messages' what is a nonsense, since kafka is nothing else but a log, ledger. With kafka a delivery strategy coming into my mind: At-least once, At-most once and Exactly Once. Frankly a typical project manager would like 'exactly once' but that wish may break for example responsiveness. Take my word: the problem is real, restructuring the architecture means that it will be seen another part of the system: in your case at the kafka part.
I recently learned that a key "feature" of Agile is that it only supports the Project Management of the Features, Program Increments and Sprints. It scope doesn't include the Project Management efforts that decide what should be built for the customers. I'm confused about how this can be successful.
I think you are referencing to his work in general, not this specific video? Agile is a way of working. Its cyclic nature is mainly fit to structure work that has medium to heavy uncertainty. Software development is uncertain by nature, as he points out in his 'estimates' video. But you still should consider if this Agile way of working is the right tool for your job and if it is an organisational fit. With that out of the way. Agile development often uses less rigorous up front specification of the 'how'. That's a main difference with 'old' project management. But the 'what' is often better assessed then he 'old' way by using a combination of best practices and evidence based approach with short validation cycles.
Just the one, 😁 I’d like to remind you, this is exactly how relational databases work internally, and for that matter the processors that they run on top of too. This is not a new idea, and it is the most effective approach to high-performance systems used in all sorts of systems.
@@ContinuousDelivery I must be misunderstanding something. If it replays everything that ever happened since the beginning of time, then every restart would take longer and longer than previous ones.
@@michaelrstover Event Sourcing has the concept of snapshots. I.e. we occasionally inject snapshots in the sequence of events which represents the state up till that point in time. When we are replaying we only need to go back to the latest snapshot but if we want to we can ignore the snapshots and go all the way back to the beginning. To avoid the same book being ordered multiple times we make sure that all actions are idempotent, which is hard, but doable.
@@ContinuousDelivery Ah right! And do you post them to some kind of central queue that every other part of the system listens to, or is there some kind of other way to subscribe to different kinds of messages? What are the "design patterns" for this? Are these also described in the book?
This is experience talking.
Now I've been a coding for quite a while, this is one of the channels that promotes good development thinking rather than pointless ninja tachtics in code.
Amazing content, thank you for the great channel!
When I watch these vids I realize how much thinking the process takes to create good software 🙂
Thanks, the thing I love most about SW dev is finding those simpler ideas hidden in the complexity of the problems that face us. 😁 😎
Pointless ninja tactics in code! Absolutely! I'm definitely in the Dave Farley/Jeet Koon Do/Aikido camp, but am merely a n00b white belt.
So true. You can tell inexperienced developers by how unnecessarily clever they try to be (ninja tactics, as you say). Code should always be plainly obvious about what it's trying to do, and do it in the simplest way possible the achieve the outcomes. It should also not make excessive use of shortcuts or abbreviations.
I’m a big fan of this style of architecture and have been using it for 30 years. I think a key point that needs emphasizing is that you must get the partitioning right. It is not always obvious how a problem should be partitioned. As software evolves unanticipated dependencies can emerge. This can lead to the desire to couple or more tightly control the sequence of various message processing entities if you are not strict about sticking to the architecture.
Yes, where you draw the boundaries matters a lot. I think that DDD is a good guide, with its bounded contexts, but you still need to think about that part carefully. But then that is true of any complex system.
DDD is good for the overall design. Some architectural patterns can also help, notably Saga and Process Manager.
Astronomically Compromised District will be the name of my new band :)
Souinds like "Prog-Rock' to me, I'm in!
I like it! If you need a lead guitarist let me know.
What a great Name for a Band or a LP😉⚡
I got a chuckle out of that.
Just one more word. Control, Clan, Canal, anything that starts with a C.
Not many people will be ready for reactive systems to be mainstream, but it's one amazing future we should all embrace!
I agree, there are some interesting ideas being talked about as “Stateful Serverless” systems that I think could be the basis for a more approachable dev experience. At the moment it is a bit of a minority sport, but I think it has MUCH broader applicability.
Have been watching you for months and did not realize you're one of the people who authored reactive manifesto!
Same!
It's so crazy that this is what appeared in my feed today. I was in a meeting Friday where an application went down because what should have been an asynchronous event ended up needing a synchronous response to be listed as good. it is amazing to me how more systems aren't message-driven given how scalable they are.
I've been thinking a lot about state management for front end applications. The end result of my considerations was that Observable pattern really is the absolute ideal way to handle state management. You're describing reactive programming, aka functional reactive programming aka Observable Pattern which is already in use with RxJs, Vue.js and Angular. The only downside I can see with this pattern however is that our modern programming languages seem to be not well adapted for it yet. For example, the setup code for a message pipeline / system should be compile time generated and typed, but instead it is still being executed at runtime and untyped. The performance overhead here is usually insignificant, but the ergonomics (readability and also writing the code) could definitely still be improved.
Two questions about the approach you describe on 14:00, where the order state is only kept in memory and the service replays the messages on restart to recreate this state:
1) How do you keep startup times down if you have millions of old messages to run through?
2) How do you deal with breaking changes to the public contract or the service. Do you have to migrate all the old messages, do you write som special legacy handlers in the code, or something different?
In event-sourced systems, question 1 is usually addressed with snapshots of the system taken at a certain interval, which themselves can be stored as a special type of event. The complete history of events is persisted as a backup.
Problem 2 is more complex and a notorious downside of that type of system. Greg Young has a great talk about the topic if you're interested.
Please consider adding more info or thoughts about these. Consider advantages, disadvantages, use cases where they would be better than other paradigms etc.
This can be useful within a single process too. I recently worked with a product that was split up in several components (domains) that each had their own main thread. They subscribed to each other's events and called into each other at will. With a subscription you could choose what thread you wanted to be called on. It was a total mess: side effects could call back into the calling component without you being aware of it, causing hard to track down deadlocks.
Unfortunately no one recognized this as a problem worth addressing. I no longer work there and with some distance to the matter am now creating a framework/library that fixes the issues I had to deal with when I was working there. The concepts/principles presented in the video are identical to what I've come up with so far.
My foray into Angular resulted in apathy towards the framework but a love for RXJS and reactive principles. While organizing a project with reactive principles in mind is my favorite, debugging can be quite the challenge with standard techniques. It would be great if libraries had a debug mode that tracked messages in the system.
The best practical explanation of a reactive system I have seen so far!
I've very happily used this architecture approach a few times but it should be pointed that there are dragons to be aware of:
- deep message/event chains will produce a system that is hard to understand/reason about the flow of logic
- event chains where a failure (expected or not) requires actions based on previous events in the chain to be undone are hard to achieve and can get messy **
- teams tend to treat messages as a contract which leads to coupling. It can also lead to domain leaking as teams tend to send events that are actually commands for a specific service to react to, rather than emit events that say "this happened"
** Saga pattern or orchestrated events over choreographed can help with this...
Just like any architecture approach this is not a one size fits all
Excellent Presentation!
I have been building reactive systems in Akka/Scala for years, and you have really captured much of the essence of what I have experienced better than I could imagine. You have also alluded to several other important aspects, and I hope you will expand on some of this in future presentations. Because of this presentation, I have now subscribed to the channel.
Thanks 😊
Thanks for the video! Do you think you could make a video about the trade-offs of this approach?
It looks to me like the robustness of this design doesn't come for free? Or does it?
Excellent presentation. I’ve been designing software like this for at least 20 years. Thanks for giving me a label for the approach, that I can put in a CV.
Brilliant ! So without naming them, I believe the concepts touched are reactive microservices, Event Sourcing ( and CQRS ?) ? Why do the services themselves need to be single threaded? A reactive multithreaded asynchronous & non-blocking model (Rx, Reactor, Akka) would work, in fact would be great ?
Is is correct to say that Reactive Programming and Reactive Systems are analogous with scale being the factor ? Both are based on async non-blocking message passing ?
Yeah in other video's he is more clear about being opiniated. And I have seen this combination being used in multiple systems: really powerfully but a framework on its own (hard to understand/learn).
That single threaded thing reminds me of the 'Actor' model. It is for consistency purpose so you don't need locking in theory, but in practice the partitioning will make you still need some consistency control but then eventual consistency will be enough. And the actor is managed, like he said, that will allow multiple messages to be processed at the same time. Also it is a combination of data with logic so you are fast, another thing he hinted at.
brilliant learning. I thoroughly enjoy how you explain concepts
A great explanation!
Would it be possible to delve into things like:
1. What happens when we need different versions of the same service to exist, and how we go about building them? For eg. in the context of a pub-sub (reactive) v/s a REST system.
2. Can we fit all use cases in the reactive systems, or are certain cases better handled by non reactive designs?
Explanation is super clear, though one topic is still a bit vague to me. Who is responsible for message streams? Should we pick classic OO style where the one who emits message tells to whom it goes? Or we pick Rx way where emitter sends message in some medium from where it can be retrieved by anyone who wants? Alternatively we can create some intermediate "telephone switchboard" entity which tells who gets which messages. I've tried a bit first two methods and i have not found any serious flaws in any of them(so i am very suspicious). But i feel like this choice has great influence on general design and dependencies in particular. I would've applied rule "high level modules should not depend on low level modules" but with messages it's not clear how to define dependency
Thanks for this video. Can we have a detailed video wherein a reactive application is built using available frameworks/libs/toolkits like akka, rx etc?
I am not sure that this kind of more detailed tutorial fits my channel very well. It is probably only of interest to a smaller group of people and most of my videos are under 20 minutes, and you can't get very far with anything other than simples examples in 20 minutes. It would be fun though.
@@ContinuousDelivery one possibility you could consider is providing a sample by github, and in a video, cover the sample on a high level, covering the system structure and using the sample as a launching point for a more conceptual discussion.
So reactive systems are basically event driven design? What kind of messages are you thinking of? If the responding messages are async they probably need to make a connection, like sockets or xmpp or what have you. Do I understand that correctly? I’m intrigued and would love to know more about how the messaging parts actually work.
You need to define your messages carefully for this to work. It's should be signals about state transitions, not a command system with a request/response pattern.
Near the end, you gave the example of having multiple stores if there were too many messages coming in. I'm confused about how that clustering would work. If you send all of the messages to both stores then the clustering doesn't gain you anything because both stores will be overwhelmed. However, if you send some of the orders to the first store and the rest of the orders to the second store, how do you subsequently know which store to send a message to to retrieve details about a specific order?
In the last example it is not "Clustering" it is "Sharding" there is some external process that allocates orders to an instance of a store to process it. There are a variety of strategies for allocating to a shard, but it is complex and does add overhead. You are really getting back to the SEDA example earlier in the video, except you are doing it at the level of messages rather than threads, which is a bit more cost-effective.
Every time a cornerstone was mentioned, I thought: Thank god Kafka handles those problems! 🙌🙌
I went down the Erlang/Elixir rabbit hole after your OO vs. Functional video, and I think it's pretty cool stuff. Glad to see more from you about these ideas.
Thanks, there is some really nice stuff here, I am pleased if I managed to help you find something interesting.
Regardless whether one agrees or disagrees with the methodology, this is very informative.
Thank you. Really like the way explain it. I understand this is an introduction, it would be useful to have some implementation examples. This looks like message queue architecture what's the difference? How is scalability and reactivity implemented?
There is some level of 'supervision' that I didn't talk about in this video. Each services generates 'back-pressure' when it is under stress, up-stream parts of the system notice the back pressure and can then react to it and add more, or reduce if the load is low, resources.
More more more architecture videos please! Love your stuff!
This is such a goldmine, all of your videos. Thanks for producing this content and making it available.
Ps, an message about an order for a particular book is now circling around there somewhere ;)
Hi Dave, that was great! In trying to understand general concepts of software, are reactive systems the foundation for "low code" or no code platforms being created today?
No, mostly "Low-Code" is a reboot of an idea that crops up every few years. "Visual Programming" there have been lots of attempts to raise the level of abstraction through visual programming. These approaches are usually very good at simple problems for a narrow set of problems, but quickly become problematic as the problem gets more difficult. I confess that I haven't paid much attention to the current crop of "Low-code" systems, they don't really feature in the kind of systems that I am involved with, and given previous false-starts, I am something of a skeptic.
On a system that gets high levels of traffic does the startup time of a service not eventually become prohibitively time consuming with the replay method for getting from blank slate to current state at startup?
Excellent discussion, thank you. Would love to hear more about the required characteristics and considerations of the underlying messaging system
Check-out my friend Martin talking about Aeron ruclips.net/video/tM4YskS94b0/видео.html
or me and Martin talking about some aspects of our version of this a few years ago: www.infoq.com/presentations/LMAX-Disruptor-100K-TPS-at-Less-than-1ms-Latency/
I have a few questions: just how complex are the actors allowed to be? How does the messaging system work at scale across the network and how does it know where to go to resolve addresses of actors? I've worked with event based systems in the past and I've struggled to debug them as the bugs tended to be on the big picture integration level, how do you combat and debug this?
Thank you so much for all your content, it really is the right stuff to hear
this is like OO, where each microservice or component is like an object, and they send messages. They also have a buffer of those messages, and message sends are transactions, much like database transactions, making sure the other side received it, or undoing the message send in order to put it back up in the queue of messages to send once the receiver can properly receive it.
Terms change when talking about different levels of computers and software, but the concepts remain the same.
Enjoyed the video! Everything just became more clean and simple and I noticed a lot of patterns behind our daily work. Thank you. 😊
I'm just learning about this now. It's sublime how decades old simple ideas resurface in an (r)evolutionary way, making software development faster and easier. From Smalltalk to distributed reactive systems, we've come a very long way. 😊
Yes, the self-similarity of SW at different resolutions of detail is interesting. When we were doing the initial thinking about building the LMAX Disruptor, some tech that was used to implement the infrastructure for our reactive system (we open sourced it, so you can see it here: lmax-exchange.github.io/disruptor/) we were at the stage of discussing ideas around a white board. This was at the level of lock-free programming and optimising on-processor cache usage, when one of our colleagues, a very good DBA, cam into the room. We said, what do you think about this Andy, and explained what we were thinking of, and he said "Oh yeah, we do the same thing to manage caching in relational DBs" 🤣🤣
@@ContinuousDelivery Cool! 😊 I think I know how we can fix the knowledge sharing problem here and how we might stop reinventing the wheel. We need to look for a "software engineering theory of everything!" Just like physicist!
@@PaulSebastianM Well I attempt to give my solution to the "SW engineering theory of everything" in my book "Modern SW Engineering" 😁😉
@@ContinuousDelivery haha, interesting! I haven't finished it yet. But I'll be back to reading it soon. 😁 just went quite busy at work soon after I got it.
Thanks for the great content. QQ: Is it the same as event driven architecture?
There is more to it than event driven architecture, but they share some ideas.
Hello, I have never worked on a reactive system, but I am wondering what to do in a case when you want to change a message (sent from service A to service B) to a new version of the message? Do you migrate old messages to the new format in that case (probably not), or do you keep supporting the first version of the message forever for the sake of replayabiliy?
It is one of the more difficult parts. There are largely 2 choices, you support old versions of the messages, or you snapshot the state of the service, and start recording new messages from the new baseline of the snap-shotted state.
@@ContinuousDelivery Thank you for the answer, much appreciated
This kind of system is incredibly complex to debug especially if it is multi threaded. If for whatever reason a message is not dispatched and there is lots of messaging between many different message boundaries this architecture can be a nightmare to troubleshoot. Process correlation in the logging would help of course. Another problem with this type of design is finding developers with enough proficiency to work with it. That being said, I love how this architecture scales and encapsulates boundaries of knowledge.
What about event ordering - if I try to compare it a bit to TCP packets and UDP packets, it makes me wonder how to avoid, detect and/or correct - events which fell out of order?!
Yes, events are ordered
When you store (and later replay) the messages for a service to get it in an up-to-date state, do you record and replay the messages in the order by which they were received? Since you potentially might be dealing with a distributed system, messages might be reordered if they originate from multiple sources. Replaying the messages in the wrong order might yield an incorrect state.
Yes, not necessarily globally, there are some techniques to limit what you store, and keep determinism. "Command Sourcing" instead of "Event Sourcing" for example. I had a goo chat with Frank Yu, who is building these sorts of systems recently, you can see it here: ruclips.net/video/KG6bPVWBl5g/видео.html
Wait. This is just microservices but within the code itself. It also makes a lot of sense that knowing where to draw the boundaries matters a lot, because it is that way with microservices as well.
This sounds similar to the Tandem Non-Stop architecture?
I am new to learning programming, and I have found your videos to be extremely useful in my progression. Thank you for making high quality content like this.
In this example, how do we add more warehouses? More specifically how can we route the messages to different warehouses such that only one of them respond? What if for some reason we need to create unique id messages in response to orders, how can we maintain consensus?
You can shard the message topics, or filter on receipt at the warehouse.
PlaceOrder is a, usually synchronous, Command. OrderPlaced is an asynchronous Domain Event. I view Domain Events as immutable facts that occurred in the past. They are often data. So for OrderPlaced, it would include data, such as the order number, the item ordered, the price, the buyer's information, etc.
Two nice follow up videos to this video could be Event Storming and Event Sourcing.
One of the things that these kinds of systems have made me think, is ghat there is no real synchrony. It us a fake that is layered on the more fundamental async transmission of information. I think this idea actually goes quite deep. Why does “PlaceOrder” have to be sync, when it can be faster, more resilient, more scalable and more robust if it isn’t. Actually at the level of Physics sync is a very questionable idea - funky stuff, I know 😁😎
I hope this channel has videos that shows more code than talking. First half of the video talks about the concept and the second half shows code. Regardless this is a good video.
I actually like that this channel focuses on the conceptual side. The conceptual is the foundation of implementation; however, not many people seem to give the conceptual side adequate attention or respect. It is easy to find code examples and tutorials on RUclips, but not as easy to find deeper conceptual videos that are genuinely helpful. I think perhaps a video in which a link to a sample is provided along with a high level overview might fit into the domain of this channel better than a coding video.
How does the replaying of message to get the "current" state work when the project is running for a long time? I can image there ware already multiple versions of both the messages and the service? Isn't such a replay a slow operation? You need to read out all the messages (or best case only the messages handled by the service) from somewhere. Also could be that a service started handling an extra message at some point in time, would such replay yield a different result than?
Most of these systems usually take periodic 'snapshots' to reset a new start-point and then replay from there. Most of my experience of systems like this is from financial trading systems, so VERY VERY FAST, so replaying was mostly a non-event. I seem to remember that in the excahnge that we built at LMAX on startup the 'big services' would take a couple of minutes to replay a week's messages. This despite the fact that we occasionally hit message rates, for very short periods, of the equivalent of around 1 million per second.
You are right the tricky bit is migrating to new versions of messages, but that is only the same problem as any other data migration problem really.
@@ContinuousDelivery Thank you for the answer, especially for the real world use case.
Indeed from the very nature of the reactive services the replay is fast. Snapshot indeed can speed it up, only it adds extra complexity, because you need to snapshot each service entity separately (it has own memory) and you need to restore restore same amount of services back up.
I find the data migration of messages harder than normal migration of databases, because there is extra code executing dimension to it (when replay) where in database migration there is the data changing dimension.
One issue which I was still wondering about. When the service goes down unexpectedly, how to we know how far the replay should be run and from where the service should start processing the messages normally and sent it's own to network? Are there some good patterns for it?
@@szymonlukaszczyk I agree with you that it seems trickier, in reality though there is really no difference, I think that the reason that it seems harder is just that we are more familiar with DBs. If I have an Address in a DB and I want to add a Postcode field, I have the same set of problems as if I have an Address in a message that I want to enhance. I need to populate old Address that pre-dated this change with some default value for the Address. So all we are talking about is the mechanics of how we do it really, the problem itself is the same.
The restart process itself is pretty clear, you need to load the snapshot, then replay all post-snapshot messages to re-establish the state. The only mlidly tricky part is that you want to allow new messages that arrive during the restart to be added to the message log while you are replaying so that you don't miss anything.
This is all problems for the infrastructure though. At LMAX we built this stuff on top of some software that we released as OSS "The Disruptor" lmax-exchange.github.io/disruptor/
@@ContinuousDelivery Thanks I'll check the white paper from Disruptor, the idea of lock-free message queue sounds really appealing.
What a great video explaining how ecommerce system work. The only question I have is what is a reactive system?
There is a definition of "Reactive Systems" at 02:57 they are reactive because they can more effectively respond to change at runtime.
I totally agree that these systems is much more easy to develop, maintain and to extend (yet bit technically complicated). I remember The Fear of pealing off layers of onion just to get to its core to implement some simple new functionality in some very old monolitic system. The very next day you started your monolith... the business requirements will change!
I seriously cannot imagine why would anyone deliberately click on the dislike button. Perhaps 4 people were in a rush and their hands slipped and missed the like button?
Yes, more please! Would you say this is the way that microservices would work? Each MS is an actor with its set of messages in progress? Good separation of concerns appears to be critical.
It is my preferred approach to MS. Have you seen this recent video on "Actor"? ruclips.net/video/-IZlOPciSl0/видео.html
@@ContinuousDelivery Ok, cool. Yes, I watched that video before this one 😀
Another fantastic presentation, I'm working on a chatbot, where everything seems like it needs to be synchronous, so this helps with unpicking those assumptions.
These videos are also great for starting conversations with team members, giving us all a shared understanding. Often, I think I know enough about a topic that you cover, but even if I do (and I don't always) I wouldn't be able to articulate it so well.
Thanks
Simply Great !
I would like to hear more details about "entrails" of such systems. For, example if its better to use Queues like Kafka or Rabbit, or there is exist more interesting ways
The capability of software or more generally, computers to respond gracefully has been widely available for decades. For example, someone writes a function to calculate the sine of an input argument. After writing the function to handle all numbers, real and complex. Next, the developer considers how the function should respond to non-numbers such as NaN, Inf and text.
There are many options on how to handle these non-numbers.
My experience is that sound software development includes consideration of these issues and it has since at least 60 years.
Regarding forgoing a normal state-based database (with different backup strategies) in favour of a message log that records all messages in perpetuity, have there been any studies in spacial and computational resources required by these two strategies? From an enviromental standpoint I could potentially see an issue in having a system that cannot take a stateful db as starting point after a reboot, but instead have to rely on replaying every single event that's ever been recorded - a list that will continue to grow and only ever grow, requiring ever more storage and computation to replay.
Actually, for most modern relational databases, this is how they work internally anyway. So there is no difference in these models at that level, This approach inverts some of the relationships between the steps. In these sorts of event-based models the ideas to replay and how you manage them over time are well understood. The commonest pattern is regular 'snapshots' where, effectively you re-base the point from which you need to replay things. We used to rebase our event streams on a weekly basis.
On the whole, my experience with this is that this approach requires quite dramatically fewer resources than others. Our financial exchange that was built this way processed 1.3 x the daily volume of Twitter on basically two computers. Storage did grow fast though, but it was stored in a more compact form that it would have been if it had been broken out into DB records. So overall this is a more reliable, more efficient approach and does ask us to generate more CO2 to achieve it.
@@ContinuousDelivery How do you "rebase" the event stream? Would that not require the system to be able to start from a stateful snapshot, rather than from the beginning of the complete event timeline, which is how I understood this point in your video.
very nicely explained indeed 👏
Thank you! 🙂
Not sure was it discussed there in this channel... but it would be interesting to hear about such thing like... "Errors is features!". If one of the downstream services fails... it's not an error. It's a unimplemented feature.
How do we differentiate event driven and reactive?
I don't have a lot of experience with this but this is my interpretation. "Reactiveness" is not boolean, you can have a system more or less reactive. In practice, it seems to mean using techniques that allow systems to scale themselves to demand. Message queues paired with automatic horizontal scaling make the system more reactive. The technique that is most usually associated with reactiveness is reactive streams. These are described in the video.
Hi nice video. One question to the corner scenario at 9:03.
How the store know that the book is not available and reject order?
How to do that without any wait for response or periodic msg from warehouse?
Depends very much on the bussiness logic.
1.Do you want to automatically reject orders if the warehouse doesn't have it? Send a message from warehouse that the store listen to: item(id#223) out of stock and the store react.
2. Do you want to wait for an amount of time before you cancel it? Do you expect that the werehouse will dispatch the book in 10 days, listen for a specific message if not. Just change it to rejected.
Would love a discussion on event-driven vs message-driven - I think the manifesto should be more opinionated here
How do you replay events through an evolving system? For example, some subset of messages are processed by api v1, others by v2, etc. How do you handle the bookkeeping to 'recreate' reality? Is there a system that tracks this kind of meta data?
There are a few different strategies. In reality, for this kind of system, you don’t keep all of the events forever, instead you “snapshot” some state, and then replay events after the last snapshot. So, for an orderly shut-down, you can snapshot, and then migrate the snapshot as part of the deployment of the new version, that is what we did for our exchange.
Greg Young has written the book "Versioning in an Event Sourced System" about this topic.
@@ContinuousDelivery Prevayler would like have a word with you about log-structured object systems... ;-)
@@thewiirocks Yes, I never used it but I think it was based on very simlar ideas. We don't claim that any of the work in the Reactive Manifesto, or the LMAX architecture, that my version is based on, was original, but we added some nice features that I think took it further than Prevayler (but I could be wrong I don't know it well).
@@ContinuousDelivery I’m mostly teasing. Prevayler was the last gasp for Object Systems as databases from the early 2000s. We really loved OOP back then and kept trying to eliminate all the yucky SQL. I weep when I realize my colleagues and I unwittingly unleashed the evils of Object Relational Mapping (ORM) in our mad quest to eliminate relational databases in favor of Object Systems.
Prevayler solved all the problems and brought such a system come to life. And it was awful. Rather than manipulating the objects directly, we had to create serializable “command” objects that were responsible for updating the object graph.
These commands are what get stored in the log structure and replayed on startup. Like any good log-structured database, it snapshots the object graph and flushes the log at regular intervals.
It all works great. But after writing the fifth or sixth transactional command object, it gets pretty clear that the time investment isn’t worth it. It’s faster to use regular database commands where the data elements are easily identifiable and comparable.
more of these please, dive in.
Cute. I mean, there's nothing wrong with anything said here. It's just that it completely ignores all the complexity and problems introduced by this type of architecture. Telling developers that this is "actually simpler" is quite misleading - a lot of devs might venture into this after watching, and all the ugly questions will start to arise. This type of architecture may be simpler for some systems that require extreme scalability, and no doubt Netflix, Amazon and Google need this, and it works for them. But in reality, you trade off one type of problems for another - infrastructure, deployments and versioning, refactoring etc. are all vastly more complex at scale... A video like this needs to prepare people honestly for the new complexity they're facing. It's not for everyone. And it's certainly not appropriate for every project.
Well, there is only so much that you can say in a 17 minute video. So yes it doesn't talk about everything.
I am interested in where you think the extra complexity lies? It is certainly a different way of thinking about systems, but I genuinely don't think that it adds any complexity that isn't already there, it may surface it and make it more obvious to people, but I don't think it adds any. In addition to that, it does allow you to ignore stuff that otherwise infects nearly every aspect of normal system design, for systems at any scale.
As I said in the video, I recognise that it is a subjective, contentious, statement to say that this is easier it is a balance of trade-offs, but I think that overall for systems of any reasonable complexity, this is easier.
@@ContinuousDelivery I don't think the choice has much to do with the complexity of the system you're building. It's more a matter of scale. If you need to scale to extreme workloads, or extremely big teams, it is probably worthwhile. But a system can be very large or complex without facing any of the problems of scale that this type of architecture was designed to address. Frankly, I don't even know how you can ask where the extra complexity lies? When you choose patterns that require infrastructure, of course you add operational overhead. Service topologies, on their own, is a huge subject. When you add network dependency to all your systems, of course that adds concerns about reliability, error handling, timing, delays, resiliency, etc. etc... You draw these neat little diagrams, but unlike a simple function-call, every one of those arrows implies all manner of complexity and concerns about all of these things. Choosing this type of architecture should be a really well informed decision about a deeper investment in high availability and system/team scalability, because it comes at a premium compared to a simple monolith. Like, for one, you need manpower, and not many people have enough (successful) experience, or the expertise to build something like this that's also maintainable and stable. Far too many teams have ventured into microservice architecture cluelessly already, one disaster after another - there are countless articles about the failures in this area. Look. I'm not knocking it. It has it's purpose. But to sell this like it's going to be easy is extremely misleading. If you anticipate problems of scale, then yes, this will be relatively easier than trying to optimize a traditional monolith. But if you don't have those problems, it's just layers of meaningless complexity. Message based architecture of any sort is not a cure-all for anything. "Right tool for the job."
@@ContinuousDelivery ... with all that said, I'm following with some excitement projects like Ballarina and Motoko and a few others who are attempting to abstract away most of that complexity at the language level. If they succeed, the complexity argument might start to fade. Until then, these systems have to be built in languages that really (really) weren't built for anything this network dependent - which currently have to solve problems with libraries and frameworks that substantially complicate, well, everything, while adding the burden of infrastructure management. This could change! I would love to see that happen. But we're some years away from this type of architecture being something you reasonably can or should choose for projects that don't anticipate big problems of scale. In my opinion.
@@RasmusSchultz Fair enough, I began the video by talking about larger scale systems. Sure, if you are writing a CRUD web-app for you Mum's cake shop this is probably the wrong choice. However, as soon as you add anything beyond the most trivial need for distribution, I think that this approach shines. My background is in building distributed systems, for many years, and so that is what I think of. I would argue that any form of distribution adds significant complexity. This approach manages that, and the very complex failure modes inherent in distributed systems, better than anything else that I know.
I don't think that this is only for "highly available" systems either. I think that one of the major benefits, is that you can create a local, simple, application from a small collection of modules/services/actors (whatever you call them) and it will be easy to program and easy to test. This same app can then be distributed without needing to change the code of the services themselves. Sure, you will need to do more work, thinking about sharding and clustering and so on, but the code itself doesn't need to change.
The exchange that my team built this way ran on a network of over 100 nodes in two data-centers, but the whole system could also run on my laptop.
@@RasmusSchultz AKKA is pretty mature and used in lots of real world projects. Aeron is built on ideas and technical approaches that we began at LMAX, Hydra that I talked about in a video about BDD (ruclips.net/video/9P5WG8CkPrQ/видео.html), is used for building trading systems very quickly and very simply.
I think that you are right that this isn't yet at the level of "Out of the box" development (though AKKA does that) but the reason I did this video was to 1) celebrate 30k signatiries for the Reactive Manifesto, and 2) because I think that this approach, more than any other that I know, has a chance to make dev of complex systems easier. It can allow less skilled/experienced programmers to focus on the problem in front of them without getting lost in the computer science and technical nerdery of complex distributed systems.
This is a big deal I think, the industry needs something to make building more complex systems a little easier. Microservices is designed to solve a different problem, but many inexperienced teams and developers mistake Microservices for (Services) and so think that Micorservices is all that you need for distributed systems, and it is nowhere near enough.
In the design of the architecture, was at some point pull-based messaging considered? This would have the advantage of not needing pushback when downstream modules become overwhelmed with messages.
...but twice the amount of messaging and so twice as slow, so no we didn't try that.
Awesome content and superb examples!
Excellent... I've been asking myself "what is all this reactive stuff 'they' keep going on about?" many times recently.
I'm a big fan of the concept of reactive programming. However, once you are looking for nails for this new hammer, you are usually directed straight to already-built systems like Kafka et al, but there seems to be a lack of "passing on the skill of building these from scratch" out there. I'd love to see a simple example of the Me, the Store, and the Warehouse in JavaScript (or other) using just vanilla features. I'm particularly interesting in seeing how the Store would rebuild it's state by a full replay, and what happens when that replay is millions of transactions long. Thanks in advance for any advice. This would be a pretty cool example to "pair" on if you were so inclined ;)
Some people say the system can have the latest state cached somewhere to serve requests after the list of events gets too big. Then the events are used in case you need a replay; otherwise, the cached state is used.
This reminds me of the pub/sub design pattern. Am I completely off on that? I've been reading Juvel Lowy's "Righting Software" recently.
Pub/Sub is certainly a good strategy, my preferred strategy, as part of the comms, but reactive systems go a fair way beyond Pub/Sub alone.
When you speak of reactive systems, do the services have to use single threaded architecture in order to be "truly reactive", even if they can scale horizontally on a platform like Kubernetes?
The architecture at large is very much not single threaded, but instead a large collection of mannnnnnny different single threads, each responsible for their own unique message queue. A large importance of this design is to ensure that the messages can be processed in the correct order (generally FIFO). The thread processing a single message queue is free to spin off / delegate any work to one or more async threads (or even other "actors" handling their own separate message queue) so long as any response to that async delegated work comes back into the same single threaded message queue. This is key for any bit of your system that needs to maintain some changing state that may impact how you respond to / process each new message that comes through. If your service isn't tracking any internal state then there's probably not any need for a single message queue, your risk of race conditions is almost nil without shared state. What this does is help limit things like race conditions and other problems that occur when dealing with shared state across multiple threads.
A reactive systems is a big state machine but in architectural level. A problem I have with this type of system is the traceability. It is really hard to reason about it or design it. Although, you can choose an orchestration approach over a choreography one. Nevertheless, they are hard to build and design. You need a very good team that knows what they are doing to choose this approach. The same can be said about microservice but people choose it anyway hahahaha.
You think that's good, you should try a changelog-based data processing system. i.e. Each node tracks the records that change and downstream nodes update themselves based on the upstream changes. That means that data can be safely removed from a system just by removing a file at the "edge" and letting the giant state machine catch up. All the combinations and aggregations will get redone, leading to a consistent state of safe and provable deletion.
Of course, my system kept a full list of change keys from all the upstream records, so I could truly prove a record's deletion. And yeah, this isn't easy for most engineers to wrap their heads around. But I've noticed over the years that they manage to wrap their heads around whatever is popular at the moment. Even if it makes NO BLOODY SENSE. (I'm looking at you, Backbone.js. And all your descendants!)
True about microservice. People are using it without even understanding it. In one of my recent project people were having shared databases.
@@thewiirocks hahahahaha. True! People get used to basically anything. But most of the time, IMHO, I think people can work on something complex even though they do not understand it properly. It takes time and experience to get the full picture of the thing. Wow changelog-based... I've never heard that one. I'll take some time to process what you typed. Do you have any article about it? It seems cool to know about it.
@@thewiirocks hahaha, same here. When people say to me they are working with Microservice I always ask what is their definition of Microservice. Most of the time is not even distributed... OMG!
Thank you for this great presentation , and thank everyone here for the great comments.
Let me get this straight: you want to create a book store on top of a differential log for state? (no ACID database). So when the system restarts the book store recreates the order state by processing the billions of lines in the log file storing the current state in memory? (e.g. Amazon). I find this incomprehensible.
That was my thought as well. It seems like that would be much slower than a snapshot style recovery mechanism. Maybe the two could be used hand in hand? Like a snapshot every few hours, and use the logs to fill in the state change that occurred in between the snapshots?
This is essentially Event Sourcing or CQRS. You would store snapshots and play forward from the most recent valid snapshot if your application required this.
I may have found a gold mine. I'm not sure yet but I will keep digging..
How do you keep an overview of messages? As in who produces message X and who consumes X. It is easier when it is a mono code repository but I imagine with a high number of smaller code repositories it will be difficult to keep/get an overview?
Yes, keep it in one repo is the simplest solution. When we built our exchange this way, we kept the whole enterprise system in one repo, and had a single Deployment Pipeline that evaluated everything.
@@ContinuousDelivery thanks for answering!
Hey this was a great talk! Do you have any go to resources for learning about event driven architecture?
There a few links in the description of the video.
What good reactive architecture book do you recommend?
There are some links in the video description, probably this one: Reactive Design Patters, by Roland Kuhn & Jamie Allen ➡️ amzn.to/3uCqNph
Isn't async messaging called "Event-Driven Architecture", and replaying events to get the latest state "Event Sourcing"?
Event-driven doesn’t necessarily mean asynch, it also doesn’t focus on the other ideas of responsiveness, resilience and elasticity. There is more to the Reactive Manifesto than only asynch, but it forms the foundations for the other things.
Technically Farley's describing command sourcing, but it's a similar idea.
Good stuff for me. More please.
How is this different from event driven architecture?
I would characterize it as taking event driven to its logical conclusion (being event driven "all the way down").
This is certainly an event-driven approach, but that is rather like saying that it is a code-based approach too. There is more to this than "event-driven" implies. None of it is new, but it is assembled in interesting ways, and has some very nice characteristics as a result.
Thank you for the reply. I suppose my confusion was because I always thought of event driven as containing the things you mentioned as reactive systems.
I do wonder how such a system would handle requests like "give me the average temperature of the last 30 days". In order to have that kind of data, action needs to be taken *before* request comes in.
How would this be modeled in Reactive Systems?
You collect the stream of samples with one reactive service and have a separate service to handle queries of history like your example - CQRS
Given this video and the BDD videos, I think it would be very interesting to see a practical example of how to use BDD/TDD to test a reactive system. I am curious where one would test with which kind of test (and why). These techniques look like a good way towards scalable high performance systems, but a practical example would help me a lot in getting started.
Try these: ruclips.net/video/bHKHdp4H-8w/видео.html ruclips.net/video/9P5WG8CkPrQ/видео.html
@@ContinuousDelivery awesome! Thanks!
Great video! Maybe a bit lacking of counterexamples
For me it looks like reinventing the wheel:
Unix uses pipelines since ever (i.e. roughly around 50 years!), and that's exactly what this "modern" kind of software does: waiting for messages coming from stdin, processing the data a bit back and forth, and sending the output to stdout where another piece of code is reading from stdin ...
Oh yes, and one more thing: does anybody recall the term "SSR" (sub-second response)? That means that a software must show a response in less than a second to be viewed as "fluent". IMO this can only be achieved by using black boxes which are specialised on performing just a single (or at max a few) task(s) which are listening to messages and answering by messages.
But one thing is very important: the messaging system itself is the bottleneck. I can remember a situation where the Berlin emergency management system was overloaded with messages at new years eve, because every incoming phone call automatically created a message which couldn't become handled by the back office. So people didn't receive an answer in due time, and hung up just to redial the emergency number again - which then created a new message while the first one still remained in the system waiting to become processed ...
Of course this shows bad system design, but shit will happen ;-)
Yes, but not really. It is certainly an analogy for similar approaches, not to say this stuff is new, the Transputer from the 1980's a massively parallel CPU where the nodes communicated by internal asynchronous 'messages' on a chip is a more direct antecedent.
Your Unix analogy breaks down at a number of places in comparison, the main one being that the messages aren't async, and you example at the end is cured by adding hysteresis to the propagation of errors, and back-pressure to messages to limit inputs when under-stress, function of the messaging in Reactive Systems.
As an analogy it is reasonable though, and in part that is because we are talking about some fairly fundamental ideas, like comms and the separation of the messaging (pipes in Unix) from discrete processing units.
Oh, and one more thing, in Reactive Systems we are usually talking about message latencies of micro-seconds. The fastest messaging system in the world Aeron was built for these kinds of system.
Somehow I've found my way here. It's a long story lol. I find the concepts of this video interesting but I wonder, doesn't storing messages like this lead to holding on to more data than necessary? Or are there patterns for clearing fully processed messages from your buffer that no longer have any impact on the state of your store?
Yes, there are other patterns to help. Most systems that work like this have some form of "snapshot" mechanism to update the base-state. You store and process a sequence of events. At some time you establish a new "snapshot" that represents the complete current state of the system, then you can start from there and only need to replay new events that happened after the snapshot was taken.
Great video as usual, thanks!
Have you worked with either Elixir or Erlang, the system that you describe here sounds really close to the OTP framework introduced in Erlang!
Akka is build on concept of Erlang actors.
Yes they are essentially the same, or at least very similar, concepts.
Seems this is very much related to "Event Sourcing"
The only thing to worry about here is how to explain those brilliant ideas to the team that is used to the synchronous way of thinking...
These systems are harder to design, easier to develop
Im starting to worry because it seems like you have all the answers O_o
🤣😇🤪
Continuous Divination 😬
Once you go reactive you never want to go back
Yes, it is a very addictive way to build systems
You keep delivering continuously very valuable content! Thanks for that! I'd love to see you some day upgrading the camera and the green screen lighting, and by doing that gaining more lively image quality. Or maybe even a real background and a TV could work very well for you as well. But by no means I'm trying to imply that with the current setup it would have significantly negative effect on the content. These are just details I - for some reason - tend to usually catch about productions. Anyhow - Great work and I'll sure be watching your upcoming videos!
“Replay all the messages in order to restore the state” sounds simple. But in this, possibly irrelevant example, what does it means? Re do all the book orders messages that ever happened? Who stores all the history? Why not backup/restore the state it self (as done in today systems)?
Yes, you replay all the orders, but during replay, one option is to disconnect the bits of the system that act on the orders, so you get a perfect reproduction of the state of the system, but don't re-do the shipping of orders, for example.
The advantage of this strategy over the more conventional "state snapshot" approach of storing a static picture of the state at some moment in time, is that in reactive, event-based, systems like this you don't loose information. Ion a DB, you loose the time dimension, unless you actively design the DB to store it, and relational DBs in particular are pretty poor at keeping time-series data. In an event based system every event that ever changes the system is kept, in-order, so that you can rewind, or fast-forward through time. For example. You can replay all of the events from production until you hit a bug that caused a problem.
One final thought, if you are distrustful of this strategy, this is how relational DBs work internally, it is just that they don't keep the events once they have acted on them, and so loose the time dimension that describe HOW the data got to the state that it is in.
@@ContinuousDelivery it sounds intresting but it kinds creates more questions than it provides answers. The messages must be all stored implies ever growing data to be stored (much bigger than a snapshot). The rewind process must be computationaly expensive, might it happen that a rewind of all the years long message history could lead to unacceptable long down times or degradations? And what is halfway through history you change the version of your system making it incompatible with previous messages?
I guess i must fork a nice example and do several pocs before i can partially comprehend the implication of using this type of architecture.
@@tigoes There are solutions to all of these. It is common to not replay all messages from the dawn of history, and to periodically "snapshot" so limiting how far you can rewind for example, but the storage can be *instead* not *as well* as conventional DBs. As I said, this is how RDBMS works, Reactive systems just apply this at the app level, rather than at the infrastructure level, and gain lots of interesting benefits from this different approach. It is not applicable to every type of system, but then nothing is.
10:30 - So you've reinvented a log-structured database?
No 🤣
If you zoom into the details there are differences. Mainly the leeway in consistency levels is much better in a system like this, as is the partitioning (sharing) and replication. This helps with resilience, scalability and performance.
What do you think of the Pony language ? It's one of the only ones that seem to embrace actors fully
This sounds awfully a lot like Tony Hoare's Communicating Sequential Processes: en.m.wikipedia.org/wiki/Communicating_sequential_processes
Which is from the 20th century BTW ;-)
It is also the main design inspiration for Go's goroutines and channels. And similar message passing primitives are used in other languages.
I guess the main difference is that, having or not primitives in the language to use the CSP or Reactive approach doesn't mean it will be used properly and design with the right mindset.
Also the primitives in Go and mean the message posting is synchronous, as it is waiting for a message. And what is worse, that there must be something listening on every channel or sender's might block.
I guess having a bus type channel in which messages live and get replicated for several consumers makes the difference.
@@joseluisvazquez5126 Yes, it is certainly related to Tony Hoare's stuff, also Erlang. The asynchrony matters a lot IMO, because that is one of the things that decouples, and so simplifies, things in distributed systems.
@@ContinuousDelivery Interesting! Yes, you need decoupling there. I wonder what does that mean for the send and receive interfaces exactly. For instance, does the send return anything? even if you do not expect a reply it might make sense to return an error when the message could not even be posted or emitted. What about the receive?
Maybe different message systems make different choices there.
@@joseluisvazquez5126 My preference is true async. You do work on reciept of a message, you send messages out with your results. These aren't so much responses, more events in the life of the system. You end up with services as little state machines, I receive an "orderBook" message, record the order and send out a "bookOrdered" message. This is a form of response, but it isn't targeted and any service interested in the book having been ordered can listen to it. There is no specific target destination. The original sender of the "orderBook" message is probably listening for "bookOrdered" and correlating the response, but my service doesn't care about that and doesn't know. There is no difference is routing between "orderBook" and "bookOrdered" they are both just messages and treated everywhere in the same way.
It's a very nice, very simple model.
@@ContinuousDelivery Yes, I get it is a message bus with many to many communications, no addressing needed. Messages get enqueued to all subscribers and they just ignore messages that do know recognize.
When you say "true async" I guess it means "bus.send(msg)" returns nothing, not even an error. If an error does happen, even just at en-queuing for sending, the message system will log it itself or try to recover from it on its own, the sender is not bothered and thus is simplified. On the other side "msg = subscription.recv()" just returns a message if there is anything on the queue, and I guess it blocks until there is anything, because otherwise the handler does not have anything to do.
Each side is single threaded, simple and fast and does not need to care about messaging issues or errors.
Ok Dave Your Store and Warehouse in the example have to deal only with business logic, messages are just coming and being sent. But just see You have done a delegation, You have silently indroduced a reliable message delivery (sub)system aka 'Kafka' nowadays. As usual today developers are putting great efforts into feeding and consuming the kafka. Even talking about 'kafka messages' what is a nonsense, since kafka is nothing else but a log, ledger. With kafka a delivery strategy coming into my mind: At-least once, At-most once and Exactly Once. Frankly a typical project manager would like 'exactly once' but that wish may break for example responsiveness. Take my word: the problem is real, restructuring the architecture means that it will be seen another part of the system: in your case at the kafka part.
I recently learned that a key "feature" of Agile is that it only supports the Project Management of the Features, Program Increments and Sprints. It scope doesn't include the Project Management efforts that decide what should be built for the customers. I'm confused about how this can be successful.
I think you are referencing to his work in general, not this specific video?
Agile is a way of working. Its cyclic nature is mainly fit to structure work that has medium to heavy uncertainty. Software development is uncertain by nature, as he points out in his 'estimates' video.
But you still should consider if this Agile way of working is the right tool for your job and if it is an organisational fit.
With that out of the way. Agile development often uses less rigorous up front specification of the 'how'. That's a main difference with 'old' project management. But the 'what' is often better assessed then he 'old' way by using a combination of best practices and evidence based approach with short validation cycles.
If you're going to replay my order every time you restart your system, exactly how many copies of your book are going to get piled on my doorstep?
Just the one, 😁
I’d like to remind you, this is exactly how relational databases work internally, and for that matter the processors that they run on top of too. This is not a new idea, and it is the most effective approach to high-performance systems used in all sorts of systems.
@@ContinuousDelivery I must be misunderstanding something. If it replays everything that ever happened since the beginning of time, then every restart would take longer and longer than previous ones.
@@michaelrstover Event Sourcing has the concept of snapshots. I.e. we occasionally inject snapshots in the sequence of events which represents the state up till that point in time. When we are replaying we only need to go back to the latest snapshot but if we want to we can ignore the snapshots and go all the way back to the beginning.
To avoid the same book being ordered multiple times we make sure that all actions are idempotent, which is hard, but doable.
Ah yes Akka. Hella coding drug that is very hard to get off of when you know it just a little bit. 😁😊😇
Yes it is a VERY nice programming model
Are these messages just strings?
They can be anything you like, but if you are building high-performance systems, then no, they will be in some binary format.
@@ContinuousDelivery Ah right! And do you post them to some kind of central queue that every other part of the system listens to, or is there some kind of other way to subscribe to different kinds of messages? What are the "design patterns" for this? Are these also described in the book?
@@WouterStudioHD Take a look at: www.enterpriseintegrationpatterns.com/patterns/messaging/toc.html
@@jimhumelsine9187 thank you!