Love the git analogy. It's probably the largest event-sourcing system currently deployed, and people don't even realise it's using the pattern.
You’re the one who first told me about this! I learn from the best
Then how is the final state stored? In git, the files outside the .git folder show the current HEAD state, right? What about the database?
That's not entirely correct. Git stores complete files, reusing files and directories that have not changed. In any case, git at its core doesn't store diffs.
@@cid-chan-2 that makes sense
That is a completely wrong analogy. There is nothing close to event sourcing in git, unless you consider commits to be the events, which is nonsense. A commit is a new state of the system.
Being an Akira fan is a real privilege 🙏
Great explanation of the topic.
Great video!
I think, within one system, async projections are usually not needed.
Projecting your query models asynchronously should be the exception, as async projections are too painful to work with.
What I usually do is, within one transaction (see the sketch after this comment):
- Fetch events and rebuild my aggregate
- Apply the command: Raise events and messages that will be sent to other processes.
- Save the events, messages, query projection all at once.
It's still performant for small-medium aggregates.
Async projections would be my last resort for solving performance issues.
Versioning the aggregate and caching it is probably the best value for money :)
I find it rather unfortunate to associate event sourcing with eventually consistent CQRS.
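For anyone curious what that looks like in practice, here is a minimal SQL sketch of the single-transaction approach described above. The events, outbox, and user_projection tables and all the values are hypothetical illustrations, not anything from the video:

```sql
BEGIN;

-- 1. The aggregate was rebuilt in application code from:
--    SELECT * FROM events WHERE stream_id = 'user-123' ORDER BY version;

-- 2. Append the events raised by the command
INSERT INTO events (stream_id, stream_type, version, event_type, payload)
VALUES ('user-123', 'user', 8, 'PhoneNumberVerified', '{"phone": "+61400000000"}');

-- 3. Messages for other processes go into an outbox table, not straight to a broker
INSERT INTO outbox (message_type, body)
VALUES ('PhoneNumberVerified', '{"userId": "user-123"}');

-- 4. Update the query projection in the same transaction
UPDATE user_projection
SET phone = '+61400000000', phone_verified = TRUE
WHERE user_id = 'user-123';

COMMIT; -- events, messages, and projection become visible atomically
```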
You're overengineering there! You only need CQRS and a record of the commands executed. That has nearly nothing to do with events, which are an artifact of the relativistic nature of distributed systems. You don't have that, so why are you doing it? Bad coder!
>- Apply the command: Raise events and messages that will be sent to other processes.
>- Save the events, messages, query projection all at once.
Sounds like a good caveat, though the "transaction" (or your database connection) is probably leaking through your different layers of abstraction (domain, repository, etc.)
You have a point; you could do event sourcing without CQRS.
Beautifully clear, concise and informative video. Also that last sentence! 😀 Great work
Great video Tom! One thing I think should have been different is the "event" column shown at 2:30. Storing the data this way will make querying really hard; you need to split the event type and the data associated with it into at least two separate columns.
I suppose you know this, but having it the other way in the video may confuse people, so I felt the need to point that out.
Agreed, the example events table is missing this among some other key features that make this more scalable.
@@tom-delalande Other key features such as ? I'm not familiar with this system architecture and would like to know more !
Unfortunately I haven’t found a good practical guide for event sourcing. But the events table usually has:
position = incrementing sequence
stream_id = the unique id (user id in this case)
stream_type = used to store multiple types of streams in the same event store
version = like position but for this specific stream_id
event_type = (optional) the type of event
metadata = metadata for how the event was saved
payload = json payload of event
And I might start adding a timestamp column as well.
With an index on position
And one on stream_id
I hope this is enough to go off of; a rough SQL sketch follows below. If anyone has any good resources giving a practical guide for this I'd love to see it.
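Here is roughly how that could look as a Postgres-flavored sketch. The column list comes from the comment above; everything else (types, names, defaults) is an assumption:

```sql
CREATE TABLE events (
    position    BIGSERIAL   PRIMARY KEY,            -- global incrementing sequence
    stream_id   UUID        NOT NULL,               -- the unique id (user id in this case)
    stream_type TEXT        NOT NULL,               -- multiple stream types in one store
    version     INT         NOT NULL,               -- like position, but per stream
    event_type  TEXT        NOT NULL,               -- optional, but handy for dispatch
    metadata    JSONB       NOT NULL DEFAULT '{}',  -- how the event was saved
    payload     JSONB       NOT NULL,               -- the event data itself
    recorded_at TIMESTAMPTZ NOT NULL DEFAULT now(), -- the suggested timestamp column
    UNIQUE (stream_id, version)                     -- also acts as an optimistic-concurrency guard
);

-- position is already indexed via the primary key; add the stream_id index
CREATE INDEX events_stream_id_idx ON events (stream_id);
```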
The problem with the events table is that it's not atomic: the event column contains more than one piece of information and is highly dependent on the type of event you are dealing with. To solve this, you would need to maintain a different table for each type of event with the relevant columns, e.g. a phone-number column for every event that deals with the phone number. It's annoying to maintain, though.
Since you don’t query the event store directly you can just save the events in an encoded format like JSON. It would be very frustrating to do this the idiomatic SQL way.
@@tom-delalande Exactly; we have used Protobuf and ProtoJSON in production for the event payload format. Worked like a charm.
How about a middle ground where you split the event column into event_name, event_data and event_result?
@@nero3700 The issue remains, event_data is not atomic, but Tom is right: the actual middle ground is to encode the data. When I think about that I always think of Wordpress and how it stores plugin data that can really be anything. It's the best and worst example that comes to mind. Atomic or not, nobody cares too much in production if the output delivers.
Very well explained!
Git isn't event sourced. Subversion was, and it working out badly is one of the reasons Git was created.
Git stores full copies of the state at each commit, and allows you to find the diff between two versions quickly since it only needs to diff two full states, instead of walking along a path and joining diffs.
In this example, that would've been closer to copying the table on each event and then allowing you to recreate the event stream by diffing each table version and working backwards from that diff, except since git uses immutable structures with copy-on-write semantics, the things that are unchanged are reused between versions. I don't think there's a good analogy in RDBMSs, but I'm all ears on that one.
Git's approach to version control, using snapshots and reusing unchanged data, contrasts with Subversion's linear tracking of changes. The idea that Git simply stores full copies at each commit, or that it was event-sourced like Subversion, isn't accurate. Git optimizes storage, making it often more efficient and compact than Subversion. The development of Git addressed limitations in systems like Subversion, focusing on performance, distributed collaboration, and better support for branching and merging, rather than on departing from an "event-sourced" model.
The fundamental difference between SVN and Git is in their approach to version control. SVN is a centralized version control system, with linear history. Git is a distributed version control system, where every clone of the repository contains the full history of changes, and its model supports branching and merging more naturally and efficiently. That was also the main reason to write git.
Git has also been compared more often to a graph.
@@RogerValor Not sure what your reply is meant to describe or add to the original comment. The original comment was on point. Also, git commits do contain the full state (you can say "full copy"); that is not a misconception. It is a technical detail that it reuses object references when there is no change, as the object identifiers remain the same because they are based on the content hash.
Yeah, I wouldn't say that compressed deltas, packs, and other optimizations like that really change how the model works. It's just an optimization, and it works, in a way, because it operates outside of the model, almost transparently.
Title is incorrect. Everything you mentioned in the video still uses CRUD.
> CRUD and CQRS are completely different things that aren't mutually exclusive. CQRS only means that your write model is separated from your read model. Therefore, both can scale independently of each other. While CQRS is often associated with DDD & EventSourcing these aren't prerequisites of CQRS - copied from some stackoverflow comment.
Every now and then, an event stream must contain the current state, not just a diff to the previous state.
Otherwise you would have to process the entire history to build the current state, which is unfeasible in an enterprise environment.
If one event were generated after every 5-10 events/processes and acted as an anchor point, you would never have to go back further than that; you could use it as the starting point to construct the current state.
Is this the way it's done in this paradigm?
I don’t see why you couldn’t do this in this paradigm. But it’s not always necessary. When choosing to event source something we usually make sure that the lifecycle of the domain has some sort of logical end. A loan is a good example of this because it is guaranteed to end after some number of countable repayments.
Paired with database indexing this guarantees that queries don’t take too long.
I think that's what he shows in the video when talking about Eventual Consistency. There can still be a table containing the current state, and the event table is separate. I do _something_ like this but I just keep the current state in a JSON blob lol.
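To make the anchor-point idea from the question concrete: one common shape is a separate snapshots table that is rewritten every N events. This is a hedged sketch reusing the hypothetical events table from earlier in the thread, not something from the video:

```sql
-- One row per stream, rewritten every N events (the "anchor point")
CREATE TABLE snapshots (
    stream_id UUID  PRIMARY KEY,
    version   INT   NOT NULL,  -- last event version folded into this state
    state     JSONB NOT NULL   -- the current state, e.g. that JSON blob
);

-- Rebuilding: load the snapshot, then replay only the events after it
SELECT version, state FROM snapshots WHERE stream_id = 'user-123';

SELECT event_type, payload
FROM events
WHERE stream_id = 'user-123'
  AND version > 7              -- 7 = the snapshot version returned above
ORDER BY version;
```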
This video was pretty cool, but the title made me start watching with a very skeptical mindset. I wish you would just name what it is about: event sourcing.
Very good video.
How do you handle GDPR removal of, say, phone number?
Yeah, this is a tough one. In this case you usually save a hashed version of the PII. If you need to save the actual value, you can store it in a separate table that can be removed when needed.
@@tom-delalande Good point. A system like this could be set up with pointers to a separate table that stores PII or other data that needs to be updated/deleted.
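A sketch of that pointer idea; the pii_values table and the payload key are made up for illustration:

```sql
-- PII lives in its own mutable table; events only carry an opaque reference
CREATE TABLE pii_values (
    pii_id UUID PRIMARY KEY,
    value  TEXT NOT NULL
);

-- The event payload stores the pointer, never the raw number, e.g.:
--   {"phonePiiId": "1b4e28ba-2fa1-11d2-883f-b9a761bde3fb"}

-- GDPR erasure: delete the value; the immutable event history stays intact
DELETE FROM pii_values WHERE pii_id = '1b4e28ba-2fa1-11d2-883f-b9a761bde3fb';
```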
How did you edit this video?
It was my first attempt at using Motion Canvas, an open-source tool created by @aarthificial. I definitely recommend it.
@@tom-delalande It was a good first attempt, I liked it. Thanks Tom!
@@tom-delalande I expected it to be Manim. Never heard about Motion Canvas, I'll have to check it out.
This is not an alternative to CRUD but complementary to it. It's a good idea for specific cases in which you are interested in keeping a history.
Does this mean event sourcing just translates to "log every action"?
Yes, it can be helpful to think of it this way, but sometimes one action can create multiple events.
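As a tiny illustration of one action producing multiple events, reusing the hypothetical events table sketched earlier in the thread:

```sql
-- A single "change phone number" action appends two events atomically
BEGIN;
INSERT INTO events (stream_id, stream_type, version, event_type, payload)
VALUES ('user-123', 'user', 9,  'PhoneNumberRemoved', '{}'),
       ('user-123', 'user', 10, 'PhoneNumberAdded', '{"phone": "+61400000001"}');
COMMIT;
```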
Very informative video, enjoyed it.
Very small piece of personal feedback: voice clarity and tone are good,
just try to speed up your speech a little bit. 1.25/1.5 felt clearer,
due to the slowness of the speech in the first place.
Thanks! Appreciate it
I do not agree on this one. No need to rush. But if you do like the rush, crank up the speed on the video like you did :)
The current speed is okay; I'm a non-native English speaker.
Guys, chill, I only asked for a bit more speed. As I said before, this is a personal thing and it's up to him to decide.
Either way, I'll keep checking if he drops any future videos.
I like the points you make, but the example you give is (imho) horrible. When using an append-only event list containing user data, you're fundamentally breaking privacy laws. What if I remove my phone number from the service? Yeah sure, it won't be used after a deletion event was added, but registering it, verifying it, using it, and deleting it will be stored forever. I guess you could add a pruning algorithm to fix this… 🤔
The problem at hand is about how to model your data accurately. It's not about event sourcing or any other gimmicks. Data is front and center. You could use state machines in your application code to make invalid states unrepresentable, paired with good data modelling of your domain.
Not this again! Event sourcing is the WORST legacy part of our platform, it sounds good in theory but it's awful in practice
Every time I've heard this argument:
- I've seen it coming from people who do not have a full grasp of how the pattern works (and therefore "boo, event sourcing bad!") and haven't taken the time to get familiar with it,
- there is poor tooling and little-to-no processes established for the maintenance of the system,
- the implementation is not fully "event sourced", or doesn't follow all the good practices that have been discovered to make it work well (connects to the point above),
- last but definitely not least, it involved poor domain discovery and modeling, which I think is paramount, specifically with such a pattern.
Software engineering is a diverse craft that tries to tackle highly complex domains. There is no "good" or "bad" in absolutes; everything depends on the context.
And, as a software engineer, it's on you to have an ample and diverse toolbox, plus a good level of pragmatism that allows you to apply the most *effective* solution to the problem at hand.
Event sourcing is not bad in and of itself.
As with everything, you gotta understand where to use it, and how to use it effectively.
Imagine calling a pattern legacy 😂😂😂