I completely lost focus when he started to talk about Debezium and publishing the internal data schema of a given service to the outside world. To me it sounds like a great way to introduce coupling between services, i.e. a great way to introduce new problems instead of solving existing ones.
Why do microservices if you end up using Debezium? It's really trying to solve the right problem the wrong way. I was expecting something more enlightening, since databases really should not reflect and broadcast their internal logic into public streams - in most cases.
Did anyone else have significant trouble following the progression of this talk? It seemed to be absolutely everywhere.
I think, as he said, there's SO much on this topic that he sort of HAD to jump around a bit. A bit of a shame, but I think he did a good job touching on many key issues. I do think it presumes a certain level of having banged your head against these issues to begin with, though.
So much intro and too much explanation for a simple topic.
Tedious drivel in an incomprehensible format
Yes
@DavidBezemer thanks for saying the words I was about to type... it is utter drivel... utter, utter drivel. "...is a newspaper like a book?" OMG. And he has authored a book. Imagine that.
I was nodding the whole time like yup, yup, been there, done that, just hoping he'd get to the point really quickly: how can it be solved? And then bam! Have you heard about our lord and savior Debezium?
Sometimes, "Microservices is about optimizing... for CAPACITY"
It is a great talk, though I have one thing to point out:
Actually, you don't need any specific system to turn the data into a sequential stream.
What you need is... the TCP/IP protocol!
Yes! TCP/IP has a built-in mechanism for recovering the sequence of the data that was sent.
All you need to do to use this power is create one persistent connection, not a new connection for each event. TCP/IP will automatically preserve the sequence of whatever you send through the connection, resend lost packets, etc.
You will not have to worry about a single thing.
Even if you lose the connection, all you have to do when you regain it is ask at which point the receiver stopped receiving, and you can continue from there without worrying whether it received everything before that.
Well, what if one of your services fails in between updating data in the DB and emitting the event? Then no TCP/IP is going to save you ;)
I think that what he says makes sense.
Most events (almost all) are created when some DB write occurs. Therefore it is easier to let the DB trigger events than to emit them manually from code.
My question would be: if some DB table (where Debezium is connected) changes its structure, does Debezium keep working, or do I have to somehow update its connector?
Looks like a brand ambassador for Debezium.
Slides at: www.slideshare.net/ceposta/the-hardest-part-of-microservices-your-data
Thank you very much Antonio!! Appreciate it
Very good up until Debezium was mentioned. I prefer the microservices themselves to be in control of the events when it comes to bounded context events - and simply publish the event/message to a queue or topic. For simple events (not application events) when you just want to know something has happened to a storage service (i.e. adding a blob to Azure Blobs) then you can use Azure Event Grid.
Events are specific to the business logic and should have nothing to do with persistence.
Fantastic talk, enjoyed every bit of it. Keep up the fantastic work.
Great talk! However, if you've essentially gone through technical gymnastics to recreate what a database does, using a cacophony of API calls, consistency models, data replication, dependencies, etc., then you've violated the tenets of service orientation: well-defined boundaries being key. If you're making a service that makes subsequent microservice calls, you're just making database calls and not doing real service orientation. Much more time needs to be spent on the domain design; that's the interesting part.
This is some of the stuff that most senior backend developers need to understand and deal with, and yet in many companies senior UI developers earn as much as senior backend developers!... Like, for real, how hard is it to master React or Angular? And how much more do you need to know beyond the UI frameworks!
It feels reassuring that these are universal challenges no matter what. I am not sure how log shipping through the stream to mirror data helps solve the problems? Nevertheless, the articulation of the problem statement is good.
Great talk. I would check out from 37:00 to the 40th minute; the rest is all standard.
I got fed up at 18:28 and came down to check the comments. Now I leave. Thanks.
There is something I really do not like about this talk, and I have seen many speakers do this. When you say, "This is not the full talk, because I only have 45 minutes," but then go on talking about yourself and how you will most likely not have enough time to do this properly... you have already lost me. Like your book... but cut the clutter and get going with the content.
Is there a video of the full talk anywhere?
Looks like people chasing the ACID properties of a transactional database over a distributed system, finding that it is impossible, and talking about it over and over.
Hmm... I tried to enjoy it, but there are some points where you REALLY laboured with pointless waffle. I'm pretty sure everyone knows what an async message is, for example @18:40. Unless, of course, this was aimed at high school students?
Great talk up until Debezium. When you started to talk about events being sent from the database as raw transaction log events, red lights started going off for me. Everything I have read about good microservice architectures has recommended that you NEVER speak with other services using your internal database schema. Otherwise, you are tightly coupling everyone to your internal schema, and making changes to that schema becomes just as difficult as with a shared database. Instead, the general wisdom seems to be that you should communicate externally using some form of value object or aggregate, so that changes to a service's internal schema won't need to be propagated to other services. Only when you need to change that external representation (a REST API response, for example) do you put any requirement on other services to change. Having read many of Christian's posts on his blog, I'm quite shocked to see him recommending this... how is this not a terrible idea? I must be missing something.
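The "value object or aggregate" approach this comment argues for amounts to a small translation layer between the internal row shape and a published contract. A sketch of that idea, with entirely hypothetical field names:

```python
from dataclasses import dataclass

# Internal row roughly as raw CDC would surface it: column names,
# storage details, and internal bookkeeping all leak through.
internal_row = {
    "cust_id": 42,
    "fname": "Ada",
    "lname": "Lovelace",
    "internal_flags": 7,  # storage detail nobody outside should see
}

@dataclass(frozen=True)
class CustomerChanged:
    """Public event contract: a stable value object, versioned
    deliberately and separately from the table schema."""
    customer_id: int
    display_name: str

def to_public_event(row):
    # Internal schema changes stay behind this mapping; only a
    # deliberate change to CustomerChanged affects consumers.
    return CustomerChanged(
        customer_id=row["cust_id"],
        display_name=f'{row["fname"]} {row["lname"]}',
    )
```

Note the mapping can live inside the CDC pipeline too (e.g. as a transformation step between the change stream and the public topic), which is one way to reconcile this objection with the talk's approach.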
LinkedIn shifted progressively to microservices by first using their core application's database transaction logs as a source of events. You do with what you have; you can't always rewrite everything from scratch.
He did mention that Kafka handles not only before and after values, but before and after schemas, so your schemas can evolve.
Yeah, I didn't see any detail on the schema evolution, but he did mention at 43:45 that schema evolution is supported in "some way".
In general I think this notion of schema coupling between services, if you use CDC from databases as a source of inter-service communication, has become a bit dogmatised in the community. One of the things we've noticed in our journey so far to a microservice architecture is how natural it is to have pretty tight coupling between an event that you would care to emit from the application layer of a service and the representation of that same event inside the persistence layer of that service. It is often the case that whenever an application change is significant enough to need a DB schema change, you'll also need to reconsider what you emit as an event from the application layer. And once that happens, you are in the same situation as you would be if your other applications were consuming the CDC from the DB of that same service. One of the benefits of the latter (at least in the case of Debezium) is that it can manage the schema evolution for you; I encourage you to look into that. We are still learning and experimenting and following the Debezium project closely, and already using it for some use cases.
@jean-guillaumeburet1068 I almost missed the point of the whole thing, but your comment has saved it for me. Thanks.
@ChristianPosta It is a great presentation and the drawings are very good. The idea seems like an extension of DynamoDB Streams. Do we really need to segregate the data (database per microservice) and then aggregate it as logs of events, like Debezium, for transactional purposes?
I am working on a project based on microservices and am stuck in a situation that I cannot resolve. The technologies I am using are ASP.NET Core 2.0, RabbitMQ, and MassTransit Saga. My task is to get a search result from my microservice: a request is sent from my website to my API gateway, which forwards it to a saga project connected to RabbitMQ, which sends it to my search microservice, which then searches in Elasticsearch, generates an IEnumerable list of records, and sends it back to the saga project, which sends it back to the API gateway and from there to the website to display the response. I cannot find any tutorial or example on this; can you please guide me or recommend a video tutorial or examples? Ideally, is it good practice to route all microservice connectivity through a saga project, even for selecting records?
I see this comment was posted 2 years ago but let's post an answer anyway. The API gateway should talk directly to the search microservice and return the result. There is no need to introduce more complexity. Sagas have their use cases and are about long-running processes.
3:52 This guy was way ahead of his time. Check out his coughing skills, he already knew COVID was coming!
Great talk, thank you.
Very helpful. Are the diagrams hand-drawn?
Yeah, I'm pretty sure it's Paper by FiftyThree.
What if systems miss an event? Or process it incorrectly? The old technique is to run synchronization batches, but I hope they are not needed anymore.
If an event generates an application exception on the subscriber side, the subscriber will not acknowledge the message.
Kafka is built so it will keep the message (and any subsequent ones) and keep the offset at which the subscriber crashed.
The application error should be fixed by a programmer, and once the fix is deployed, the subscriber will start again at the last offset.
Synchronization batches would add huge delays; they are not a solution for a real-time application.
I didn't get this... why is Kafka using JSON instead of RPC?
This question doesn't make sense.
I think you got the concept of RPC wrong.
@diegocoding2810 I don't remember my initial thought, but... it looks strange that at the end of the video SQL logs are imported as JSON instead of binary messages. Kind of a huge waste. For my noob stuff I am using some services with RPC to transfer data from the source to Kafka in binary format (though probably in the Connect case it has nothing to do with RPC). Later I will look deeper into what options are available in Connect.
46-minute ads on YouTube. Not even funny anymore.
What if you use event sourcing, then expose an Atom-like API for other services to consume? Then when you persist an event, your projections and other services can update using that feed.
You mean that downstream services will call your events API to know what happened? In a pull manner?
That's inefficient, as there is a trade-off to be made in the downstream services on the pull frequency.
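For reference, the pull model being debated here looks roughly like this: a downstream service polls an event feed with a cursor, trading latency (bounded by the poll interval) for simplicity. All names below are made up for illustration:

```python
class EventFeed:
    """Stand-in for an Atom-like feed exposed by the event-sourced service."""

    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

    def page_after(self, cursor, limit=100):
        # Return the events after the cursor, plus the new cursor position.
        page = self.events[cursor:cursor + limit]
        return page, cursor + len(page)

def poll_once(feed, cursor, apply):
    # One poll cycle for a downstream projection: fetch, apply, advance.
    # In a real service this would run on a timer or schedule.
    page, cursor = feed.page_after(cursor)
    for event in page:
        apply(event)
    return cursor

feed = EventFeed()
feed.append({"type": "AccountOpened", "id": 1})
feed.append({"type": "AccountClosed", "id": 1})

projection = []
cursor = poll_once(feed, 0, projection.append)
```

The cursor makes the consumer responsible for its own progress, which is the same resumability property the push-based log approaches give you, just with the pull-frequency trade-off the reply points out.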
MICROSERVICES!
good person skills me very good