Nice video. Easy to follow through the concepts and grasp practical knowledge. One thing to add is that, the events in outbox table should have a unique identifier which is sent through as part of the payload and the downstream systems need to have some sort of de-duplication based on that ID. Example in this case, the event is sent to kafka but the update of the "processed=true" flag fails, next time the same event will be picked up and should be safe because it still have the same id, otherwise without a unique identifier we cannot de-dupe and we have just shifted the double write problem from the entity to the outbox.
Appreciate your efforts, 🙂🙏 Basant. God Bless You! I am learning lot of concepts from you! I request you based on your time if time permits ... Please do two videos in a week...
Hi @JavaTechie, You are moving the data consistency problem from order service to "message relay service" in this case. what if DB is down while publishing the message(marking the flag true) or kafka is down in the "message relay service". Either way, you are in the same problem. Please comment.
Yes, it's correct, consider this if msg is published and updation failed, msg will be processed again in other iteration, meaning a msg may be processed twice, it's ok if you have an idempotent system, but consider without transaction outbox pattern, if order failed to persist and we are publishing an event thats a bigger problem
Transactional outbox pattern solves dual write problem ,it says that when writing to 2 or more different sources(DB & Kafka) consider writing it to single source and maintain an outbox( kind of log book) so that it can be processed by other sources as well
Hi, thanks for great tutorial. When publish outbox event in kafka you also update it (again dual write ?). I think the outbox must be updated in consumer for consistency ?
I understand your point but it won't create any data inconsistency issue because let's say event publish but updating db failed then in this case data will be duplicated that's it I don't think any major impact
Thanks for the video, as this pattern required outbox table and scheduler to mitigate the distributed transactional issue, we can rely on libraries like Atomikos
@Javatechie In pollOutboxMessagesAndPublish method again you are performing two operations publishing message to kafka topic and updating outbox table record. What if one success other one failed, again you have the problem you disscussed
@@Javatechie What if published to kafka topic successful and failed to update outbox table record?. We get same outbox table record in next scheduled time then we publish same order more than once.
Thank you for the informative video! I have a question: from a use case perspective, should createNewOrder include both the creation of a new order and the publication to microservices within a single method? To adhere to the Single Responsibility Principle, it seems we should have two separate methods: createNewOrder and notifyMicroServices, with notifyMicroServices being called only if there are no exceptions in createNewOrder. Does this approach address the concern, or am I missing something?
thanks for the great video and clear examples. However, one alternate solution is, why not use CDC on table ? It will reduce the transactions on the table
Very nicely explained! One question: Isn’t the problem of dual writing still existing in the poller service now? It can fail while updating the boolean state and/or publishing to kafka?
Great tutorial, you have nailed the concept with clear view and simplicity. One thing is missing from my point of view: what should be the best approach to mark a outbox entity as processed with the ensutiry that the corresponding outbox message is processed correctly by any other dependent server?
We can use exception try and catch to overcome such issues. If something went wrong in persisting database, then it should not send message to Kafka. Please correct me If I am wrong.
@@pogo874u there could be multiple reason of db writing failure, and its not a good programming to handle all in a single Exception class. and in this example its just a small case just to explain the pattern, in real time you may get more complex scenario where you bound to use this pattern.
There are many other benefits. 1. Decoupling of order service and Kafka, now order service can receive order irrespective of kafka downtime issues. Downtime may required for upgrade or downtime can be because of kafka down. Order service can be latent if kafka publishing process lags in publishing. So any issues with Kafka service are now not impacting the Order service. Now there is a choice to publish messages. If there is no need to process the inventory at real-time, assume we have 1 day of processing time then we need to run kafka and listener service together in a scheduled manner, may be once in a day, which benefits in terms of cost, especially on cloud deployments 2. Following Single responsibility of SOLID design principles
Thanks , for the content i really appreciate that. and one comment that I have is when you run 3 part serivies like Kafka can you please use docker so that any one will not worry on the specific OS they use to run and for ease of use.
@@Javatechie but in my earlier project they created multi module project having 4 modules (common, order-service, order-lambda and cx feed generator) and deployed order lambda in AWS lambda, order service in AWS fargate and cx feed generator in AWS batch. And that project is wholesale microservices.
Can you please share some more inputs . Also as I know spring event can be use in same application it won’t work in microservices pattern (means inter communication)
@@Javatechie First of all, thanks for all your efforts. I have learnt many things from you. I will try to implement using Spring event and share you the github link. Thank you 🙏
Nice content Basant. We can achieve same thing using Spring event listener and publisher model too. Second thing, this solution is not enough on the production environment where multiple instances run simultaneously. You will come across duplicate records.
Hello Rahim , I don't understand how spring event will help you here could you please add some insights on it also regarding your second concern you can still guarantee to run scheduler once by implement Shedlock that's not a big challenges. Don't worry I will try to cover this shedlock soon
We should not allow both the instances to run simultaneously so Shedlock lock the database. I mean instance of locking the database It will lock the one single table
@@JavatechieThough you implemented Shedlock lock but you may get duplicate records in one scenario. Suppose there are 2 instances running. The first instance gets a chance to execute the job and suppose your server is busy due to heavy load and it doesn't respond within 1 min or may chances that database is busy and not respond within 1 min. After 1 min 2nd instance gets chance to execute the job. Suppose this time database is idle state so both the instances may have same records. Please handle this scenario in ur next video
If the consumer will face the same issue after message publish the db is down then we may publish the duplicate data. Can we use first the save then the kafka publish?
Small doubt.... The poller service seems to have reintroducing the dual wrtite problem since its publishing to kafka and updating the state in Outbox table. What if the the update fails and the status is 0. Won't the same event be polled twice and published multiple times to kafka?
Yes you are write but we need to handle it in efficient way in poller by applying some retry mechanism to avoid data in consistency and regarding dual write issue in poller is acceptable as it's just act as a helper to process the data but thank you for bringing this point . Will update best practice shortly
@Javatechie thanks for responding . Ok so you mean we introduce Retry and dead-letter queues to ensure consistency. But even then can't we have this as part of the order ms why introduce an extra layer between the producer and consumer (which will also result in additional latency). Currently it seems like all this pattern is doing is deferring the dual write problem to another downstream ms. Maybe I'm getting a bit ahead of myself and should wait for further videos from you to clarify this bit.
Since we have used @Transactional, if something goes wrong while saving db, how can message get published. Message will not at all published since we used @Transactional, please correct me if I am wrong @Java Techie.
Many thanks. Grateful. Very detailed. Just one doubt. Let's say after publishing to Kafka there is error while updating processed status by poller service. In that case same order will duplicated at consumer side. How we can prevent this
@@anupamkumartejaswi9210 there is a way to handle such situations. Add one more column in a table let's say isProcessed which indicates that the transaction process is successfully or not. In case, After publishing message in Kafka if any error occurred during updating the status so request will go inside catch block. In catch block update the isProcessed column status as "failed" something. Which basically states that message is sent but status in not updated in the database. In next time when job starts its execution so update that record status to "completed". In the success scenario you have to update 2 columns status isProcessed and status column too
@@rahimkhan-fh9dd problem still remains the same, let's say DB itself is down, in that case we cannot make any update right even from catch block. One way that I could think of is retrying after certain delay to update the DB.
Hi, I have a question . Inside order puller project , we are fetching data from outbox table and then, again updating processed field into that table.. but, I would like to ask if avain table updation fail. Then, kafka producer will still send the data to the topic. Is not it a case if inconsistency, since it's a dual write. Also, if we mark publish method inside kafkapublish as @transactional but, still I think will be loaded into kafka topic.
Again, for such a case, maybe the consumer needs to add some logic that it should add further operation or logic to process each order info for once only. may be by using order_id field.
I dint see you writing configurations of Kafka bootstrap server in poller service. So wondering how it got published into the queue. Can you explain how it worked without configuration?
That's the magic of auto configuration feature of spring boot , so if you won't configure explicitly it will load default value like bootstrap localhost:9092 and since I am playing with string no serialize and drserialize configuration required here
All your entity shouldn't be play with transaction right if in case yes then still you can play with single outbox table by just customizing schema by setting Entiry type and their payload in string
I guess in that case, the table and outbox table will be written but puller service's message sending will fail. And once the broker is up and running, puller service will pull the data from outbox table and send them successfully. So the only problem is, processing being postponed for a while (because of the broker is down), and there will be no data issue.
@@Javatechie First i would like to thank you for your work,Love your content, I am your subscriber since very long :) The challenge i was talking is : Lets suppose this same code is running in different pods , the order poller may fetch same records in two different pods, so some records will process twice.
"Bhaiya, can you please make a video on time-based API authentication using Keycloak? We want to restrict the functionality of the entire application so that it's only available to users from 10 AM to 5 PM."
Even with Transactional outbox pattern applied, still Peter will circle wrong answer because he is solving MCQs of Questions set B, while John has set A.😂😜 #Pun_Intended
I have not gone through the entire implementation, but i see the first explanation. Instead of doing this, why can't we have an if condition checking that the data is saved to db? Then go and publish the message If(!order.save(entity)){ RETURN EXCEPTION } Publish message to kafka This way, it is synchronus, and once data is processed, then the event will occur Note: I'm not an expert in coding
I guess the problem is, you will still put everything within a method under @Transactional. So, suppose your database operation is a success, and your broker works fine (message is sent). But somehow, there is exception thrown because of a logic issue. In that case, Spring will still rollback the whole database operations while not the operations you did on Kafka, and a data issue will be happen in this scenario. This is just my understanding, it could be wrong and please correct me if so.
@haolinzhang53 Yes, that's what i thought. If something happens after data persistence, i mean at the stage of publishing message, so under transactional, all the data operations will be rolled back and we don't need to rollback any kafka operation because our exception happened at kafka level so anyway our message won't get published.
If you are going to make these topics that easy, near future, spring boot developers count will increase rapidly. 😊😊😊
In order poller service, kafka is publishing message in to a topic and then updating outbox flag in same method. isnt it dual write scenario?
Nice video. Easy to follow through the concepts and grasp practical knowledge. One thing to add is that, the events in outbox table should have a unique identifier which is sent through as part of the payload and the downstream systems need to have some sort of de-duplication based on that ID. Example in this case, the event is sent to kafka but the update of the "processed=true" flag fails, next time the same event will be picked up and should be safe because it still have the same id, otherwise without a unique identifier we cannot de-dupe and we have just shifted the double write problem from the entity to the outbox.
Awesome man . Good catch thanks for the solution 👍 i will check this behavior once hopefully it will handle by kafka it self
I am recommending this channel to all my java developers like anything.😊
Thanks for covering so many helpful topics ❤
Really great explanation, easy to understand 🙏👌👍
Hi everyone! Welcome to Java Techie! ❤
Appreciate your efforts, 🙂🙏 Basant. God Bless You! I am learning lot of concepts from you! I request you based on your time if time permits ... Please do two videos in a week...
Thanks for the video, I would like more videos on this topics. !! Thank you. Greetings from Argentina.
Thanks buddy sure I will upload 👍
@@Javatechie Thanks . I have a question if you have a video about CDC and DDD?
@@manuonda no I don't have buddy
Please start including great concepts.in spring boot including multi threaded environments..
as usual, your videos are quite pratical and useful, which can be implemented the idea our projects
Excellent tutorial!
Hi @JavaTechie, You are moving the data consistency problem from order service to "message relay service" in this case. what if DB is down while publishing the message(marking the flag true) or kafka is down in the "message relay service". Either way, you are in the same problem. Please comment.
Yes, it's correct, consider this if msg is published and updation failed, msg will be processed again in other iteration, meaning a msg may be processed twice, it's ok if you have an idempotent system, but consider without transaction outbox pattern, if order failed to persist and we are publishing an event thats a bigger problem
Transactional outbox pattern solves dual write problem ,it says that when writing to 2 or more different sources(DB & Kafka) consider writing it to single source and maintain an outbox( kind of log book) so that it can be processed by other sources as well
How scheduler handle when we have multiple jvm instances to listen same time from db to aviod duplicates publishing?
You need to use shedlock to assure that your job will run only once
Hi, thanks for great tutorial. When publish outbox event in kafka you also update it (again dual write ?). I think the outbox must be updated in consumer for consistency ?
I understand your point but it won't create any data inconsistency issue because let's say event publish but updating db failed then in this case data will be duplicated that's it I don't think any major impact
Thanks for the video, as this pattern required outbox table and scheduler to mitigate the distributed transactional issue, we can rely on libraries like Atomikos
I was waiting this one , I hope you have covered queue outbox and in box technique as well scheduler
Thanks very helpful 👍
@Javatechie In pollOutboxMessagesAndPublish method again you are performing two operations publishing message to kafka topic and updating outbox table record. What if one success other one failed, again you have the problem you disscussed
It's not an issue right? The proces will be bit delay at least record Will be processed in next iteration
@@Javatechie What if published to kafka topic successful and failed to update outbox table record?. We get same outbox table record in next scheduled time then we publish same order more than once.
@@ramesh_panthangi can you please try sending duplicate message to.kafka and validate In your local once .
@Javatechie Not a straightforward solution to the problem you discussed. There is a lot to work around this
@@ramesh_panthangi sure I got your point let me think and upset the solution
Thank you for the informative video! I have a question: from a use case perspective, should createNewOrder include both the creation of a new order and the publication to microservices within a single method? To adhere to the Single Responsibility Principle, it seems we should have two separate methods: createNewOrder and notifyMicroServices, with notifyMicroServices being called only if there are no exceptions in createNewOrder. Does this approach address the concern, or am I missing something?
thanks for the great video and clear examples. However, one alternate solution is, why not use CDC on table ? It will reduce the transactions on the table
thank you very very much.
Please upload more video like this and for microservices design pattern
Debezium helps to avoid duel write problem
thank you
Very nicely explained! One question:
Isn’t the problem of dual writing still existing in the poller service now? It can fail while updating the boolean state and/or publishing to kafka?
Great tutorial, you have nailed the concept with clear view and simplicity. One thing is missing from my point of view: what should be the best approach to mark a outbox entity as processed with the ensutiry that the corresponding outbox message is processed correctly by any other dependent server?
Can I go transaction out box pattern over saga pattern? Which one is recommended
What is the need of outbox table? we can directly pull data from order table. Please correct me if anything wrong
We can use exception try and catch to overcome such issues. If something went wrong in persisting database, then it should not send message to Kafka. Please correct me If I am wrong.
Isn’t it manual effort?
@@Javatechie when you say manual effort, can you plz elaborate how it's more work?
@@pogo874u there could be multiple reason of db writing failure, and its not a good programming to handle all in a single Exception class. and in this example its just a small case just to explain the pattern, in real time you may get more complex scenario where you bound to use this pattern.
wondrful video covered lot of things .please let me know is there any plans for devops for developers course?
There are many other benefits.
1. Decoupling of order service and Kafka, now order service can receive order irrespective of kafka downtime issues. Downtime may required for upgrade or downtime can be because of kafka down. Order service can be latent if kafka publishing process lags in publishing. So any issues with Kafka service are now not impacting the Order service. Now there is a choice to publish messages. If there is no need to process the inventory at real-time, assume we have 1 day of processing time then we need to run kafka and listener service together in a scheduled manner, may be once in a day, which benefits in terms of cost, especially on cloud deployments
2. Following Single responsibility of SOLID design principles
Absolutely agree and thank you for summarising this benefits .
Good information, By the way which tool do you use to draw architecture flows?
It's simple microsoft power point
Thanks , for the content i really appreciate that. and one comment that I have is when you run 3 part serivies like Kafka can you please use docker so that any one will not worry on the specific OS they use to run and for ease of use.
Thank you that's a good suggestion 👍. Definitely will follow this
can we use a multi-module project to define services in two separate modules (order service, order poller, common module (if required))
Yes but that seems monolithic approach isn't it
@@Javatechie but in my earlier project they created multi module project having 4 modules (common, order-service, order-lambda and cx feed generator) and deployed order lambda in AWS lambda, order service in AWS fargate and cx feed generator in AWS batch. And that project is wholesale microservices.
Sorry I misunderstood yes you are correct we can use multi module project
This can be easily handled using Spring event publisher and listener model...
Can you please share some more inputs . Also as I know spring event can be use in same application it won’t work in microservices pattern (means inter communication)
@@Javatechie First of all, thanks for all your efforts. I have learnt many things from you. I will try to implement using Spring event and share you the github link. Thank you 🙏
Nice content Basant.
We can achieve same thing using Spring event listener and publisher model too.
Second thing, this solution is not enough on the production environment where multiple instances run simultaneously.
You will come across duplicate records.
you can achive it by using Shedlock.
Hello Rahim , I don't understand how spring event will help you here could you please add some insights on it also regarding your second concern you can still guarantee to run scheduler once by implement Shedlock that's not a big challenges. Don't worry I will try to cover this shedlock soon
Yes, Last month I worked on a similar issue where I implemented Shedlock
We should not allow both the instances to run simultaneously so Shedlock lock the database. I mean instance of locking the database It will lock the one single table
@@JavatechieThough you implemented Shedlock lock but you may get duplicate records in one scenario.
Suppose there are 2 instances running. The first instance gets a chance to execute the job and suppose your server is busy due to heavy load and it doesn't respond within 1 min or may chances that database is busy and not respond within 1 min.
After 1 min 2nd instance gets chance to execute the job. Suppose this time database is idle state so both the instances may have same records.
Please handle this scenario in ur next video
If the consumer will face the same issue after message publish the db is down then we may publish the duplicate data. Can we use first the save then the kafka publish?
Small doubt.... The poller service seems to have reintroducing the dual wrtite problem since its publishing to kafka and updating the state in Outbox table. What if the the update fails and the status is 0. Won't the same event be polled twice and published multiple times to kafka?
Yes you are write but we need to handle it in efficient way in poller by applying some retry mechanism to avoid data in consistency and regarding dual write issue in poller is acceptable as it's just act as a helper to process the data but thank you for bringing this point . Will update best practice shortly
@Javatechie thanks for responding . Ok so you mean we introduce Retry and dead-letter queues to ensure consistency. But even then can't we have this as part of the order ms why introduce an extra layer between the producer and consumer (which will also result in additional latency). Currently it seems like all this pattern is doing is deferring the dual write problem to another downstream ms. Maybe I'm getting a bit ahead of myself and should wait for further videos from you to clarify this bit.
Hello, I'd like to request you to explain the deep Java memory model. Thanks
Noted ✅️
Can we implement this solution to handle failure in inventory or payment service?
hahaha awesome example I love peter :)
your intellij looks different the icons for repo, service etc, what's the reason?
I have added one plug-in for this . Will check and update you
Since we have used @Transactional, if something goes wrong while saving db, how can message get published. Message will not at all published since we used @Transactional, please correct me if I am wrong @Java Techie.
Message Wil publish. This @Transaction will work for DB not for messaging channel buddy . You can give a try
Super boss,
@Javatechie, could you please help to make video series explaining integrating FIX API/FIX protocol with Java spring boot application please.
I haven't tried this buddy sure will check and update
@@Javatechie Thank u so much brother
Your screenshot and thumbnail misspelled the word transactional is missing the letter 'a' at the end. Was this intentional?
If we are unable to write to outbox table due to some issue, how to handle the order.
Many thanks. Grateful. Very detailed. Just one doubt. Let's say after publishing to Kafka there is error while updating processed status by poller service. In that case same order will duplicated at consumer side. How we can prevent this
@@anupamkumartejaswi9210 there is a way to handle such situations. Add one more column in a table let's say isProcessed which indicates that the transaction process is successfully or not.
In case, After publishing message in Kafka if any error occurred during updating the status so request will go inside catch block.
In catch block update the isProcessed column status as "failed" something. Which basically states that message is sent but status in not updated in the database.
In next time when job starts its execution so update that record status to "completed".
In the success scenario you have to update 2 columns status
isProcessed and status column too
@@rahimkhan-fh9dd problem still remains the same, let's say DB itself is down, in that case we cannot make any update right even from catch block. One way that I could think of is retrying after certain delay to update the DB.
@@anupamkumartejaswi9210 if database itself is down so how application will fetch unprocessed data from database. No fetch data so no send message.
Hi, I have a question . Inside order puller project , we are fetching data from outbox table and then, again updating processed field into that table.. but, I would like to ask if avain table updation fail. Then, kafka producer will still send the data to the topic. Is not it a case if inconsistency, since it's a dual write.
Also, if we mark publish method inside kafkapublish as @transactional but, still I think will be loaded into kafka topic.
If the transaction rollback is there, while updating the data into the outbox table. Then , there will be a repetitive publish into Kafka topic.
Again, for such a case, maybe the consumer needs to add some logic that it should add further operation or logic to process each order info for once only. may be by using order_id field.
I dint see you writing configurations of Kafka bootstrap server in poller service. So wondering how it got published into the queue.
Can you explain how it worked without configuration?
That's the magic of auto configuration feature of spring boot , so if you won't configure explicitly it will load default value like bootstrap localhost:9092 and since I am playing with string no serialize and drserialize configuration required here
@@Javatechie great and one small question, can we use Mapstruct instead of using mapper methods where you are converting dto to entity?
@@ravitejpotti yes absolutely correct we should use
In this pattern if I have to take care of 100 different entity then need 100 more outbox table
All your entity shouldn't be play with transaction right if in case yes then still you can play with single outbox table by just customizing schema by setting Entiry type and their payload in string
@@Javatechie I have same applications in my company we have more than 1000 tables data and single outbox table
@@TravellWithAkhil yes that's great, same I mentioned in above 😀
So what happened if broker is down.
I guess in that case, the table and outbox table will be written but puller service's message sending will fail.
And once the broker is up and running, puller service will pull the data from outbox table and send them successfully.
So the only problem is, processing being postponed for a while (because of the broker is down), and there will be no data issue.
This will work only if there is one server, in multi server you will get issues
What issue you will get ? Could you please add some inputs
@@Javatechie First i would like to thank you for your work,Love your content, I am your subscriber since very long :)
The challenge i was talking is :
Lets suppose this same code is running in different pods , the order poller may fetch same records in two different pods, so some records will process twice.
Yes you are correct but we can use shedlock to avoid duplicate run of our scheduler in different instance
@@Javatechie Will go through the shedlock, Thank you Basanth for making RUclips content.
"Bhaiya, can you please make a video on time-based API authentication using Keycloak? We want to restrict the functionality of the entire application so that it's only available to users from 10 AM to 5 PM."
Again send and save is happening together
but here we are sharing one DB to multiple microservices which is not the way for a microservice arch
❤❤❤
I acted like john and got result like peter
Even with Transactional outbox pattern applied, still Peter will circle wrong answer because he is solving MCQs of Questions set B, while John has set A.😂😜 #Pun_Intended
😝😝😝 context 🤣🤣
I have not gone through the entire implementation, but i see the first explanation. Instead of doing this, why can't we have an if condition checking that the data is saved to db? Then go and publish the message
If(!order.save(entity)){
RETURN EXCEPTION
}
Publish message to kafka
This way, it is synchronus, and once data is processed, then the event will occur
Note: I'm not an expert in coding
I guess the problem is, you will still put everything within a method under @Transactional. So, suppose your database operation is a success, and your broker works fine (message is sent). But somehow, there is exception thrown because of a logic issue. In that case, Spring will still rollback the whole database operations while not the operations you did on Kafka, and a data issue will be happen in this scenario.
This is just my understanding, it could be wrong and please correct me if so.
@@haolinzhang53 You are right 👍
@haolinzhang53 Yes, that's what i thought. If something happens after data persistence, i mean at the stage of publishing message, so under transactional, all the data operations will be rolled back and we don't need to rollback any kafka operation because our exception happened at kafka level so anyway our message won't get published.
What if Kafka published first then your db operation failed ?
@@Javatechie in that case we have to manually rollback which is headache. But i was just thinking in a way where db is first then kafka.
🎉
Some must be thinking. This guy should not come to Ameerpet.
🤣
encouraging exam malpractice ? :)
This pattern helps ensure data consistency between microservices, not about promoting any unethical behavior. 😊
🎉🎉😂😂