So… why do we need the transactional outbox pattern?
The transactional outbox pattern is used for reliable messaging and guaranteed delivery of events.
Problem
Microservices often publish events after a database transaction has completed.
Writing to the database and publishing an event are two separate operations, and they should be atomic.
Failure to publish an event can mean a fatal failure of the business process.
This approach works fine until an error occurs between saving the entity object and publishing the corresponding event. Publishing an event at this point may fail for several reasons:
- Network errors
- Message broker outage
- Host failure
Whatever the error, the result is that the EntityCreated event cannot be propagated to the message bus. (Fig. 2.)
Other services will not be notified that an entity has been created.
The Entity service must now take care of many things that are not related to the actual business process.
It needs to keep track of the events that still need to be published to the message broker once it is back online.
Solution
There is a well-known pattern called “Transactional Outbox” which can help avoid these situations.
This pattern provides an efficient solution for reliably propagating events. The idea is to have an Outbox table in the command service’s database.
When an entity creation request (command) is received, not only is an entry made in the entity table, but a record representing the event is also inserted into the Outbox table.
Both database actions are executed as part of the same transaction. (Fig. 3.)
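A minimal sketch of such a transaction in SQL, assuming a catalog entity and the outbox columns described below (table and column names are illustrative):

```sql
-- Both writes commit or roll back together, so a crash between them
-- can never leave an entity without its corresponding event record.
BEGIN;

INSERT INTO catalog (id, name)
VALUES (42, 'Spring catalog');

INSERT INTO outbox (id, entity_type, entity_id, topic, payload)
VALUES (gen_random_uuid(),  -- built in since Postgres 13 (earlier: pgcrypto)
        'catalog',
        '42',
        'CatalogCreated',
        '{"id": 42, "name": "Spring catalog"}'::jsonb);

COMMIT;
```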
An asynchronous background process monitors the outbox table for new entries and, if there are any, propagates the events to the event bus.
The transactional outbox pattern thus ensures reliable, at-least-once delivery of messages.
Outbox table
Records in the outbox table describe events that occurred in the command service.
For example, a JSON structure may represent the creation or update of an entity, data about the entity itself, as well as contextual information such as a use-case identifier.
By explicitly emitting events through records in the outbox table, you can ensure that events are structured correctly for external consumers.
This also ensures that event consumers won’t break when, for example, the internal domain model or entity table is changed.
Log-based Change Data Capture (CDC) is very convenient for capturing new entries in the outbox table and streaming them to Apache Kafka. Unlike polling-based approaches, event capture happens with very low overhead in near real time. Debezium comes with CDC connectors for many databases, such as MySQL, Postgres, and SQL Server.
The chart below (Fig. 4.) describes the general structure, centered around two small services: the administration service and the user service. Both microservices are written in Java.
Debezium tails the transaction log (the write-ahead log, WAL) of the admin service database in order to capture any new events in the outbox table and publish them to Apache Kafka. See the outbox event router configuration for more information about event routing.
id – a unique identifier for each message; consumers can use it to detect duplicate events, for example when re-reading messages after a failure or restart. Generated when a new event is created.
entity_type – the type of the aggregate root to which a given event is related; the idea is to lean on the same concept from domain-driven design.
entity_id – the identifier of the aggregate root affected by a given event; this could be, for example, the catalog ID.
topic – the type of the event, for example “CatalogCreated” or “CatalogDeleted”. Allows consumers to run the appropriate event handlers.
payload – a JSON structure with the actual event contents, e.g. containing catalog information.
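Putting these fields together, a possible Postgres definition of the outbox table might look like this (a sketch; the status, picked_at, and created_at columns support the polling job discussed below):

```sql
CREATE TABLE outbox (
    id          UUID         PRIMARY KEY,                -- unique message id, lets consumers deduplicate
    entity_type VARCHAR(255) NOT NULL,                   -- aggregate root type, e.g. 'catalog'
    entity_id   VARCHAR(255) NOT NULL,                   -- id of the affected aggregate root
    topic       VARCHAR(255) NOT NULL,                   -- event type, e.g. 'CatalogCreated'
    payload     JSONB        NOT NULL,                   -- actual event contents
    status      VARCHAR(16)  NOT NULL DEFAULT 'Pending', -- Pending | Processing | Processed
    picked_at   TIMESTAMPTZ,                             -- when the job last picked the message up
    created_at  TIMESTAMPTZ  NOT NULL DEFAULT now()      -- used to process messages in order
);
```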
When registering the Debezium connector, we must make sure that delete events are not propagated to the Kafka topics: by setting tombstones.on.delete to false, no deletion markers (“tombstones”) will be emitted by the connector when an event record is deleted from the outbox table.
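For illustration, registering the connector against Kafka Connect might look roughly like this. This is a sketch, not a complete configuration: the host and database values are assumptions, the routing fields map Debezium’s EventRouter onto the columns described above, and the exact set of required properties (e.g. topic.prefix) varies by Debezium version:

```json
{
  "name": "outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "admin_service",
    "topic.prefix": "admin",
    "table.include.list": "public.outbox",
    "tombstones.on.delete": "false",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.route.by.field": "entity_type",
    "transforms.outbox.table.field.event.key": "entity_id",
    "transforms.outbox.table.field.event.payload": "payload"
  }
}
```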
Processing of outbox messages can be done in a separate process, depending on the use case. Alternatively, instead of creating a new process, you can use the same process but a separate thread, depending on the requirements. (Fig. 5.)
The idea is to use a scheduler that runs periodically and processes messages from the Outbox table. The job can be configured to run every few seconds; the scheduling really depends on how many messages your system produces and needs to process.
Since we’re using Postgres for persistence of the Outbox table, we’ll use a Postgres CTE (Common Table Expression) to fetch 100 messages sorted by their creation time, and also to identify and fetch messages that have been stuck in processing for more than 2 minutes. Note that we also set the status of the outbox message to Processing in the same transaction.
Putting it all together –
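Here is a sketch of that pick-and-mark query, assuming the table definition above:

```sql
-- Select up to 100 messages that are either pending or stuck in
-- Processing for more than 2 minutes, and mark them as Processing
-- in the same statement.
WITH picked AS (
    SELECT id
    FROM outbox
    WHERE status = 'Pending'
       OR (status = 'Processing'
           AND picked_at < now() - INTERVAL '2 minutes')
    ORDER BY created_at
    LIMIT 100
    FOR UPDATE SKIP LOCKED  -- extra safeguard against concurrent pollers
)
UPDATE outbox o
SET status    = 'Processing',
    picked_at = now()
FROM picked p
WHERE o.id = p.id
RETURNING o.id, o.topic, o.payload;
```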
As I wrote earlier, if an error occurs between processing a message and marking it as processed in the outbox table, the job will pick this message up again 2 minutes after it was last picked, in one of the next job iterations. This way we guarantee at-least-once delivery.
Outbox message
status – the current processing state of the message:
- Pending – the outbox message is waiting to be processed.
- Processing – the outbox message is currently being processed. The purpose of introducing this state is to prevent concurrent job executions in different application threads or instances from picking up and processing the same message. This way we reduce duplicate message deliveries.
- Processed – the outbox message has been successfully processed and is ready to be deleted.
picked_at – the time when the message was picked up by the job for processing. This value is also used to identify messages that were picked up but whose status never reached Processed: messages that were picked more than 2 minutes ago and are still in the Processing state will be picked up again by the job.
The rest is self-explanatory.
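For completeness, once a message has been successfully published to the broker, the job marks it as processed, and a cleanup task can remove it later (again a sketch; :message_id is a bind parameter supplied by the job):

```sql
-- Mark a message as done after a successful publish.
UPDATE outbox
SET status = 'Processed'
WHERE id = :message_id;

-- A periodic cleanup can then remove processed messages.
DELETE FROM outbox
WHERE status = 'Processed';
```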
Complexity
Why is it so hard to add a simple entry to a list and return the new list as a response? Well, it all depends on the context.
Let’s start with the most common context, where the data fits on a single machine and the user base is of a reasonable size, say hundreds or thousands of users. In this context CQRS would be overly complex: we would prioritize consistency over availability, since users prefer to see consistent data.
Take, for example, Facebook or Twitter, which have to serve millions of users at the same time and must be optimized for scalability and availability. It doesn’t matter if you see 10 likes or comments instead of 11 for a few minutes, as long as you can still use the platform and keep sharing. The same goes for Google: nobody expects the very latest search results to show up instantly.
In this case, CQRS isn’t that hard a sell, because the effort of keeping multiple stores in sync is smaller than the overhead of building a single model that does it all. The added complexity of synchronizing the two models is offset by the reduction in scalability challenges.
Application changes
CQRS adds complexity to the system, making it more difficult to change. It only makes sense if you have severe performance issues that are better handled with two data models rather than one. This does not apply to the majority of applications.
Get some numbers first
CQRS may be right for you. But did you try a simpler approach first? Do you have numbers to support the claim that a typical relational database will not suffice? Have you created a proof of concept (POC) to demonstrate the feasibility of the solution?
Last but not least, when it comes to eventual consistency, it is the product owners who decide whether or not it fits the business requirements.
Complex queries
If you are using NoSQL, have you considered migrating to an RDBMS? Have you considered using materialized views to precompute your queries? Or maybe re-projecting the data?
CQRS and Event Sourcing are a common combination. It is important to understand the many implications of both patterns before beginning your journey. Otherwise, it is very easy to make a complete mess, both technically and functionally.
CQRS, Event Sourcing, and/or CDC can be an excellent solution to many problems, assuming you have a clear understanding of their limitations and downsides.
Don’t rush into decisions that conflict with your original business goals. Before implementing the CQRS pattern in your system, be sure to run a POC and other tests. Creating a proof of concept to test your solution will ensure that you get the best version of it and save time and money in the process. Don’t be afraid to get your hands dirty.
Thanks for reading!