We often talk about coupling, but what exactly is coupling?
Generally, there are three types of component coupling.
1. Afferent coupling: Component A's task depends on the implementations of B, C, and D.
2. Efferent coupling: After component A's task is completed, B, C, and D must be executed.
3. Temporal coupling: After component A's task is completed, B and C must be executed, and B must run before C.
The components mentioned here can be at the source-code level, the module level, or even the service level, depending on granularity.
In this article we will dive into temporal coupling in particular, because it is the most common and most overlooked pitfall. First, let's describe the general shape in Node.js.
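A minimal sketch of such a sequence (the names `stepA`, `stepB`, `stepC`, and `doTask` are placeholders, not from the original article):

```javascript
// Three placeholder steps executed strictly in sequence (temporal coupling):
// B cannot start before A completes, and C cannot start before B.
async function stepA(x) { return x + 1; }
async function stepB(x) { return x * 2; }
async function stepC(x) { return `result: ${x}`; }

async function doTask(input) {
  const a = await stepA(input); // A runs first
  const b = await stepB(a);     // then B, using A's result
  return stepC(b);              // finally C, using B's result
}
```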
At this point, you may notice this pattern is really generic. Almost all of our code looks like this; it is normal to do three things in sequence in a method, isn't it?
Let's take a more concrete example. Suppose we have an e-commerce system with a `purchase` function, and we begin to code it in a simple way.
First, sum the prices of all items in the cart, and then call the payment service to charge the credit card. Simple, right?
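A sketch of this first version, assuming a cart of items with a `price` field; the stubbed `payByCreditCard` stands in for the real payment-service call referenced later in the article:

```javascript
// Stub for the payment service; assume it simply succeeds here.
async function payByCreditCard(card, amount) {
  return { ok: true, amount };
}

// First version of purchase: sum the cart, then charge the card.
async function purchase(cart, card) {
  const total = cart.reduce((sum, item) => sum + item.price, 0);
  return payByCreditCard(card, total);
}
```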
Alright, the marketing team wants people who spend over 1,000 dollars to get a discount coupon, so we continue to modify our `purchase`. This kind of feature is quite common. Then the sales team, finding coupons a good promotion method, proposes that people who reach 5,000 dollars get a lottery chance. Thus `purchase` keeps growing.
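Sketching how `purchase` grows (the thresholds come from the article; the service implementations are stubs, with `giveCoupon` recording its side effect so the change is visible):

```javascript
const coupons = [];                         // coupons issued so far (stub state)
async function payByCreditCard(card, amount) { return { ok: true, amount }; }
async function giveCoupon(userId) { coupons.push(userId); }
async function lottery(userId) { /* heavy calculation in the real system */ }

// purchase now mixes payment with marketing and sales features.
async function purchase(user, cart, card) {
  const total = cart.reduce((sum, item) => sum + item.price, 0);
  const payment = await payByCreditCard(card, total);
  if (total > 1000) await giveCoupon(user.id); // coupon for spending over $1,000
  if (total >= 5000) await lottery(user.id);   // lottery chance at $5,000
  return payment;
}
```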
This is temporal coupling. Both `giveCoupon` and `lottery` actually depend on `purchase`, and both must be done within the life cycle of `purchase`. As the feature requirements grow larger and larger, the performance of the entire `purchase` is continuously dragged down. In particular, `lottery` usually requires heavy calculation, and `purchase` is forced to wait for `lottery` to succeed before it can be considered successful.
From the previous section, we learned that `purchase` should only need to process payments; the rest of the behavior is additional and should not share the same life cycle as `purchase`. In other words, even if `giveCoupon` fails, it should not affect `purchase`.
There is an approach in domain-driven design called domain events: when a task is completed, it issues an event, and any handler that cares about that event can take the corresponding action after receiving it. This approach is also known as the Observer pattern among the classic design patterns. In domain-driven design, the notification carries the domain's ubiquitous language, hence the name domain events.
Therefore, let's modify `purchase` a little bit in Node.js style.
With events, we can completely decouple `purchase`. Even if one of the handlers fails, it does not impact the original payment flow. `purchase` only needs to concentrate on the payment process; when the payment succeeds, it emits the event and lets other functions take over.
If more needs arise in the future, there is no need to change the original `purchase`; we just add a new handler. This is the concept of decoupling: here we have removed both the code-level and the timing-level coupling.
In my previous article, we mentioned that whenever failures can happen, we have to expect them and handle them gracefully. This is called resilience engineering.
When we decouple the coupon and the lottery through domain events, we immediately face a problem: what if the event is lost? The payment has finished, but the coupon has not been issued, which is definitely a big problem for the customer.
In other words, how do we ensure that an emitted event will actually be handled? This is exactly why message queues were introduced into the system.
We discussed message queues before; there are three levels of guarantee in message delivery:
- At most once
- At least once
- Exactly once
Most message queues have the at-least-once guarantee. That is to say, through the message queue we can make sure that all events can be executed at least once. This also ensures that messages are not lost.
Thus, to avoid event loss, we change `emitter.emit` into a submission to a message queue such as RabbitMQ or Kafka. At this stage, we have introduced decoupling at the system level, i.e., event producers and consumers belong to different execution units.
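A sketch of that change. The channel interface is modeled on amqplib's `sendToQueue`, but it is injected here so the sketch runs without a broker; in production the channel would come from `amqp.connect()` / `createChannel()`:

```javascript
// Producer side: instead of emitter.emit, serialize the domain event and
// hand it to a message-queue channel (interface modeled on amqplib).
function publishDomainEvent(channel, queue, event) {
  channel.sendToQueue(queue, Buffer.from(JSON.stringify(event)),
                      { persistent: true }); // survive broker restarts
}

// Consumer side: runs in a separate execution unit and acknowledges only
// after handling succeeds, which is what at-least-once delivery relies on.
function makeConsumer(handle) {
  return (msg, ack) => {
    const event = JSON.parse(msg.toString());
    handle(event);
    ack(); // unacked messages are redelivered, so events are not lost
  };
}
```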
The story isn't over yet. We can now ensure that emitted events are handled, but what if an event isn't sent at all? Continuing with `purchase` as the example: `payByCreditCard` has succeeded, but the system crashes for some unexpected reason before the event is sent. Then, even with a message queue, we still get an incorrect result.
To avoid this problem, we can leverage event sourcing. In Distributed Transaction and CQRS, I described the core concept of event sourcing.
Before the event is emitted, store it in persistent storage first. After the handler finishes processing the event, mark the event in storage as "processed".
One thing to be aware of: the write of the event and the payment must happen in the same transaction. That way, as long as the payment succeeds, the event is also written successfully. Finally, we can periodically monitor for overdue events to find out when something has gone wrong.
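An in-memory sketch of this idea. The object `db` stands in for a real database, and the "transaction" and the dispatch loop are simulated in-process; in production the two writes share one database transaction and the handler runs in a separate consumer:

```javascript
// In-memory stand-in for a database.
const db = { payments: [], events: [] };

// Payment and event are written together: if one fails, both fail
// (simulating a single DB transaction).
function purchaseTx(userId, total) {
  db.payments.push({ userId, total });
  db.events.push({ name: 'purchaseCompleted', userId, total,
                   processed: false, createdAt: Date.now() });
}

// Deliver stored events to a handler; mark each as processed only after
// the handler finishes, so a crash before that point leaves it pending.
function dispatchEvents(handle) {
  for (const event of db.events) {
    if (!event.processed) {
      handle(event);
      event.processed = true;
    }
  }
}

// Periodic monitoring: events still unprocessed past a deadline signal
// that something went wrong.
function overdueEvents(maxAgeMs, now = Date.now()) {
  return db.events.filter((e) => !e.processed && now - e.createdAt > maxAgeMs);
}
```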
This time, as in Shift from Monolith to CQRS, we went through a step-by-step evolution of the system to show how to decouple when systems become large and complex. At the beginning, we decoupled source code and execution timing through domain events; then we introduced message queues, separating event producers from consumers, to achieve system-level decoupling.
As I said before, a system evolves to solve a problem, but it also creates new problems. We can only choose the most acceptable solution and seek compromises in complexity, performance, productivity and other factors.
Splitting a complete action across different execution units inevitably introduces inconsistency. There are many considerations when resolving inconsistencies, such as:
- Ignore whether events may be lost and use the simplest architecture, `EventEmitter`. This is the simplest approach and may cause no problems in 80% of cases, but what do we do when a problem does occur?
- Try to be as reliable as possible by introducing message queues, which should leave us 99% sure there will be no problems. But there is still that 1%; is such a risk bearable?
- Implement event sourcing, at the cost of increased complexity and possibly degraded performance. Is that acceptable?
Just like I always say, there is no perfect solution to system design. Each organization has a different level of risk tolerance. In various indicators, we look for the most acceptable solution for ourselves and think about the risks and failures we face at any time. As a result, everyone should be able to build a resilient system.