This post is also available in Spanish.
Posts in this series:
Bounded Context Integration II - Technical Integration
Architecture
Tactical patterns introduction
Value objects
Entities
Domain services
Modules
Factories
Aggregates
Repositories
Domain Events
Event Sourcing
Sagas and Process Managers
CQRS
Policies
Bounded Contexts are autonomous
When integrating Bounded Contexts it is important to remember that they must be autonomous. Each one is developed independently and in isolation from the others: its codebase can evolve without fear of breaking functionality in another Bounded Context, and there are no source code dependencies between them. We create these contexts to reduce the complexity of our system and to deliver business value quickly, so it is essential to respect their autonomy.
The development team of a Bounded Context must not depend on other teams: it must be able to deliver value (features or changes) without needing other teams to make or approve any change.
With loose coupling there will be fewer bottlenecks and business value will be delivered faster.
Next, we will see different technical integration strategies between Bounded Contexts and analyze the problems and benefits that each brings.
Code-level integration
Having all the Bounded Contexts in the same code repository or in the same solution (for example, the same IntelliJ project or Visual Studio solution) can help developers see the big picture.
Sometimes this is not possible because the Bounded Contexts are written in different languages or target different operating systems.
In this scheme, a Bounded Context can simply import code from another context to integrate with it. In these cases, Bounded Contexts are usually deployed as modules of an application that integrates everything: a monolith.
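As a minimal sketch (all names here are hypothetical; in a real solution the two classes would live in separate modules), this is what such a code-level dependency can look like:
// Billing Bounded Context (hypothetical)
class InvoiceService {
    fun createInvoiceFor(orderId: String, amount: Long) = println("Invoicing $orderId for $amount")
}
// Sales Bounded Context: a direct source dependency on Billing's InvoiceService.
// Any change to that class can now break Sales at compile time.
class CheckoutService(private val invoices: InvoiceService) {
    fun placeOrder(orderId: String, amount: Long) = invoices.createInvoiceFor(orderId, amount)
}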
This approach, which at first seems very simple, carries the significant risk of generating many dependencies (and much coupling) between the Bounded Contexts. If a Bounded Context uses code from another project in the same solution, there is a good chance that it will break functionality in another context or contaminate the model we want to protect.
By having many Bounded Contexts together in the same solution, you begin to see opportunities to reuse code, configuration, and tooling. While at first this seems like a good idea, over time it leads to cross-team dependencies and bottlenecks, and we run a great risk of producing a Big Ball of Mud.
As the codebase grows, many developers will work on different features in parallel. Merging the code and producing a release will become more difficult and painful, deploys will be riskier, and more QA will be needed. This is a huge waste for the company.
As we said before, in our profession there are no hard rules, only heuristics. Each approach has its pros and cons, and it is up to us to choose the best trade-offs for each context. In another series of posts we will discuss Modular Monoliths, where integration is done through code while keeping coupling low. This allows us to evolve later to other integration mechanisms when the growth of the business requires it.
Database-level integration
When Bounded Contexts are integrated by code in the same application, a popular approach is to share the same database.
In this strategy, the Bounded Contexts have their tables in the same DB and share them, reading and writing what each one needs to be able to interact.
This type of integration is very problematic. The DB is a dependency between teams and contexts that slows development processes and limits autonomy.
For example, the team developing the sales Bounded Context wants to change the users table, but no one knows whether that will affect the code of the delivery or billing contexts. Making these kinds of changes requires coordinating with multiple teams and distracting them from their priorities, which affects the teams' delivery rate.
Sometimes, when two models are integrated by sharing the same database, there are similar but different concepts. Tables start out simple, but over time new columns appear that are useful only to a certain Bounded Context and not to the others. In some cases, even the same column is interpreted differently or means something different depending on the context.
Since the nature of each Bounded Context is different, what started out simple becomes more complex over time, and the problems grow exponentially because each context pushes in a different direction.
It is necessary to establish physical barriers to protect the models and the autonomy of the teams.
Using physical barriers to integrate Bounded Contexts
To ensure the integrity of the models and the autonomy of the teams, the most widely used approach is an architecture where nothing is shared. Each Bounded Context has its own codebase, its own database, and its own development team.
When this happens we are in the presence of a distributed system.
In this approach each Bounded Context is physically isolated: one context cannot directly call another's methods or access its data. In this architecture there is friction against coupling; it cannot be introduced accidentally, it requires a deliberate effort.
Communication between Bounded Contexts happens through an explicit and well-defined contract.
Next we will see different integration strategies for Bounded Contexts in distributed systems.
Integration via database in a distributed system
A distributed variant of the database integration we saw previously consists of using a DB as a means of exchanging information (each Bounded Context keeps its own DB).
In this strategy, a Bounded Context usually allows another one to access an exchange table in its DB. For example, a sales Bounded Context inserts a new row into an orders table every time a sale is made. The billing Bounded Context periodically checks this table for new records that require billing. The id of the last invoiced order can be saved in billing's own DB or, in some cases, a column is added to the exchange table so that the consuming Bounded Context can mark each element as processed. DB access is usually restricted to the shared tables to limit coupling.
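As an illustration, here is a minimal sketch of the billing side of this pattern using plain JDBC; the connection string, table and column names are all hypothetical:
import java.sql.DriverManager

// Called periodically by the billing Bounded Context to pick up new sales.
// Returns the id of the last invoiced order, to be stored in billing's own DB.
fun pollNewOrders(lastProcessedId: Long): Long {
    var newLastId = lastProcessedId
    DriverManager.getConnection("jdbc:postgresql://sales-db/sales", "billing", "secret").use { conn ->
        val stmt = conn.prepareStatement("SELECT id, customer_id, total FROM orders_exchange WHERE id > ? ORDER BY id")
        stmt.setLong(1, lastProcessedId)
        val rows = stmt.executeQuery()
        while (rows.next()) {
            val orderId = rows.getLong("id")
            println("Invoicing order $orderId for customer ${rows.getLong("customer_id")}")
            newLastId = orderId
        }
    }
    return newLastId
}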
In this strategy the contexts have little coupling. Of course, the orders table acts as a contract, so it will be difficult for the sales team to modify it and they will have to coordinate with the other teams.
A problem with this strategy is that locks can occur in the DB: both systems compete for the same resources, so if many sales are generated, billing capacity is affected. In addition, this DB becomes a single point of failure (if the sales DB fails, invoices cannot be issued).
Integration via files in a distributed system
In this type of integration (which is very common in traditional banking systems) one context writes files to a server and another context reads them.
This strategy is more flexible than DB integration and has no locking problems. However, there are other problems, such as the dependency on the file format: a standard must be maintained for the format and structure of the files, and that standard operates as a contract; changing it requires coordination and development by both teams.
Database engines solve many problems that in this case we must handle manually, which generates additional development effort: error handling, scalability, concurrency, robustness, etc.
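For illustration, here is a minimal sketch of such a file exchange, assuming a hypothetical directory and a semicolon-separated line format agreed between the two teams:
import java.nio.file.Files
import java.nio.file.Paths

// The sales context exports new orders; the line format "orderId;customerId;total"
// is the contract between the two teams.
fun exportNewOrders(orders: List<Triple<Long, Long, String>>) {
    val lines = orders.map { (id, customer, total) -> "$id;$customer;$total" }
    Files.write(Paths.get("/exchange/sales/new-orders.csv"), lines)
}

// The billing context later reads and parses the same file.
fun importOrders(): List<Triple<Long, Long, String>> =
    Files.readAllLines(Paths.get("/exchange/sales/new-orders.csv")).map { line ->
        val (id, customer, total) = line.split(";")
        Triple(id.toLong(), customer.toLong(), total)
    }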
Integration via RPC in a distributed system
The idea behind RPC (Remote Procedure Call) is to keep the monolithic design but with the benefits of a distributed system.
One context calls a method of a class to invoke some functionality of another context. Internally, this method makes the call over the network (for example, through an HTTP request) instead of invoking the source code directly.
This integration is simpler than file integration because there are many libraries and frameworks that solve common RPC problems.
You can start with a monolithic system and, when it is time to scale, replace direct calls with RPC calls over the network. The logic you want to distribute moves from the monolith to another subsystem, and in this way the load is spread across two servers.
The code with RPC is very similar to what is used in a monolithic system:
// Each call looks like a local method invocation, but may go over the network behind the scenes.
val order = salesBoundedContext.createOrder(orderRequest)
val paymentStatus = billingBoundedContext.processPaymentFor(order)
if (paymentStatus.isSuccessful) {
    shippingBoundedContext.arrangeShippingFor(order)
}
Each method can be processed in memory or make an HTTP call to another service. The goal of RPC is to make network communication transparent.
As we can see in the previous example, all the processing of the HTTP place-order request (buying a product, for example) is done synchronously, using remote RPC calls to the different Bounded Contexts (which in turn use third-party services). The thread serving the HTTP request is blocked until the entire flow is processed and the response is returned.
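To make this concrete, here is a minimal sketch (hypothetical URL and payload) of how a client such as salesBoundedContext from the example above could hide a blocking HTTP call behind createOrder:
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

class SalesBoundedContextClient(private val http: HttpClient = HttpClient.newHttpClient()) {
    fun createOrder(orderRequestJson: String): String {
        val request = HttpRequest.newBuilder(URI.create("http://sales-service/orders"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(orderRequestJson))
            .build()
        // The calling thread blocks here until the remote service responds or the call fails.
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body()
    }
}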
Many novices choose RPC because of how simple it seems and for the code-reuse possibilities it brings; however, RPC has several problems:
- It is very difficult to make the system resilient to failures: since network communication is transparent, you forget about it. Network errors happen more often than one might think, which makes systems that use RPC less reliable. In the order-processing example above, if any of the remote calls fails due to a network problem, the entire flow fails.
- It is difficult and expensive to scale the system: if, for example, we have a web application that is slow due to the large number of users, we could think of improving the web server by adding more hardware (CPU and memory). However, many user requests will use the sales and billing contexts, which live on other servers. Since each call is synchronous, the time a user's request takes is the processing time on the web server plus the processing time on the other servers plus the network latency. To have a real impact on response time, we have to improve all the servers, not just one.
- If any service is down (for example, billing or delivery) no sale can be processed, which is not a good business strategy. The services should be independent.
Before tackling our next integration strategy, we will first review some concepts.
Distributed transactions
Transactions are used to preserve data consistency. They are generally used at the database level.
When we are dealing with a distributed system, simple database transactions cannot guarantee the atomicity and consistency of operations that span more than one service. In these cases we need to use distributed transactions.
For example, if we buy a tourist package, we must guarantee that a hotel can be reserved in the hotel reservation system, a flight in the air ticketing system, and a car in the vehicle reservation system. All 3 reservations must be made or the package must be cancelled.
Distributed transactions, and also RPC, are discouraged due to the problems they introduce.
For example, in the tourist-package reservation case we saw previously, when a package is sold a lock must be taken on the hotel room in the database. That lock is held while the flight is reserved. Keeping the lock for so long (several seconds or minutes) means that connections to the database must also be kept open for a long time, and these connections start to accumulate as users purchase packages.
At some point the databases start to reject new connections and the locks start to fail. This brings serious scalability problems because it puts an upper limit on the number of concurrent users that our system supports.
In addition to the technical problems, this also brings a serious problem at the business level: we are preventing users from buying packages, and losing revenue, because of technical details.
Another issue is partial availability. What happens if the hotel system works but the air ticketing system is down? With distributed transactions you have to abort the entire operation and return the room to the hotel. At the business level, it is not necessary for both reservations to be made at the same time; you could retry reserving the flight later.
Eventual Consistency
Bounded Contexts do not have to be immediately consistent with each other. A user may have updated her address in one context but other contexts might still have her old address.
The sales context may have a purchase but in another context the purchase has not yet been completed because the flight was not reserved.
Distributed transactions can be avoided by introducing eventual consistency. We allow the hotel and the flight to be booked later, when the system is online again.
The system as a whole is in an inconsistent state, but at some point consistency arrives (hence the name eventual). In general, modern distributed systems are never in a fully consistent state; there is always some part that still has old information.
In real life consistency is usually eventual. We move to another address but our passport and other systems still have our old one.
Having reviewed these concepts, we are now ready to introduce the last integration strategy.
Integration via messaging in a distributed system
The most commonly used strategy in modern systems that solves all the problems seen above is to integrate Bounded Contexts through event-based messaging solutions.
This approach is based on the principles of reactive programming, replacing RPC with asynchronous messages.
Let’s review again the RPC example and see how it would be modified if we use messaging instead.
With messaging, the flow would look like this (a minimal sketch of the first step follows the list):
- An HTTP request for a new order arrives. This request synchronously calls the Sales Bounded Context. This context stores the new order in its database and then immediately sends the HTTP response informing the user that the order was successfully received (but not yet processed).
- Asynchronously, the sales context sends a message to the billing context to invoice the new order (it can be a command or an order created event).
- The billing context communicates with the payment provider synchronously (but without blocking sales or the user’s original request) and makes the payment.
- After the payment is made, the billing context publishes a payment-made event for the shipping context. Shipping uses the third-party service to coordinate the delivery (this communication is usually synchronous) and then publishes an asynchronous event that the sales context receives.
- Finally, the sales context considers the order processed and sends an email letting the user know that their order was processed successfully.
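Here is a minimal sketch of the first step (hypothetical types, with the message broker hidden behind a simple publisher interface):
data class OrderCreated(val orderId: String, val amount: Long)

interface EventPublisher {
    fun publish(event: OrderCreated)
}

class SalesService(private val events: EventPublisher) {
    private val orders = mutableListOf<OrderCreated>()

    // Handles the synchronous HTTP request: store the order, publish the event, answer immediately.
    fun placeOrder(orderId: String, amount: Long): String {
        val event = OrderCreated(orderId, amount)
        orders.add(event)      // persisted in the sales context's own database in a real system
        events.publish(event)  // billing picks this up asynchronously from its queue
        return "Order $orderId received"  // the user does not wait for billing or shipping
    }
}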
To solve resilience problems (for example, a service being down), message queues are used. Requests or events are enqueued as messages, and each service pulls new messages and processes them. If a service goes down, it resumes pulling messages when it comes back online. The system as a whole keeps working even if certain services are unavailable.
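A minimal sketch of the consuming side, with an in-memory BlockingQueue standing in for a real broker (names are hypothetical):
import java.util.concurrent.BlockingQueue

fun billingConsumerLoop(queue: BlockingQueue<String>) {
    while (true) {
        val message = queue.take()  // blocks until a message is available
        try {
            println("Invoicing order from message: $message")
        } catch (e: Exception) {
            queue.put(message)      // naive retry: put the message back so it is not lost
        }
    }
}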
The performance problems are solved by introducing asynchrony: services do not have to wait for their dependent services to finish processing requests.
Services can also be scaled independently based on the usage needs of each one. You can upgrade your web server (or add more web servers) without changing your billing server. The billing server should only be modified according to the frequency of messages that enter its queue. The scalability needs are different for each context (the web server scales based on the number of users that are browsing the site, the billing server based on the number of invoices that are issued, etc.). This allows for a more cost-effective strategy.
When bottlenecks appear in the system we have more options to solve them. We may upgrade the hardware or add more instances of any service independently.
There are 2 types of scalability:
- Vertical (or scaling up): It is achieved by improving the hardware (CPU, RAM, disk, etc) of a server.
- Horizontal (or scaling out): It is achieved by distributing the load on more servers, adding more instances.
Messaging comes with its own problems:
- It is more difficult to debug and track errors across different contexts and servers.
- It has more code indirections, making it difficult to understand the flow of each business process.
- Eventual consistency requires additional effort to achieve; it can generate errors if mishandled, and sometimes users (and the business) expect operations to be atomic rather than eventually consistent. You usually need to adapt the UX accordingly.
- The complexity of the system increases by requiring more infrastructure components to deliver and retry messages.
Conclusions
As we have seen, there are different ways to integrate Bounded Contexts, each with its own benefits and drawbacks. The context of each system and the kind of business must be taken into account to choose the best trade-offs. Applying a distributed architecture that allows great scalability and resilience in a startup that is just starting out and validating its business idea introduces unneeded complexity, slows development, and limits flexibility when having to pivot (something very common in startups). In these cases it is better to start with a well-modularized monolith, applying good techniques and programming principles that allow the solution to evolve towards a distributed system when business growth requires it.
Next post
In the next post we will discuss the internal architecture of Bounded Contexts introducing some concepts that will allow us to control coupling and modularization. This will provide us with the flexibility to easily evolve our design and have more options open when making technological and integration decisions.