Most businesses run on a suite of diverse applications. Some have been written in-house, some have been written by outside suppliers and some are bought as shrink-wrapped products or COTS (Commercial Off The Shelf). They run on different platforms using different technologies. They range in scale from Excel spreadsheets, created and used by non-technical business users, to large-scale enterprise applications based on server farms and created by teams of IT professionals.
For example, one of my recent clients used Sage for their accounts, Salesforce.com for the sales team, and RedDot for content management. They also had a suite of applications written in-house. The older ones were VB/SQL client-server systems, with the more recent web-based ones being written with .NET. There were also many business processes based on spreadsheets maintained by individual users. Some processes’ data even lived on whiteboards and paper.
The problems that arise from such a disparate suite of applications are well known; you can find them in most enterprises.
- Users often have to re-key data, copying it from the screen of one application to input fields in another.
- The organisation has more than one source of the same information. For example, a customer’s address might be different on different systems.
- Because there is no single source of information, important details are often missed. A customer might be given preferred status on one system, while they are marked as having a legal dispute with the company on another.
- It is very hard to gather management information. We might not be able to say how many customers with orders above a certain amount always pay late.
- Business processes are manual and ad hoc. They can be lost when a key member of staff moves on. They are often carried out differently at different times by different people, confusing customers, creating inconsistent data and introducing ‘bugs’ into the system.
So what do we do to solve these problems? We try to make the disparate applications talk to each other.
Integration Pathologies
But making the applications talk to each other introduces a whole new set of problems. Most naïve approaches fail primarily for two reasons:
The first is not correctly decoupling the applications. The communication implementation involves the applications knowing far too much about each other’s innards. The most pernicious (and common) form of this is when one application directly accesses another’s database. This reaches an apogee of awfulness when stored procedures execute cross database joins. Multiple applications accessing shared business components can also be a sign of this pathology.
This style of communication quickly becomes a tightly coupled mess. It can easily block any change to the individual applications, as developers recoil from the complications of refactoring all the known (and often unknown) disparate pieces that rely on schemas, stored procedures and other internals. “We can’t touch system-x because who knows what might break.”
In the worst cases, such as when the integration targets the tables of shared databases, business rules can be replicated many times over. This makes it extremely difficult to change them and adds to the forces fossilising the organisation’s software.
The second common pathology is direct application-to-application communication. This is a natural consequence of doing application integration ad-hoc. If we think of an application as a node in a network, it’s easy to see that each additional node requires a new set of connections equal to the number of existing nodes. The task of integration will get successively more complex as we add new systems that require integration.
Each connection requires its own mapping and access to the joined application’s internals. This leads to duplication of effort and business rules.
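To make the growth concrete, here is a minimal sketch of the arithmetic. It assumes the worst case, where every application ends up integrated with every other one; the function names are my own, chosen for illustration.

```python
def new_links_for_next_app(existing_apps: int) -> int:
    """Connections added when one more application joins a fully connected set."""
    return existing_apps


def total_point_to_point_links(apps: int) -> int:
    """Total connections when every application talks directly to every other."""
    return apps * (apps - 1) // 2


for n in (3, 5, 10, 20):
    print(f"{n:>2} applications: {total_point_to_point_links(n):>3} direct connections")
```

Under that assumption, ten applications need 45 connections and twenty need 190, each with its own mapping to maintain.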
Before long we find that our integrated applications, rather than creating the wonderful joined up business we envisioned, have made things even worse. The integration effort itself takes up progressively more resources, the tight coupling between applications makes it very hard to change anything and the duplicated business rules and diverse mappings mean that we face an avalanche of poor, inconsistent data.
How can we avoid this?
The industry has built up considerable experience in enterprise integration patterns and SOA. My two favourite sources of wisdom are the book Enterprise Integration Patterns by Hohpe and Woolf and the accumulated advice found in Bill Poole’s excellent blog.
The trick to doing integration successfully is to connect the applications in such a way that they remain decoupled. This is the core secret to doing Service Oriented Architecture well. We decouple applications by hiding them behind well defined service interfaces. We never allow them to interact directly with each other’s internals.
By hiding each application behind a well known interface, we can allow it to change internally without having those changes propagate throughout the organisation. Each application can remain agile and responsive to business needs.
We control the proliferation of connections by making each application talk through a common Enterprise Service Bus (ESB). We then have only one interface to worry about for each application. We make each application talk a single canonical language that is shared throughout the organisation. Now we only have to manage one mapping per application, between the application’s internal representation and the canonical message schema.
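As a rough illustration of 'one mapping per application', the sketch below maps an accounts row to a canonical customer and a canonical customer to a CRM record. The schema, field names and both applications' internal formats are hypothetical, invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalCustomer:
    """Hypothetical canonical schema shared by every application on the bus."""
    customer_id: str
    name: str
    postcode: str


def from_accounts(row: dict) -> CanonicalCustomer:
    """The accounts system's only mapping: internal row -> canonical form."""
    return CanonicalCustomer(
        customer_id=str(row["AcctNo"]),
        name=row["CustName"].strip(),
        postcode=row["PostCode"].upper(),
    )


def to_crm(customer: CanonicalCustomer) -> dict:
    """The CRM's only mapping: canonical form -> internal record."""
    return {
        "id": customer.customer_id,
        "display_name": customer.name,
        "zip": customer.postcode,
    }


accounts_row = {"AcctNo": 1042, "CustName": " Acme Ltd ", "PostCode": "bn1 1aa"}
crm_record = to_crm(from_accounts(accounts_row))
print(crm_record)
```

Neither mapping knows anything about the other application. If the accounts schema changes, only `from_accounts` has to change.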
The canonical language should consist of messages that are relevant to the business process. We should avoid service interfaces that exhibit CRUD style APIs and instead build an event driven ESB that exchanges coarse grained messages styled as ‘business events’. Changes that are relevant to the organisation are published by applications where the change is sourced, and subscribed to by applications that need to know about the changes.
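The publish and subscribe relationship can be sketched with a toy in-memory bus. This is not how a real ESB product is implemented, and delivery here is a plain synchronous function call. The class, the event name and the message fields are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryBus:
    """Toy stand-in for an ESB: routes each business event to its subscribers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, message: dict) -> None:
        # The publisher never learns who, if anyone, receives the event.
        for handler in self._subscribers[event_type]:
            handler(message)


crm_seen: list[dict] = []
shipping_seen: list[dict] = []

bus = InMemoryBus()
bus.subscribe("CustomerAddressChanged", crm_seen.append)
bus.subscribe("CustomerAddressChanged", shipping_seen.append)

# The application where the change is sourced publishes one coarse-grained event.
bus.publish("CustomerAddressChanged", {"customer_id": "1042", "new_postcode": "BN1 1AA"})
print(len(crm_seen), len(shipping_seen))
```

Note that the event is named after something that happened in the business, not after a table update. Adding a third interested application means one more `subscribe` call and no change to the publisher.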
Messages should be asynchronous and atomic. We need to avoid the situation where one application needs to synchronously call another application to source some data in order to complete an operation. A message should carry all the information needed to complete a business event and messages should not be enrolled in transactions.
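A small sketch of what 'asynchronous and atomic' might look like in practice, using a standard library queue and a worker thread in place of real messaging infrastructure. The `OrderPlaced` event and its fields are hypothetical. The point is that the message carries the delivery address itself, so the subscriber never has to call back to the order system to look it up.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical business event carrying everything a subscriber needs."""
    order_id: str
    customer_name: str
    delivery_address: str
    total_pence: int


outbox = queue.Queue()
dispatched = []


def shipping_worker() -> None:
    """Consume events using only the data inside each message."""
    while True:
        event = outbox.get()
        if event is None:  # sentinel value: stop the worker
            break
        dispatched.append(f"{event.order_id} -> {event.delivery_address}")


worker = threading.Thread(target=shipping_worker)
worker.start()

# put() returns at once; the publisher does not wait for shipping to finish.
outbox.put(OrderPlaced("SO-1", "Acme Ltd", "1 High St, Brighton", 12_500))
outbox.put(None)
worker.join()
print(dispatched)
```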
We should not concern ourselves with duplicated data between applications. Rather than having a single list of countries, for example, held in a single service, we should be relaxed about duplicate lists held in each application. See Bill Poole’s posts on Centralised vs. Decentralised Data.
Integration is hard
Even with all these principles in place, integrating business applications is still a complex task. It is not to be undertaken lightly. We need a relentless business driven focus on integration design and a clear and well communicated vision of how to achieve it. This is doubly hard when we try to disentangle an existing web of poorly designed tightly coupled interactions as described above. However, done well it can be the springboard to a far more flexible organisation.