Friday, February 01, 2013

How to Write Scalable Services


I’ve spent the last five years implementing and thinking about service oriented architectures. One of the core benefits of a service oriented approach is the promise of greatly enhanced scalability and redundancy. But to realise these benefits we have to write our services to be ‘scalable’. What does this mean?

There are two fundamental ways we can scale software: 'Vertically' or 'horizontally'.

  • Vertical Scaling addresses the scalability of a single instance of the service. A simple way to scale most software is simply to run it on a more powerful machine; one with a faster processor or more memory. We can also look for performance improvements in the way we write the code itself. An excellent example of a company using this approach is LMAX. However, there are many drawbacks to the vertical scaling approach. Firstly, the costs are rarely linear; ever more powerful hardware tends to be exponentially more expensive, and the costs (and constraints) of building sophisticated performance-optimised software are also considerable. Indeed, premature performance optimisation often leads to overly complex software that's hard to reason about and therefore more prone to defects and high maintenance costs. Most importantly, vertical scaling does not address redundancy; vertically scaling an application just turns a small single point of failure into a large single point of failure.

  • Horizontal Scaling. Here we run multiple instances of the application rather than focussing on the performance of a single instance. This has the advantage of being linearly scalable; rather than buying a bigger, more expensive box, we just buy more copies of the same cheap box. With the right architectural design, this approach can scale massively. Indeed it's the approach taken by almost all of the largest internet-scale companies: Facebook, Google, Twitter etc. Horizontal scaling also introduces redundancy; the loss of a single node need not impact the system as a whole. For these reasons, horizontal scaling is the preferred approach to building scalable, redundant systems.

So, the fundamental approach to building scalable systems is to compose them of horizontally scaled services. In order to do this we need to follow a few basic principles:

  • Stateless. Any service that stores state across an interaction with another service is hard to scale. For example, a web service that stores in-memory session state between requests requires a sophisticated session-aware load balancer. A stateless service, by contrast, only requires simple round-robin load balancing. For a web application (or service) you should avoid using session state or any static or application-level variables.
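
    The contrast above can be sketched in a few lines. This is a hypothetical shopping-cart example (the class and method names are invented for illustration): the stateful version pins a client to one instance via a session dictionary, while the stateless version makes the caller carry the cart in every request, so any instance behind a round-robin balancer can serve it.

    ```python
    # A sketch of stateful vs stateless handling, assuming a simple cart domain.

    class StatefulCartService:
        """Hard to scale: session state ties a client to this one instance."""
        def __init__(self):
            self._sessions = {}  # in-memory state between requests

        def add_item(self, session_id, item):
            self._sessions.setdefault(session_id, []).append(item)

        def checkout(self, session_id):
            return self._sessions.pop(session_id, [])

    class StatelessCartService:
        """Easy to scale: every request carries the full cart."""
        def add_item(self, cart, item):
            return cart + [item]  # returns a new cart for the client to carry

        def checkout(self, cart):
            # Sum the prices of (name, price) pairs in the submitted cart.
            return sum(price for _name, price in cart)
    ```

    Because the stateless service holds nothing between calls, two successive requests from the same client can hit two different instances without any coordination.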

  • Coarse Grained API. To be stateless, a service should expose an API whose operations complete in a single interaction. A chatty API, where one sets up some data, asks for some transition, and then reads off some results, implies statefulness by its design. The service would need to identify a session and then maintain information about that session between successive calls. Instead, a single call, or message, to the service should encapsulate all the information that the service requires to complete the operation.
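
    As a rough illustration (the operation and field names here are invented), a coarse-grained handler receives one self-contained message carrying everything it needs, rather than a set-up call, a transition call, and a read call against a session:

    ```python
    # Sketch of a coarse-grained operation: one message, one complete answer.

    def handle_transfer(message):
        """Handle a funds transfer in a single interaction; no session needed."""
        required = {"from_account", "to_account", "amount", "currency"}
        missing = required - message.keys()
        if missing:
            # The message didn't encapsulate everything; reject it outright.
            return {"status": "rejected", "missing": sorted(missing)}
        if message["amount"] <= 0:
            return {"status": "rejected", "reason": "non-positive amount"}
        return {"status": "accepted", "amount": message["amount"]}
    ```

    Note that validation failures are answered in the same single round trip; there is never a half-finished conversation for the service to remember.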

  • Idempotent. Much scalable infrastructure is a trade-off between competing constraints. Delivery guarantees are one of these. For various reasons it is far simpler to guarantee 'at least once' delivery than 'exactly once'. If you can make your software tolerant of multiple deliveries of the same message it will be easier to scale.
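
    A common way to achieve this (sketched below with illustrative names) is to give every message a unique id and have the consumer record the ids it has already processed. A redelivery from an 'at least once' broker is then recognised and ignored rather than applied twice:

    ```python
    # Sketch of an idempotent message consumer.

    class IdempotentConsumer:
        def __init__(self):
            self._seen = set()  # in production this would be a durable store
            self.balance = 0

        def handle(self, message):
            """Apply a message at most once, however many times it's delivered."""
            if message["id"] in self._seen:
                return "duplicate ignored"
            self._seen.add(message["id"])
            self.balance += message["amount"]
            return "processed"
    ```

    With this in place, the broker is free to redeliver aggressively on timeouts or node failures, which is exactly the freedom an 'at least once' system needs.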

  • Embrace Failure. Arrays of services are redundant if the system as a whole can survive the loss of a single node. You should design your services and infrastructure to expect and survive failure. Consider implementing a Chaos Monkey that randomly kills processes. If you start by expecting your services to fail, you'll be prepared when they inevitably do.
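
    On the caller's side, embracing failure can be as simple as treating a dead node as routine and moving on to the next instance. This sketch (function names are invented) tries a pool of redundant instances and only gives up if every one of them fails:

    ```python
    import random

    # Sketch: survive the loss of any single node by failing over to another.

    def call_with_failover(instances, request):
        """Try instances in random order; raise only if all of them fail."""
        for instance in random.sample(instances, len(instances)):
            try:
                return instance(request)
            except ConnectionError:
                continue  # this node is down; failure is expected, move on
        raise RuntimeError("all instances failed")

    def dead_node(request):
        raise ConnectionError("node unreachable")

    def healthy_node(request):
        return f"handled: {request}"
    ```

    A Chaos-Monkey-style test is then just a matter of killing nodes at random and asserting that calls like this still succeed.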

  • Avoid instance specific configuration. A scalable service should be designed in such a way that it doesn't need to know about other instances of itself, or have to identify itself as a specific instance. I shouldn't have to configure one instance any differently from another. Avoid communication mechanisms that require messages to be addressed to a specific instance of the service, or any non-convention-based way in which the service is required to identify itself. Instead we should rely on infrastructure (load-balancers, pub-sub messaging etc.) to manage the communication between arrays of services.

  • Simple automated deployment. Having a service that can scale is no advantage if we can't deploy it when we are close to capacity. A scalable system must have automated processes to deploy new instances of services as the need arises.

  • Monitoring. We need to know when services are close to capacity so that we can add additional service instances. Monitoring is usually an infrastructure concern; we should be monitoring CPU, network, and memory usage, and have alerts in place to warn us when these pass certain trigger points. Sometimes it's also worth introducing application-specific alerts when some internal trigger is reached, such as the number of items in an in-memory queue.
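
    An application-specific alert of that kind might look like the following sketch (the class name and threshold are invented for illustration): a queue that fires a callback whenever its depth passes a trigger point.

    ```python
    from collections import deque

    # Sketch: an in-memory queue that raises an alert past a trigger point.

    class MonitoredQueue:
        def __init__(self, alert_threshold, on_alert):
            self._items = deque()
            self._threshold = alert_threshold
            self._on_alert = on_alert  # e.g. page an operator, emit a metric

        def put(self, item):
            self._items.append(item)
            if len(self._items) > self._threshold:
                self._on_alert(len(self._items))

        def get(self):
            return self._items.popleft()
    ```

    In a real system the callback would feed whatever alerting infrastructure you already have, rather than being a Python function.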

  • KISS - Keep It Small and Simple. This is good advice for any software project, but is especially pertinent to building scalable resilient systems. Large monolithic codebases are hard to reason about, hard to monitor, and hard to scale. Building your system out of many small pieces makes it easy to address those pieces independently. Design your system so that each service has only one purpose and is decoupled from the operations of other services. Have your services communicate using non-proprietary open standards to avoid vendor lock-in and allow for a heterogeneous platform. JSON over HTTP, for example, is an excellent choice for intra-service communication. Every platform has HTTP and JSON libraries and there is abundant off-the-shelf infrastructure (proxies, load-balancers, caches) that can be used to help your system scale.
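
    To illustrate how little ceremony JSON over HTTP involves, here is a minimal sketch of encoding and decoding a service message with nothing but a standard JSON library (the field names are invented). Any platform with a JSON library can produce or consume the same bytes:

    ```python
    import json

    # Sketch: a service message as plain JSON -- no proprietary wire format.

    def encode_request(operation, payload):
        """Serialise a request body that any HTTP client could POST."""
        return json.dumps({"operation": operation, "payload": payload})

    def decode_request(raw):
        """Parse the same body on the receiving service, whatever its platform."""
        message = json.loads(raw)
        return message["operation"], message["payload"]
    ```

    Because the payload is ordinary HTTP and JSON, off-the-shelf proxies and caches can sit in front of the service without understanding anything about it.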

This post just gives a few pointers to building scalable systems; for far more detailed examples and case studies I can't recommend the High Scalability Blog enough. The Dodgy Coder blog has a very nice summary of some of the High Scalability case studies here.


Travelling Greg said...

One thing that should really be considered here is:

Don't scale *your* service; try to scale commoditized things. An example of this is scaling reverse proxies instead of scaling a RESTful service.

Karl Nilsson said...

For me the benefit of the LMAX system isn't necessarily the performance they can achieve (although it is what is mostly talked about) but more about how it makes things really simple by restricting the environment in which state transitions occur to a single thread.

If you have to have state and this needs to be transitioned in a predictable manner it makes sense to serialize interactions. This predictability allows your stateful service to have redundant back-up copies.

The EIP 'Process Manager' (stateful workflow), for example, can often be made simpler and more resilient if restricted as above.