I originally wrote this piece as part of a paper evaluating container technology for a client.
This document describes container technology, best represented by Docker. Containerisation is a game-changing technology that’s experiencing rapid adoption: some measures put around 25% of companies as now using Docker in some form (https://www.datadoghq.com/docker-adoption/). Containers can dramatically simplify the software development process, allowing companies to be more agile and lowering the cost of building and maintaining large software systems. This document looks at how containers fit within the general evolution of software systems.
A brief history of software
The history of software development is a story of successive rounds of abstraction and commodification. If you can treat a whole class of things (computers, networks or peripherals) as a black box with a consistent API, it enables common, industry-wide tooling and commodification.
In the early years of computing, software was written for a particular version of the hardware. Each program would take complete control of the machine, use the processor’s physical instruction set, directly address physical memory and have intimate knowledge of the locations and capabilities of any devices attached to the machine. This meant that a program written for one model of machine would not work on a different model. Machines were typically sold with a dedicated software suite, which meant that the same classes of software had to be written repeatedly for each machine. In the early days of home computers it was typical for a word processor, for example, to come in different versions for all the major machines on the market and with drivers for a range of popular printers. If your printer wasn’t included, it simply wouldn’t work.
To solve this problem and allow a single program to run on a variety of machines, operating systems were created to provide an abstraction layer over the underlying hardware. So long as a piece of software was designed to run on the operating system of your computer, it worked. The operating system also isolated the program from variations in peripheral hardware. You no longer had to care about what particular printer was attached to the computer because an operating system driver provided a common abstract printer API regardless of the actual hardware model. As operating systems evolved they provided not only isolation from the hardware, but also isolation from other programs running on the same computer, with innovations such as protected memory and pre-emptive multitasking. With the adoption of an operating system as a common platform, the thing it abstracted, the hardware, became a commodity. This led to dramatic cost reductions and economies of scale, both for hardware and software.
The same adoption and standardisation also occurred with networking. TCP/IP became the standard that allowed computer systems to be connected worldwide, and HTTP became the standard for sharing data globally. This has allowed software solutions to serve customers at massive scale.
As software running on commodified platforms became more complex, various mechanisms evolved to make it more modular and reusable. Collections of modular software ‘libraries’ could be brought together to create more powerful applications in less time. Software environments grew to include runtimes that relieve programmers of the need to manage memory and further abstract the program from its environment. Software systems also evolved to be composed of multiple processes running on multiple machines to improve scalability and resilience. Services and infrastructure tools such as web servers and databases provided off-the-shelf capabilities that sped up software development further.
The complexity of the modern software environment
All these libraries, services and infrastructure have to be correctly configured for the software to run. This is often a semi-manual, complex, time-consuming and error-prone task. When multiple pieces of software run on a single machine there can be complex and damaging interactions between conflicting library and tool versions. The complexity of provisioning environments, installing the correct versions of tools and libraries, opening the correct ports and configuring connections, especially when this is done across environments with differing network topologies, creates a fertile environment for human error.
Once in production, these complex systems need to be monitored, managed and audited. This introduces additional tooling and configuration, adding yet another vector for misconfiguration and error.
The difficulty of coordinating the teams of developers who build complex software systems also requires the software development process itself to be formalised and automated. This introduces new tools, such as build and deployment systems, that must also be configured correctly for the software to be successfully delivered into production. This configuration work is often manual, fragile and error prone, and since a single toolset is often shared by many teams and components, it creates significant friction when introducing new services, libraries or tools.
Because the delivery and runtime environments are maintained and versioned separately from the source code, they introduce risk and friction. Services often share both environments and delivery processes, meaning that upgrades and changes have to be coordinated. In a worst-case scenario, separate teams may be tasked with maintaining the runtime infrastructure and the delivery process, escalating any change into a large-scale organisational issue. Often the overwhelming task of synchronising software and environment upgrades means that they are done infrequently, with a great deal of ceremony and risk.
Virtual machines don’t really help here. They can make the work of technical operations easier: they decouple an entire operating system environment from the hardware and make it easy to replicate and move environments around hardware infrastructure. However, VMs make very little difference to software developers. The software pipeline and runtime environment are still maintained and configured separately from the software source code itself.
The stage has been set for another round of abstraction; this time the abstraction is of the interface between the operating system, the userland environment and the network topology that the software is built and runs within.
Containerisation is the technology that provides this abstraction and solves many of the problems described above. Containers provide a scripted, per-process runtime environment that is maintained alongside the source code. The build process and target network topology of a large software system are also defined in container and composition/orchestration scripts. Because these scripts are maintained by developers on a per-process (per-service) basis and are kept under source control alongside the service’s source code, the software describes the environment that builds it and that it runs in, and that description is versioned with the software. Effectively this reverses the usual hierarchy and allows each component to own its environment and delivery process. The environment for a component is identical whether it’s running on a developer’s machine, in a test environment or in production, which removes much of the risk of configuring the software pipeline and runtime environment described above. This idea of extending the Git workflow to operations is known as GitOps (see https://www.weave.works/blog/what-is-gitops-really). In the same way that operating systems removed the need for software to care about hardware, containers allow the software environment to be described without having to know or care about the specific operating system environment or the physical network.
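To make this concrete, the sketch below shows roughly what such scripts look like. It is a minimal, hypothetical example rather than a recipe from any particular system: the base image, service names, ports and commands are all assumptions made for the sake of illustration. A Dockerfile describes the build and runtime environment of a single service, and a docker-compose file describes how that service is composed with its dependencies.

    # Dockerfile - lives in the service's repository and is versioned with its
    # source code; it describes the userland environment the service is built
    # and run in (a hypothetical Python service is assumed here).
    FROM python:3.11-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    CMD ["python", "app.py"]

    # docker-compose.yml - describes how the service is composed with its
    # dependencies and how they are wired together.
    services:
      web:
        build: .
        ports:
          - "8000:8000"      # the only port exposed to the outside world
        depends_on:
          - db
      db:
        image: postgres:16
        environment:
          POSTGRES_PASSWORD: example   # placeholder value only

Because both files live alongside the service’s code, a change to the environment is proposed, reviewed and rolled out in exactly the same way as a change to the software itself.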
Conclusion
Docker and its various orchestration options offer game-changing performance gains for large software organisations. They provide a single, integrated, scripted, scalable platform for both the software delivery pipeline and production operations. Containers are being adopted rapidly and will soon be as standard a part of IT infrastructure as VMs are today. Any software organisation of reasonable scale should now be looking seriously at a path to adoption.