Container Driven Development
Revolutionizing software development one box at a time
Three important technological trends are causing seismic shifts in the way we develop and deliver software. These trends are:
- The rise of cloud-based computing: Through the use of virtualization and well-defined application programming interfaces (APIs), it is nearly effortless to deploy computing resources designed to tackle difficult problems.
- The adoption of microservice architectures: By placing the emphasis on small programs that focus on doing one thing well and combining those programs into larger units to deliver more complex functionality, it is possible to create sophisticated systems while increasing the pace of development.
- The emergence of DevOps practices: DevOps allows companies to be more innovative and to compete more effectively. Amazon describes DevOps as a "combination of tools, practices, and philosophies that increase an organization's ability to deliver applications and services at high velocity..."
While each individual trend has caused disruption in the tech industry, their convergence in the form of containers is where the greatest shakeup has occurred. In this article we'll look at some of the reasons why. We'll delve into the challenges that containers solve, the current state of the art, and the broader ecosystem which has evolved around them.
Rise of Container Development
Container-driven development is a workflow in which code is written, run, and tested inside a containerized environment. A container is a portable, packaged unit that bundles an application (often structured as a microservice) along with its code, libraries, assets, and any other dependencies required to execute properly.
Distilling a complex application down to a container provides a number of benefits. These include:
- consistent deployment and execution
- application portability where the same artifact can be used in development, staging, and production
- a uniform way to store, transport, and deploy applications across a variety of runtimes
Containers solve a number of challenges in software development. In a traditional workflow, development consists of writing the program, building the system, testing the resulting artifacts, and then packaging the components so that they can be deployed. Many of these steps are manual, time-consuming, and prone to error.
Benefits of Using Containers for Development
Using containers remedies many of these problems by bringing a degree of formality to building and packaging the application and by producing a single, consistent artifact that can be used for testing and then carried through development, staging, and production. Because the development environment has parity with the environment where the program will be deployed, issues and bugs caused by environmental differences are minimized. Additionally, automated processes can be built around building, testing, and deployment, which allows software to be released at a faster rate.
Isolation
Containers virtualize CPU, memory, storage, and network resources at the OS level. Processes running inside one container are not visible to processes running inside another. This provides two major benefits: sensitive processes can be confined to a logical sandbox, invisible to other processes running on the machine, and developers get a sandboxed view of the OS that is isolated from other software deployed on the same host. This isolation simplifies the runtime environment and prevents unintended interactions between similar applications sharing a host.
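As a quick illustration of this isolation (the container names below are arbitrary), two containers started from the same image cannot see each other's processes:

```sh
# start two unrelated containers (busybox includes a ps utility)
docker run -d --name box-a busybox sleep 3600
docker run -d --name box-b busybox sleep 3600

# each container sees only its own process tree; neither the host's
# processes nor the other container's sleep command are visible
docker exec box-a ps
docker exec box-b ps
```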
Start from a Clean (Runtime) Slate
A common cause of software bugs is differences in environment. A developer may have one set of dependencies installed on a local workstation (often accumulated through months or years of working on an application), staging may have a slightly different set, and production a third set. Additionally, when code is executed, cache files are generated and become part of the environment and may contribute to subtle interactions that cause difficult-to-diagnose bugs.
Due to the way containers are built and deployed, they always begin execution from a consistent starting point, and when a container is terminated and removed, its runtime artifacts are cleaned up with it. The process works the same way in development, staging, and production, greatly reducing the likelihood of "deep bugs."
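A simple way to see this behavior (the file name is arbitrary): state written inside a container disappears once the container is removed, so every run starts from the image's pristine contents.

```sh
# write a file inside a throwaway container (--rm deletes it on exit)
docker run --rm ubuntu bash -c 'echo "scratch data" > /tmp/state && ls /tmp'

# a new container from the same image starts clean; the file is gone
docker run --rm ubuntu ls /tmp
```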
Consistent, Well-Defined Environments
While containers and their template images can be created interactively by committing changes as an image, it is far more common to build them using a recipe called a Dockerfile. Dockerfiles can reference other container images (called base images) and then extend them with additional configuration and components. Because the file contains the exact specification of the container, images built from the same Dockerfile can be expected to behave identically. Images can also be versioned, which makes it possible to precisely define the components of a complex application spanning multiple containers and ensure that all of them work together as expected.
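For illustration, here is a minimal sketch of the idea; the application files and image names below are hypothetical:

```sh
# a small Dockerfile for a hypothetical Python service
cat > Dockerfile <<'EOF'
# extend a shared base image
FROM python:3.12-slim
WORKDIR /app
# install dependencies in their own cacheable layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# add the application and define a consistent entry point
COPY . .
CMD ["python", "app.py"]
EOF

# build the image and tag it with a version
docker build -t example-service:1.0.0 .
```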
Building new container images is often automated and tied to events such as code check-in. This allows for the creation of larger processes, such as continuous integration and deployment pipelines, that can greatly speed up the deployment of new software versions after they have been rigorously tested.
Delivery and Deployment
Container-native development not only improves developer productivity, but also helps organizations standardize operations and processes. When applications are packaged in containers, it becomes much easier for operations teams to move a tested and verified program version to staging, and then on to production. The startup and entry point for each container are consistent, meaning you know the container will initialize, allocate resources, pass configuration values, and manage artifacts the same way every time. This makes it possible to automate key portions of the deployment pipeline and allows software to be released quickly (perhaps even several times, or dozens of times, per day).
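In practice (the image name, ports, and configuration values below are hypothetical), the same tested image is promoted from one environment to the next, with only configuration changing at startup:

```sh
# staging: run the verified image with staging configuration
docker run -d --name example-staging \
  -e DATABASE_URL="postgres://staging-db/example" \
  -p 8080:8080 \
  example-service:1.0.0

# production: the exact same artifact, different configuration values
docker run -d --name example-prod \
  -e DATABASE_URL="postgres://prod-db/example" \
  -p 80:8080 \
  example-service:1.0.0
```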
The Evolution of Containers
Container technologies build upon a long history of improvements in software packaging and process isolation
Challenges of Packaging Software
Historically, software was provided as source files which needed to be compiled before they could be used. To install a new program, a user needed to:
- Retrieve an archive of source files
- Extract the files and generate a build configuration (for Unix systems using autotools, this was typically done by running `./configure`)
- Run the generated Makefile (`make`)
- Install the resulting software build (`make install`; see the sketch below)
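As a concrete sketch of that procedure (the package name is hypothetical):

```sh
# fetch, unpack, configure, compile, and install a source release
tar -xzf example-1.2.tar.gz
cd example-1.2
./configure --prefix=/usr/local   # generate a build configuration for this system
make                              # compile using the generated Makefile
sudo make install                 # copy the build artifacts into place
```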
This procedure worked well for simple software, but for larger programs with a number of dependencies, the process could be very difficult. Issues such as missing resource files or incompatibilities between software versions might require hours or even days to resolve.
Package Managers: Isolating and Mapping Dependencies
Packages were invented to combat this complexity and represented one of the first steps toward software management and dependency isolation. They also provide the foundation for container environments. A package is an archive containing a piece of software's binaries, configuration files, and dependency information. Each package carries metadata describing the software's name, description, version number, vendor, and the dependencies necessary for the program to install and run correctly. The binaries in a package are compiled by distribution maintainers and are usually tested before distribution to ensure that the environment is "sane." In a well-packaged environment, software tends to "just work."
Many operating systems and programming languages come with package managers. Package managers automate installing packages, resolving dependencies, and managing software versions, and they provide the foundation for creating a runtime environment. However, despite their sophistication and role in modern software ecosystems, they are limited when compared to containers (which build upon their model). One shortcoming that containers address, for example, is supporting multiple applications that depend on different versions of a common component.
Applications often rely on the same library but require different versions, which makes packaging shared libraries tricky. Sharing dependencies is desirable because it keeps the platform's footprint small, but conflicts can force each program that consumes the library to include its own copy, greatly increasing the size of the install. Outside of a container, upgrading a shared library for one program can break others, and the resulting environment conflicts can be painful and very time consuming to fix.
Containers solve this problem by allowing each container image to carry its own set of dependencies when needed, while sharing lower-level base images when possible. This keeps the install size small while still allowing custom library versions where necessary.
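A brief sketch of how this plays out in practice (the service names and pinned versions are purely illustrative): two images can depend on different versions of the same library without conflict, while still sharing their base image layers on disk.

```sh
# service A pins one version of a shared library
cat > Dockerfile.service-a <<'EOF'
FROM python:3.12-slim
RUN pip install "requests==2.31.0"
EOF

# service B pins a different version; each image carries its own copy,
# but both reuse the same python:3.12-slim base layers
cat > Dockerfile.service-b <<'EOF'
FROM python:3.12-slim
RUN pip install "requests==2.28.2"
EOF

docker build -f Dockerfile.service-a -t service-a:dev .
docker build -f Dockerfile.service-b -t service-b:dev .
```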
Virtualization: Isolated Runtime Environments
One of the next significant steps toward isolated software environments came in the form of virtualization. In a virtualized system, hardware is emulated by software, which means a single physical machine can run dozens or hundreds of "virtual machines" or "virtual appliances," each with its own software stack. This approach solves many of the problems that arise from running different versions of a program on a single machine. Virtualization allows for the isolation of dependencies and processes, effectively creating miniature "appliances" that can be built and deployed across a cluster of physical machines.
Unfortunately, while virtualization introduces many benefits, it also comes with a performance penalty. Each virtual machine has its own kernel, memory allocation, and init system. Further, because the CPU is emulated, there is overhead due to passing instructions to the underlying hypervisor for execution. While good for consistency and isolation, and an effective way to manage complexity, virtualization is not terribly efficient.
Containers extended the virtualization model by keeping many of its benefits (such as providing the network as a software-defined service) while sharing the underlying kernel and init components. This removes much of the overhead, allowing many more processes to run on the same hardware. A powerful hypervisor is capable of running hundreds of virtual machines, but tens of thousands of container processes.
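One simple way to observe this sharing: containers based on different distributions all report the host's kernel version, because none of them boot a kernel of their own.

```sh
# the host and every container report the same kernel release
uname -r
docker run --rm ubuntu uname -r
docker run --rm alpine uname -r
```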
Control Groups: Foundation of Linux Containers
Another significant step toward running isolated processes within a single kernel came in the early 2000s, in the form of zoning. Zoning allowed the system to limit a process's scope to a specific set of resources. With a zone, it was possible to give an application its own user, process, and file system space, along with restricted access to the system hardware. Just as importantly, it was possible to limit the application's visibility into other parts of the system: a process could only see things within its own zone.
While the initial implementation of zoning was powerful, the approach failed to gain broad adoption until a second implementation, called "control groups" or "cgroups," became available in the Linux kernel in 2008. The related Linux Containers (LXC) project built on these kernel features to allow multiple isolated Linux environments (containers) to run on a shared Linux kernel, with each container having its own process and network space.
Orchestration
While cgroups provided the foundation, they were difficult to use. In 2013, Google contributed a set of utilities to an open-source project called Let Me Contain That For You (LMCTFY) that attempted to simplify containers by making them easier to build and deploy. The project provided a library that applications could use to containerize themselves through commands and a standard interface, allowing a limited degree of "orchestration."
Even this failed to gain broad adoption, though, because there was no standard way to package, transport, or deploy container components. Even with the improvements that LMCTFY provided, the process was too difficult. This started to change in 2013 with the emergence of a new utility called Docker.
Tooling and Technologies
Docker
Docker is a toolset that makes it easy to create containers and container "images." Through a simple set of commands (run, build, volume, and so on), it provides an easy-to-understand interface for creating and deploying containers, which made it the first runtime to popularize containers and bring them to the masses. As it has evolved, it has grown into a robust framework for creating, deploying, and managing nearly every part of the container lifecycle, becoming the de-facto all-in-one containerization tool for implementing microservices.
But while its name has become synonymous with all things containers, Docker is just one portion of the solution. It utilizes Linux namespaces, control group capabilities, security profiles, network interfaces, and firewall rules to isolate processes and provide a seamless container runtime.
Resource Management
While resource limits are technically a feature of control groups, Docker provides an interface for applying them to a container. This can be used to specify how much memory, CPU, or other specialty resources (such as GPUs) the program running in the container may consume. Managing resources brings several benefits during development: it provides a way to simulate a specialty target environment (such as an embedded device) with high fidelity, and it ensures that a greedy program doesn't hog the resources of a developer's personal laptop.
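For example, Docker exposes these cgroup limits as flags on docker run; the values below are arbitrary and simply simulate a constrained target:

```sh
# limit the container to half a CPU core and 256 MB of memory
docker run -it --rm \
  --cpus="0.5" \
  --memory="256m" \
  ubuntu /bin/bash
```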
Portability
One of Docker's major contributions to the container landscape is the way it builds and stores images. A Docker image contains the environment, dependencies, and configuration required to run a program as part of a read-only file system. The image is built up from a series of layers, where each layer represents an instruction (usually a line in a Dockerfile) and the results of its execution. Layers use a copy-on-write (COW) strategy for storing changes, which allows unchanged files to be shared between layers and provides an efficient way to capture differences. Docker's layer format has become a standard for containers and gives machines a quick way to exchange images with one another or with storage servers called registries.
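The layers behind an image can be examined directly; for an image built earlier (the name is hypothetical), docker history lists each layer alongside the instruction that created it:

```sh
# show each layer, the instruction that produced it, and its size
docker history example-service:1.0.0

# show lower-level detail, including the layer digests
docker image inspect example-service:1.0.0
```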
This gives Docker images great portability. Services like Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure have embraced Docker and created managed services around it, including managed Kubernetes offerings (EKS, GKE, and AKS) and secure registries. Unlike other cloud services, where there may be strong vendor-specific lock-in, a container running in an Amazon Kubernetes instance can easily be ported to run within the Google or Azure equivalent.
Moving container images takes a single command: using an image's name and tag, it is possible to push it to any registry to which the machine or user has access. Registries can be mirrored and incorporated into workflows around deployment, security, or compliance auditing. The image format includes the ability to associate metadata and version tags, so it is possible to track changes and use the build system to troubleshoot bugs.
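For example (the registry hostname and repository below are hypothetical), publishing an image is a matter of tagging it for the target registry and pushing it:

```sh
# tag the local image for a private registry and publish it
docker tag example-service:1.0.0 registry.example.com/team/example-service:1.0.0
docker push registry.example.com/team/example-service:1.0.0

# any machine with access can pull exactly the same artifact
docker pull registry.example.com/team/example-service:1.0.0
```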
Consistency and Portability Lead to Improved Productivity
Because of their consistency and portability, Docker containers are easy to integrate into many development workflows. For developers building an application that requires supporting components (such as a database, message queue, or backing microservices), creating linked containers that provide those systems is easy. Similarly, it is straightforward to create a development environment within a Docker container that mimics the production target and to mount local source files into it for day-to-day work. With standardized, repeatable development, build, test, and production environments, you can rely on your containers to do what they are supposed to do every time while still providing convenient development access.
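A rough sketch of such a workflow (image versions and names are illustrative): backing services run as containers on a shared network, and the local source tree is mounted into a container that mirrors production.

```sh
# create a network and start a backing database for local development
docker network create dev-net
docker run -d --name dev-db --network dev-net \
  -e POSTGRES_PASSWORD=dev postgres:16

# open a shell in a production-like container with the local source mounted in;
# edits made on the host are immediately visible inside the container
docker run -it --rm --network dev-net \
  -v "$PWD":/app -w /app \
  python:3.12-slim /bin/bash
```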
Continuous Integration and Deployment
When development occurs in a container environment, the entire lifecycle of the application can be production-parallel. This means there is often a straight shot from the time an issue is fixed to deployment. Continuous integration and continuous deployment (CI/CD) is a popular DevOps practice of automating the application build, testing, staging, and deployment processes. Many CI/CD pipelines utilize Docker at their core.
Docker's Components
The Docker runtime includes the client, the engine (server), and tools for managing networks, volumes, and images.
Client
The Docker client is used to manage containers, images, networks, volumes, and other resources through a command line interface (CLI). The command-line docker program connects to the server components and issues commands to a RESTful API that the server exposes. This client is the primary interface used by most users.
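To illustrate the relationship (this requires access to the daemon's socket at /var/run/docker.sock), the same information the CLI displays can also be fetched from the Engine's REST API directly:

```sh
# list running containers via the CLI...
docker ps

# ...and via the underlying REST API over the daemon's Unix socket
curl --unix-socket /var/run/docker.sock http://localhost/containers/json
```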
Engine/Server
The Docker server (deployed as a system daemon called dockerd, which delegates container execution to a lower-level runtime, containerd) handles the administrative work of creating and managing containers, transferring images, creating networks, configuring volumes, and allocating system resources.
Registry
A Docker registry is a public or private repository of container images that can be accessed with the Docker client (or any automation tool able to read the registry API). Docker runs a public registry called Docker Hub, which includes thousands of publicly available images comprising operating systems, utilities, databases, and application servers.
CoreOS Rkt
While Docker has become the de-facto standard for Linux containers, it is not the only container technology in use. CoreOS's rkt (originally called Rocket), released in 2014, is a direct competitor providing similar functionality. Rkt focuses on high-security environments and use cases that Docker's architecture does not address well.
One of CoreOS' developers, Alex Polvi, explained the motivations for building a secure alternative to Docker as:
“From a security and composability perspective, the Docker process model – where everything runs through a central daemon – is fundamentally flawed. To ‘fix’ Docker would essentially mean a rewrite of the project, while inheriting all the baggage of the existing implementation.”
Security
From its outset, Rkt chose to pursue a security model and architecture that would allow for Rkt containers to run without administrative privileges. This design makes it possible to apply security constraints that would be difficult or impossible to apply in Docker. Over time, it has been enhanced with additional security features such as deep integration with SELinux, TPM, signature validation, KVM-based container isolation, and privilege separation. Because of this, Rkt has become the de-facto tool for security-focused environments even though its feature set is less robust than Docker's.
Applications Organized as Pods
The core execution unit of Rkt is a pod, which is a collection of one or more applications that share resources. Kubernetes also uses pods as its core construct. By grouping applications together, it becomes possible to stack related components and map pods to cluster management groups.
The way that Rkt executes containers directly maps to the Unix process model, but in a self-contained, isolated environment. This means that containers take a predictable location in the PID hierarchy of the host and can be managed with standard utilities.
Compatibility
Rkt is compatible with Docker images. This gives it instant access to the complete library of applications available in Docker Hub. It also allows for granular adoption of the technology. If there are well-established development processes built around Docker, it would be possible to integrate Rkt into staging and production while keeping development workflows untouched.
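For example, rkt could fetch and run images published to Docker registries using the docker:// prefix (a historical sketch; the rkt project has since been archived, and signature verification is disabled here because Docker images are unsigned from rkt's point of view):

```sh
# run a Docker Hub image under rkt (historical usage)
rkt run --insecure-options=image docker://nginx
```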
Empowering Better Process and Technology
Containers provide many compelling features for development teams: a unit of packaging, a unit of deployment, a unit of reuse, a unit of resource allocation, and a unit of scaling. In essence, they provide the perfect toolset for developing and deploying microservices.
Containers also provide great value to operations teams. Because they offer many of the benefits of virtualization without the overhead, they enable greatly improved processes. In many important ways, they have also shifted thinking about how software should be developed and deployed and have enabled new computing platforms.
DevOps is a combination of tools, practices and philosophies that increases an organization's ability to deliver applications and services at high velocity. DevOps, as practiced at most software companies, incorporates containers.
Infrastructure as Code
Infrastructure as Code is the practice of using the same tools and methods leveraged for software development in the management of infrastructure and computing resources. At a practical level, this means infrastructure and application configuration is kept in version control, analyzed, and tested.
Containers provide a concrete implementation of Infrastructure as Code. At the most basic level, Dockerfiles define application runtimes that can then be composed through orchestration manifests and deployed as cohesive resource groups. In more sophisticated scenarios, container systems can request specialty resources such as GPUs and specialized forms of storage.
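As a small sketch of this idea (the services and values below are hypothetical), a Compose manifest describes an application and its backing database as code that lives in version control, and a single command realizes it:

```sh
# a declarative, versionable description of the application's runtime
# (assumes a Dockerfile exists alongside this file)
cat > compose.yaml <<'EOF'
services:
  web:
    build: .
    ports:
      - "8080:8080"
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
EOF

# create (or update) the entire environment from the description
docker compose up -d
```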
Continuous Integration / Continuous Deployment (CI/CD)
CI/CD is the practice of developing automated suites that can test, validate, stage, and deploy code. With an effective CI/CD pipeline it is possible to deliver application updates with very high frequency. Companies such as Netflix, Amazon, Etsy, and Google are often able to update a deployed system dozens of times per day.
A rich ecosystem of tools designed to enable container based CI/CD exists. Jenkins, as an example, is a CI automation tool that can automatically create containers from source code, run test suites against them to ensure their functionality, and deploy the resulting containers to a registry if they pass the tests. Spinnaker is a CD tool capable of watching a registry for updated container builds and deploying them to a staging or production cluster.
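Whatever tool runs it, a container-centric pipeline typically reduces to a few repeatable steps; here is a hedged sketch as a shell script (the registry, image name, and in-image pytest test runner are assumptions):

```sh
#!/usr/bin/env sh
set -e  # stop at the first failing step

IMAGE="registry.example.com/team/example-service:${GIT_COMMIT:-dev}"

docker build -t "$IMAGE" .        # build a candidate image from the checked-in source
docker run --rm "$IMAGE" pytest   # run the test suite inside the image (assumes pytest is installed)
docker push "$IMAGE"              # publish only if the tests passed
```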
Containers have allowed for "cloud" computing to evolve toward greater abstraction. Taken to its most extreme, a fully abstracted cloud platform removes all details of how a program is packaged and provisioned. From a developer's perspective, it just gets executed.
Serverless Functionality
Containers are capable of providing precisely this type of computing model. Commonly called "serverless" or "Functions as a Service," such systems allow a developer to write a simple piece of code and deploy it to a function platform, which then handles deployment, scheduling, and execution.
OpenFaaS is an open-source serverless framework that allows users to deploy their functions as containers within a Kubernetes cluster. OpenFaaS lets users develop their application in any language (as long as the app can be containerized) and provides a standard model for handling execution and scaling. Functions can then be combined to build powerful applications and pipelines.
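A short sketch of that flow with the faas-cli tool (the function name and language template are illustrative, and the name of the generated YAML file can vary by version):

```sh
# scaffold a new function from a language template
faas-cli new hello-world --lang python3

# build the function image, push it to a registry, and deploy it to the cluster
faas-cli up -f hello-world.yml
```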
The Container Revolution
Whether using Docker or rkt for container-driven development, the consistency, portability, and isolation provided by containers enable more rapid delivery of software. Containers are an enormously powerful development tool capable of providing tremendous value to individual teams and entire organizations.