Container technology isn't new. Linux Containers (LXC), which combine kernel control groups (cgroups) to limit a process's resource usage with kernel namespaces to isolate its view of the system, have been around for at least 10 years. Until recently, however, the technology was used mainly by highly technical IT professionals; most organizations weren't using LXC because other virtualization technologies made operating-system-level containers seem redundant. Then Docker came along several years ago and turned things around. It took a technical capability of the operating system and made it work for developers, finally delivering on the promise of application portability.
Portability sells
This was Docker's revolution. Developers jumped on the new opportunity, and the use of containerization exploded. Whereas applications once were contained in a single address space on a limited set of application servers, they now run as a set of microservices in a cluster of containers, with the network as the application fabric through which the microservices communicate to drive business results. But migrating to this kind of application architecture presents three challenges:
Orchestration: Deploying your application or service as a cluster of containers
Security: Securing an application that is distributed as many microservices, each running in its own container
Monitoring: Ensuring that all your microservices are running well to keep providing the services you've promised your customers
These challenges have spawned a world of new solutions to support orchestration, security, and monitoring at an unprecedented scale.
Here's how to meet the challenge of monitoring a Docker-centric environment to ensure that workloads remain healthy and at peak performance. I'll also cover the five levels you should monitor and the top vital signs to watch at each level to ensure your application is running well.
The challenge
One challenge is the frequency at which microservices are updated. As a small, distinct piece of functionality, a microservice has a short lifecycle; it may undergo frequent updates and be replaced each time. The notion of applying an application patch has been replaced by deploying new versions of the relevant microservices. The many orchestration, clustering, and native cloud-deployment services that have sprung up facilitate this rapid rate of change. But as your services scale up, your monitoring needs to ramp up to support them. You must constantly listen to the heartbeat of your application's environment to get an accurate picture of how it's operating.
Another challenge is managing multiple versions of the same microservice. The process of replacing a microservice is not atomic. Your production environment may be running several instances of a microservice at any time to provide load balancing and scalability. When you introduce a new version, you phase in new instances, reroute network traffic to them, and phase out the old instances. That means there are periods of time in which both the old and new versions are running concurrently, so your monitoring system must be able to differentiate between them. If a failure is detected, you need to know whether it's caused by a fault in the newly introduced version or by a bug in the old version you're replacing.
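As a minimal sketch of telling versions apart (assuming the Docker SDK for Python and that the service version is encoded in the image tag, e.g. shop/payments:1.4.2, which is an illustrative convention, not a Docker requirement), you might group running containers by image tag so an alert can be attributed to the right version:

```python
# Sketch: group running containers by image tag so a monitoring system
# can tell old and new versions of the same microservice apart.
# Assumes the Docker SDK for Python (pip install docker) and that
# versions are encoded in image tags (an illustrative convention).
from collections import defaultdict
import docker

client = docker.from_env()
versions = defaultdict(list)

for container in client.containers.list():
    # A container's image may carry several tags; fall back to the raw image ID.
    tag = container.image.tags[0] if container.image.tags else container.image.id
    versions[tag].append(container.name)

for tag, names in versions.items():
    print(f"{tag}: {len(names)} running instance(s) -> {names}")
```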
Different approaches to monitoring
In one approach to monitoring, the orchestration layer updates the monitoring system about changes. For example, you may need to change the management layers of your monitoring system to include a new component. But this approach doesn't scale to cope with the high rate of change you get in a clustered system of containers. A better approach is automatic discovery: you need your monitoring system to be agnostic to change and to adjust automatically to the new microservices you introduce. This adaptive approach is much better suited to monitoring a frequently changing cluster of containers.
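As an illustration of the adaptive approach, a monitor can subscribe to the Docker daemon's event stream and add or remove targets as containers start and stop. This is a sketch using the Docker SDK for Python; the register and deregister functions are hypothetical hooks into your own monitoring system.

```python
# Sketch of automatic discovery: watch the Docker event stream and
# start or stop monitoring targets as containers come and go.
# Assumes the Docker SDK for Python; register/deregister are
# hypothetical hooks into your own monitoring system.
import docker

client = docker.from_env()

def register(name, image):
    print(f"start monitoring {name} ({image})")   # hypothetical hook

def deregister(name):
    print(f"stop monitoring {name}")              # hypothetical hook

for event in client.events(decode=True, filters={"type": "container"}):
    attrs = event.get("Actor", {}).get("Attributes", {})
    if event.get("Action") == "start":
        register(attrs.get("name"), attrs.get("image"))
    elif event.get("Action") in ("die", "stop"):
        deregister(attrs.get("name"))
```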
Five levels to monitor
To adequately gauge the health of highly distributed clustered microservices running in Docker containers, you need to monitor your application on each of the five levels listed below. Here are the vital signs you need to monitor to detect health conditions at each level.
1. The cluster manager
The cluster manager manages the lifecycle for a cluster of containers as one execution machine. Docker Swarm is a native Docker option, but there are others, including Kubernetes.
Vital signs to look for:
Is the cluster manager up and running and in a healthy state?
Are all nodes connected as expected?
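For Docker Swarm specifically, here is a minimal sketch of checking both vital signs with the Docker SDK for Python, run against a manager node. The attribute paths follow Docker's Swarm node API, but verify them against your Docker version.

```python
# Sketch: verify that the Swarm manager is healthy and all nodes are
# connected. Assumes the Docker SDK for Python and that this runs
# against a Swarm manager node.
import docker

client = docker.from_env()
swarm_info = client.info().get("Swarm", {})

if swarm_info.get("LocalNodeState") != "active":
    raise SystemExit("This daemon is not part of an active swarm")

for node in client.nodes.list():
    state = node.attrs["Status"]["State"]   # expected: "ready"
    role = node.attrs["Spec"]["Role"]       # "manager" or "worker"
    if state != "ready":
        print(f"ALERT: {role} node {node.id[:12]} is {state}")
```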
2. The cluster nodes
These are the compute units or virtual servers managed by the cluster manager.
The top metrics include:
CPU utilization—None of the nodes should be using more than 90 percent of the CPU
Free memory—All nodes should have at least 10 percent of their memory free
Swap space used—Nodes should use no more than 90 percent of the allocated swap space
Free disk space—Make sure that free disk space stays above 5 percent
While the exact numbers shown above can vary, it's important to monitor these metrics and define the right alert levels in your implementations.
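As a minimal sketch of such node-level checks (assuming the psutil library; the cutoff values mirror the guidelines above and should be tuned for your environment):

```python
# Sketch: check node vital signs against the thresholds discussed above.
# Assumes the psutil library (pip install psutil); run on each node.
import psutil

alerts = []

if psutil.cpu_percent(interval=1) > 90:
    alerts.append("CPU utilization above 90 percent")

if psutil.virtual_memory().percent > 90:      # i.e. less than 10% of memory free
    alerts.append("Less than 10 percent of memory free")

if psutil.swap_memory().percent > 90:
    alerts.append("Swap usage above 90 percent")

if psutil.disk_usage("/").percent > 95:       # i.e. less than 5% of disk free
    alerts.append("Less than 5 percent of disk space free")

for alert in alerts:
    print("ALERT:", alert)
```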
3. The Docker daemon
Ensure that the daemon running on each node is healthy and properly managing the containers running on that node. At a minimum, make sure the Docker daemon is up and running at all times.
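A liveness check for the daemon can be as simple as a ping. Here is a sketch with the Docker SDK for Python; in practice you would run it on a schedule on each node and wire the result into your alerting.

```python
# Sketch: verify that the local Docker daemon is up and responding.
# Assumes the Docker SDK for Python; connecting or pinging raises
# an exception if the daemon is unreachable.
import docker

def daemon_is_healthy():
    try:
        client = docker.from_env()
        return client.ping()   # returns True when the daemon responds
    except docker.errors.DockerException:
        return False

if not daemon_is_healthy():
    print("ALERT: Docker daemon is not responding")
```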
4. The Docker container
Since your microservice runs inside a Docker container, you need to ensure that the container is always up and running.
The top metrics here include:
CPU utilization—Watch for actual CPU utilization rates above 95 percent of the allocated CPU
Memory utilization—Create an alert for when memory utilization exceeds 90 percent to avoid maxing out allocated memory
Network I/O—Monitor the network I/O for abnormal network activity
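A sketch of pulling these container-level metrics from the Docker stats API, via the Docker SDK for Python; the CPU calculation follows the formula Docker's own CLI uses, and the field names should be verified against your Docker version.

```python
# Sketch: read per-container CPU, memory, and network metrics from the
# Docker stats API. Assumes the Docker SDK for Python; field names
# follow the stats JSON returned by the daemon.
import docker

client = docker.from_env()

for container in client.containers.list():
    stats = container.stats(stream=False)

    cpu = stats["cpu_stats"]
    precpu = stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - precpu["cpu_usage"]["total_usage"]
    system_delta = cpu.get("system_cpu_usage", 0) - precpu.get("system_cpu_usage", 0)
    cpu_percent = 0.0
    if system_delta > 0:
        cpu_percent = cpu_delta / system_delta * cpu.get("online_cpus", 1) * 100

    mem = stats["memory_stats"]
    mem_percent = mem["usage"] / mem["limit"] * 100 if mem.get("limit") else 0.0

    rx = sum(n["rx_bytes"] for n in stats.get("networks", {}).values())
    tx = sum(n["tx_bytes"] for n in stats.get("networks", {}).values())

    if cpu_percent > 95:
        print(f"ALERT: {container.name} CPU at {cpu_percent:.1f}%")
    if mem_percent > 90:
        print(f"ALERT: {container.name} memory at {mem_percent:.1f}%")
    print(f"{container.name}: rx={rx} bytes, tx={tx} bytes")
```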
5. The microservice itself
The microservice is the workload that runs within the container. This one is a bit tricky, because each proprietary microservice has its own monitoring interfaces and measures of health. However, if your container runs code within a common framework, that framework may provide standard ways to gauge whether your microservice is running well. For example, you can scan the Dockerfile to automatically detect common services, such as Node.js, Postgres, or RabbitMQ, that are specified within it. You can then monitor a standard characteristic of that service.
For example, if you know that Postgres is running in your container, you can feed it a stream of test data to make sure that it's working correctly. While you can't automatically monitor every piece of proprietary code, you can automatically monitor the common frameworks in which it runs. In this case, your vital signs will depend on the framework you're monitoring. These may range from reading simple metrics using a single API call to sending more elaborate SQL statements.
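As an illustration, the sketch below detects a common service from the Dockerfile's base image and, for Postgres, runs a trivial query to prove the service is accepting connections. The Dockerfile path, connection settings, credentials, and the psycopg2 dependency are all assumptions for the example.

```python
# Sketch: detect a common service from the Dockerfile base image and
# run a matching health check. The Dockerfile path, connection
# settings, and the psycopg2 dependency are illustrative assumptions.
import psycopg2

KNOWN_SERVICES = ("postgres", "node", "rabbitmq")

def detect_service(dockerfile_path="Dockerfile"):
    with open(dockerfile_path) as f:
        for line in f:
            parts = line.split()
            if parts and parts[0].upper() == "FROM" and len(parts) > 1:
                image = parts[1].lower()
                for service in KNOWN_SERVICES:
                    if service in image:
                        return service
    return None

def check_postgres(host="localhost", port=5432):
    # A trivial query is enough to prove the service is accepting
    # connections and executing SQL.
    conn = psycopg2.connect(host=host, port=port, user="monitor",
                            password="secret", dbname="postgres")
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            return cur.fetchone() == (1,)
    finally:
        conn.close()

if detect_service() == "postgres":
    print("Postgres healthy:", check_postgres())
```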
By using an adaptive approach to monitoring Docker, you can automatically manage the rapidly changing set of containers that make up your Dockerized application. As long as your monitoring system can automatically detect new containers, you can get an accurate picture of health at each of the five levels in the Docker infrastructure hierarchy.