Finding Docker images that no container uses

Requirement

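After the project was containerized with Docker, for various reasons some images stay on the host after deployment without being used by any container, and they take up disk space. These images need to be identified so that they can be removed from the deployment scripts. The project has a large number of containers, so checking by hand is slow and error-prone; an automated way to find the unused images is needed.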

Analysis

  • Ideally, Docker's own commands would be able to find the unused images.
    docker images --filter dangling=true lists untagged images, for example:
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
<none>              <none>              8abc22fbb042        4 weeks ago         0 B

In a real project, however, most unused images do have a tag; they were simply loaded and never used. So docker images --filter dangling=true cannot find all of the unused images.
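
To make the gap concrete, here is a minimal sketch of the difference between "dangling" and "unused". The rows are hypothetical sample data, `is_dangling` is a deliberately simplified check (no repository and no tag), and `in_use` assumes that no container uses either image; none of this is taken from real Docker output.

```python
# Hypothetical rows: (repository, tag, image_id); '<none>' marks an untagged image
images = [
    ('<none>', '<none>', '8abc22fbb042'),
    ('celery', 'latest', 'e111a70eee6a'),
]
in_use = set()  # assumption: no container uses either image


def is_dangling(repo, tag):
    # Simplified check: the image has neither a repository nor a tag
    return repo == '<none>' and tag == '<none>'


dangling = [image_id for repo, tag, image_id in images if is_dangling(repo, tag)]
unused = [image_id for repo, tag, image_id in images if image_id not in in_use]

print(dangling)  # only the untagged image
print(unused)    # both images, including the tagged one
```

The tagged `celery` image is unused but not dangling, which is exactly the kind of image the filter misses.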

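  • Solve it with a custom script
    1. Run docker ps -a to get all containers, including the name of the image each one uses
    2. For each of those images, run docker history image_id to find its base images (because Docker image layers are shared, a base image underneath a used image takes no extra space and does not need to be cleaned up)
    3. Take the union of the images from steps 1 and 2 and deduplicate it
    4. Run docker images to list all images and exclude those from step 3; what remains is the set of unused images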
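
The four steps above boil down to a set difference. The following is a minimal sketch of that logic on plain Python data, without calling Docker at all; the function name `find_unused`, its parameters, and the sample IDs are illustrative assumptions and not part of the script below.

```python
def find_unused(all_images, container_images, history):
    """Return the IDs of images that no container uses, directly or as a base.

    all_images:       image IDs as listed by `docker images` (step 4)
    container_images: image IDs used by containers (step 1)
    history:          dict mapping an image ID to the IDs that
                      `docker history` lists for it (step 2)
    """
    used = set(container_images)
    for image_id in container_images:
        # Step 3: union of direct and base images; the set removes duplicates
        used.update(history.get(image_id, []))
    # Step 4: everything that is not used
    return set(all_images) - used


# Hypothetical sample data, not real docker output
all_images = ['aaa111', 'bbb222', 'ccc333', 'ddd444']
container_images = ['aaa111']
history = {'aaa111': ['aaa111', 'bbb222']}

unused = find_unused(all_images, container_images, history)
print(sorted(unused))
```

Here `bbb222` counts as used because it appears in the history of `aaa111`, so only `ccc333` and `ddd444` are reported.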

Python script

find_unused_images.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import re

IMAGES_COMMAND = 'docker images'
PS_COMMAND = 'docker ps -a'
HISTORY = 'docker history %s'
# docker pads its columns with spaces; treat 2+ whitespace characters as a column separator
RE = re.compile(r'\s{2,}')


def exec_command(command):
    result = os.popen(command)
    return result.readlines()


class Image(object):
    def __init__(self, image_info):
        # strip the trailing newline so it does not end up in the size field
        self.split_info = RE.split(image_info.strip())
        self.image_id = self.split_info[2]
        self._generate_image_name()
        self.size = self.split_info[-1]

    def _generate_image_name(self):
        tag = self.split_info[1]
        repo = self.split_info[0]
        self.name = repo if tag == 'latest' else repo + ':' + tag

    def get_related_images(self):
        image_ids = [RE.split(history)[0] for history in exec_command(HISTORY % self.image_id)[1:] if
                     RE.split(history)[0] != '']
        return filter(None, [ImageUtil.get_image_by_id(image_id) for image_id in image_ids])

    def __repr__(self):
        return 'id:%s name:%s size:%s' % (self.image_id, self.name, self.size)

    def __hash__(self):
        return hash(self.image_id)

    def __eq__(self, other):
        return self.image_id == other.image_id

    def __ne__(self, other):
        return not self.__eq__(other)


class ImageUtil(object):
    all_images = [Image(image_info) for image_info in exec_command(IMAGES_COMMAND)[1:]]

    @classmethod
    def get_image_by_id(cls, image_id):
        # return the first image with this id, or None if there is none
        for img in cls.all_images:
            if img.image_id == image_id:
                return img
        return None

    @classmethod
    def get_image_by_name(cls, name):
        # return the first image with this name, or None if there is none
        for img in cls.all_images:
            if img.name == name:
                return img
        return None


class Container(object):
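    def __init__(self, container_info):
        self.split_info = RE.split(container_info)
        image_name = self.split_info[1]
        # Image.name drops the 'latest' tag, so drop an explicit ':latest' here as well
        if image_name.endswith(':latest'):
            image_name = image_name[:-len(':latest')]
        # docker ps may show an image id instead of a name, so fall back to an id lookup
        self.image = (ImageUtil.get_image_by_name(image_name) or
                      ImageUtil.get_image_by_id(image_name))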


class ContainerUtil(object):
    all_containers = [Container(container_info) for container_info in exec_command(PS_COMMAND)[1:]]

    @classmethod
    def get_used_images(cls):
        used_images = [container.image for container in cls.all_containers if container.image]
        related_images = []
        for used_image in used_images:
            related_images.extend(used_image.get_related_images())
        used_images.extend(related_images)
        return used_images


if __name__ == '__main__':
    all_images = ImageUtil.all_images
    images_used_by_container = ContainerUtil.get_used_images()
    unused_images = set(all_images) - set(images_used_by_container)
    print('unused images')
    for image in unused_images:
        print(image)
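
The script above splits Docker's column-aligned output on runs of spaces, which depends on how the columns happen to be padded. As a hedged alternative, the sketch below asks Docker for tab-separated fields instead. It assumes a Docker version whose `docker images` command supports `--format`; the names `IMAGES_FORMAT`, `parse_image_lines` and `list_images` are made up for this example and do not appear in the original script.

```python
import subprocess

# Tab-separated fields avoid guessing column boundaries from runs of spaces
IMAGES_FORMAT = '{{.Repository}}\t{{.Tag}}\t{{.ID}}\t{{.Size}}'


def parse_image_lines(lines):
    """Turn tab-separated lines into dicts; blank lines are skipped."""
    images = []
    for line in lines:
        line = line.rstrip('\r\n')
        if not line:
            continue
        repo, tag, image_id, size = line.split('\t')
        images.append({'repo': repo, 'tag': tag, 'id': image_id, 'size': size})
    return images


def list_images():
    """Run docker and parse its output (needs a working docker CLI)."""
    out = subprocess.check_output(['docker', 'images', '--format', IMAGES_FORMAT])
    return parse_image_lines(out.decode('utf-8').splitlines())


# Hypothetical sample line, shaped like the output of the command above
sample = ['celery\tlatest\te111a70eee6a\t216MB\n']
print(parse_image_lines(sample))
```

Only `parse_image_lines` is exercised here; `list_images` is left uncalled so that the sketch can be tried on a machine without Docker.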

Dependencies

  • Python 2.7 (Python 3.x also works with minor changes)
  • root user

Usage

Put find_unused_images.py in any directory and run

python find_unused_images.py

It prints all unused images together with their sizes, for example:

unused images
id:e111a70eee6a name:celery size:216MB
