Kubernetes-Docker Container Troubleshooting Tool

Docker containers are meant to be lightweight, so many container images are built with only the packages the service itself needs.

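This does keep containers small and fast to start, but it also means that even the most basic tools are missing from the image, which makes troubleshooting difficult. Network problems are especially hard to investigate, because many images ship without ifconfig, netstat, lsof, tcpdump, and similar tools.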

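For example, the mysql image from Docker Hub has neither netstat nor the TCP tools, so checking something like the number of MySQL connections requires installing them by hand.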

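Netshoot is an image that bundles the relevant network tools. A Netshoot container can share the network namespace of a given container and run those tools against it.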

Suppose there is a mysql container:

[root@k8s-node1 ~]# kubectl --namespace=kube-system get pod mysql-74b59f4f5b-l2zct -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
mysql-74b59f4f5b-l2zct   1/1     Running   0          9d    172.17.39.9   k8s-node1              

Entering the container shows that almost none of the network tools are installed:

[root@k8s-node1 ~]# kubectl --namespace=kube-system exec -it mysql-74b59f4f5b-l2zct bash
root@mysql-74b59f4f5b-l2zct:/# netstat
bash: netstat: command not found
root@mysql-74b59f4f5b-l2zct:/# ifconfig
bash: ifconfig: command not found
root@mysql-74b59f4f5b-l2zct:/# lsof
bash: lsof: command not found
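
The same check can be done in one pass instead of trying each command by hand. The sketch below is only an illustration: `missing_tools` is a hypothetical helper name, and it assumes the container has a POSIX shell with the `command` builtin.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "${tool}" >/dev/null 2>&1 || printf '%s\n' "${tool}"
    done
}

# Example, run inside the container being inspected:
# missing_tools netstat ifconfig lsof tcpdump
```

Every name it prints is a tool that would have to be installed, or supplied from another image such as Netshoot.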

With Netshoot, network information for the mysql container can be inspected as follows.

1. Find the node the container runs on. The -owide output above shows that the Pod is scheduled on k8s-node1.

2. Find the Docker container ID. The first 16 characters are enough; in this case they are 4b2160c0e96fe44b.

[root@k8s-node1 ~]# kubectl --namespace=kube-system get pod mysql-74b59f4f5b-l2zct -ojson | grep containerID
                "containerID": "docker://4b2160c0e96fe44b02dd50e4abf6e9fe7528bbe885dd11a90cd18d6576a2b0ce",
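
Trimming the ID by hand is easy to get wrong, so it could be scripted. This is a minimal sketch, assuming the value has the `docker://<id>` form shown above; `short_container_id` is a hypothetical name, not part of kubectl or Docker.

```shell
# Hypothetical helper: turn a Kubernetes containerID value into a short Docker ID.
short_container_id() {
    local full="${1#docker://}"   # drop the runtime prefix, if present
    printf '%.16s\n' "${full}"    # keep only the first 16 characters
}

# Example (requires kubectl access to the cluster):
# cid="$(kubectl -n kube-system get pod mysql-74b59f4f5b-l2zct \
#        -o jsonpath='{.status.containerStatuses[0].containerID}')"
# short_container_id "${cid}"
```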

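3. Start a new container from the Netshoot image, attached to the mysql container's network (--network selects the network namespace to join). All the network tools are available in this container. Although it is a new container, it uses the same network namespace, so the network information is identical, including the IP address, interfaces, and routes.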

[root@k8s-node1 ~]# docker run -it --rm --network=container:4b2160c0e96fe44b netshoot:latest bash -l
                    dP            dP                           dP
                    88            88                           88
88d888b. .d8888b. d8888P .d8888b. 88d888b. .d8888b. .d8888b. d8888P
88'  `88 88ooood8   88   Y8ooooo. 88'  `88 88'  `88 88'  `88   88
88    88 88.  ...   88         88 88    88 88.  .88 88.  .88   88
dP    dP `88888P'   dP   `88888P' dP    dP `88888P' `88888P'   dP

Welcome to Netshoot! (github.com/nicolaka/netshoot)
root @ /
 [1]   → netstat -anp| grep 6379
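
Once netstat is available, the MySQL connection count mentioned earlier can be read from its output. The sketch below is an assumption-laden illustration: `count_established` is a hypothetical helper, it expects the Linux `netstat -ant` column layout (local address in column 4, state in column 6), and it assumes MySQL listens on its default port 3306.

```shell
# Hypothetical helper: count ESTABLISHED TCP connections whose local port
# matches $1, reading `netstat -ant` style output from standard input.
count_established() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}

# Example, run inside the Netshoot container:
# netstat -ant | count_established 3306
```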

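4. Besides --network, you can also pass --ipc, --pid, and --volume so that the IPC (shared memory) namespace, the process namespace, and the volumes are shared as well.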
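
To show how these flags combine, here is a small sketch that only prints the resulting command rather than running it. `debug_container_cmd` is a hypothetical helper, and the default image name `netshoot:latest` is taken from the example above.

```shell
# Hypothetical helper: print (not run) a docker command that joins the
# network, IPC, and PID namespaces of an existing container.
# $1 = target container ID, $2 = debug image (defaults to netshoot:latest)
debug_container_cmd() {
    local cid="$1"
    local image="${2:-netshoot:latest}"
    printf 'docker run -it --rm --network=container:%s --ipc=container:%s --pid=container:%s %s bash -l\n' \
        "${cid}" "${cid}" "${cid}" "${image}"
}

# Example:
# debug_container_cmd 4b2160c0e96fe44b
```

Printing first makes it possible to review the command before executing it on a production node.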

5. The following script can be used directly in a k8s environment:

#!/usr/bin/env bash


function kube_inject {
    local ns_name="${1}"
    local pod_name="${2}"
    local container_name_or_idx="${3:-1}"

    local usage="Usage: ${FUNCNAME[0]} <ns_name> <pod_name> [container_name|container_index]"
    local kube_inject_image="netshoot:latest"
    local docker_daemon_port="2375"
    local docker_tls_flag="--tlsverify \
                           --tlscacert=/etc/kubernetes/ssl/ca.pem \
                           --tlscert=/etc/kubernetes/ssl/docker.pem \
                           --tlskey=/etc/kubernetes/ssl/docker-key.pem"

    if [[ $# -lt 2 ]]; then
        echo "Error: ${FUNCNAME[0]} requires at least 2 arguments for ns_name and pod_name;" \
             "and an optional argument for container's name or index!"
        echo "${usage}"
        return 1
    fi

    # Get the IP of the node the Pod is scheduled on.
    hostIP="$(kubectl -n ${ns_name} get pod ${pod_name} -ojsonpath='{.status.hostIP}' 2>/dev/null)"
    if [[ -z "${hostIP}" ]]; then
        echo "Error: no such pod ${pod_name} in the namespace ${ns_name}!"
        echo "${usage}"
        return 2
    fi

    # Check the Pod status.
    status="$(kubectl -n ${ns_name} get pod ${pod_name} -ojsonpath='{.status.phase}')"
    if [[ "${status}" != "Running" ]]; then
        echo "Error: the pod ${pod_name} in the namespace ${ns_name} is not Running!"
        echo "${usage}"
        return 3
    fi

    # Get the container ID for the Pod.
    if [[ "${container_name_or_idx}" =~ ^[0-9]+$ ]]; then
        # treat 'container_name_or_idx' as the index of the container in the pod.
        jsonpath="{.status.containerStatuses[$((container_name_or_idx-1))].containerID}"
    else
        # treat 'container_name_or_idx' as the name of the container in the pod.
        jsonpath="{.status.containerStatuses[?(@.name=='${container_name_or_idx}')].containerID}"
    fi

    cid=$(kubectl -n ${ns_name} get pod ${pod_name} -ojsonpath="${jsonpath}" 2>/dev/null)

    if [[ ! ("${cid}" =~ ^docker://[0-9a-f]{64}$) ]]; then
        if [[ "${container_name_or_idx}" =~ ^[0-9]+$ ]]; then
            echo "Error: container index ${container_name_or_idx} is out of bounds!"
        else
            echo "Error: no such container ${container_name_or_idx} in the pod ${pod_name}!"
        fi

        echo "${usage}"
        return 4
    fi

    # remove the prefix 'docker://'.
    cid="${cid:9:64}"
    docker_cmd="docker -H ${hostIP}:${docker_daemon_port} ${docker_tls_flag}"
    # Get the container's root directory.
    merged_dir="$(${docker_cmd} inspect -f '{{.GraphDriver.Data.MergedDir}}' ${cid})"

    # Get the container's mounted volume information.
    volumes=""
    for bind in $(${docker_cmd} inspect -f '{{.HostConfig.Binds}}' ${cid} |tr -d '\[\]'); do
        # [[ "${bind}" =~ (/etc/hosts|/dev/termination-log) ]] && continue
        # notes: the tailing space ' ' is necessary here.
        volumes+="--volume=$(sed -nr 's|([^:]+):(.*)|\1:/container_root\2|p' <<< ${bind}) "
    done

    # Start the debug container sharing the target's IPC, network, and PID
    # namespaces, with the target's root filesystem mounted at /container_root.
    ${docker_cmd} run \
                  -it --rm \
                  --ipc=container:${cid} \
                  --network=container:${cid} \
                  --pid=container:${cid} \
                  --volume=${merged_dir}:/container_root:rw \
                  ${volumes} \
                  ${kube_inject_image} bash -l
}

kube_inject "$@"
[root@k8s-node1 ~]# kube-inject
Error: kube_inject requires at least 2 arguments for ns_name and pod_name; and an optional argument for container's name or index!
Usage: kube_inject <ns_name> <pod_name> [container_name|container_index]
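
The least obvious part of the script is how each bind mount is rewritten so that it appears under /container_root in the debug container. The sketch below isolates that sed expression. `remap_bind` is a hypothetical wrapper, and `-E` is used in place of `-r` on the assumption that the local sed accepts it.

```shell
# Hypothetical wrapper around the sed expression used in the script:
# "host_path:container_path[:mode]" -> "host_path:/container_root<container_path>[:mode]"
remap_bind() {
    printf '%s\n' "$1" | sed -nE 's|([^:]+):(.*)|\1:/container_root\2|p'
}

# Example:
# remap_bind /data/mysql:/var/lib/mysql
```

Because the host path stays the same and only the target path gains the /container_root prefix, the volumes line up with the container's merged root filesystem mounted at the same location.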
