Guide to building a kubemark-based performance-testing cluster

Target version: Kubernetes 1.17.5

Cluster OS: CentOS 7

Note: performance testing differs between Kubernetes versions; this guide is not guaranteed to work on other versions.


Preface: kubemark is a node-simulation tool designed for testing Kubernetes API latency and scheduling performance in large clusters.

We need to set up two clusters, A and B. B is the cluster under test, while A runs the kubemark workload. The kubemark containers are deployed in A as a workload (the reference config below uses a ReplicationController), and each replica registers itself as a node in B. Once the deployment is complete, B has exactly as many nodes as A has kubemark replicas.

From here on, assume two working clusters A and B are already available.

  1. Build and package the kubemark image (steps 2-4 below)

  2. Download the kubernetes source; its version must match cluster B's (a clone sketch follows this list)

  3. Build the kubemark binary
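A minimal sketch for step 2, assuming a standard GOPATH layout and that cluster B runs v1.17.5 as stated above:

git clone https://github.com/kubernetes/kubernetes.git $GOPATH/src/k8s.io/kubernetes
cd $GOPATH/src/k8s.io/kubernetes
git checkout v1.17.5   # check out the tag that matches cluster B's version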

./hack/build-go.sh cmd/kubemark/

cp $GOPATH/src/k8s.io/kubernetes/_output/bin/kubemark $GOPATH/src/k8s.io/kubernetes/cluster/images/kubemark/

  4. Build the kubemark image

cd $GOPATH/src/k8s.io/kubernetes/cluster/images/kubemark/

make build
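make build only creates the image locally, and the exact tag it produces depends on the REGISTRY and IMAGE_TAG variables in that Makefile. The hollow-node pods in cluster A need to pull the image, so retag and push it to a registry reachable from A. A hedged sketch (test.cargo.io/release is simply the registry used in the reference manifest below; substitute your own):

docker images | grep kubemark                                    # find the image make just built
docker tag <image-id-from-above> test.cargo.io/release/kubemark:latest
docker push test.cargo.io/release/kubemark:latest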

  5. Create the namespace, ConfigMap, Secret, and RBAC objects

kubectl create ns kubemark

kubectl create cm node-configmap --from-literal=content.type="" --from-file=kernel.monitor="kernel-monitor.json" -n kubemark
(kernel-monitor.json is taken from ./test/kubemark/resources/kernel-monitor.json in the kubernetes source tree)

kubectl create secret generic kubeconfig --type=Opaque -n kubemark --from-file=kubelet.kubeconfig=kubemark.kubeconfig --from-file=kubeproxy.kubeconfig=kubemark.kubeconfig --from-file=npd.kubeconfig=kubemark.kubeconfig --from-file=heapster.kubeconfig=kubemark.kubeconfig --from-file=cluster_autoscaler.kubeconfig=kubemark.kubeconfig --from-file=dns.kubeconfig=kubemark.kubeconfig
(kubemark.kubeconfig is cluster B's kubeconfig, i.e. /root/.kube/config on B's master; the secret must be created in the kubemark namespace so the hollow-node pods can mount it)
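If kubemark.kubeconfig has not been prepared yet, a minimal way to fetch it, assuming SSH access to cluster B's master (192.168.0.16 is the master address used throughout this guide):

scp root@192.168.0.16:/root/.kube/config ./kubemark.kubeconfig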

kubectl apply -f addons/ -n kubemark
(the addons directory is ./test/kubemark/resources/manifests/addons in the kubernetes source tree)

  6. Create the kubemark workload

kubectl apply -f hollow-node_template.yaml -n kubemark

Reference configuration:

apiVersion: v1
kind: ReplicationController
metadata:
  name: hollow-node
  labels:
    name: hollow-node
spec:
  replicas: 200
  selector:
    name: hollow-node
  template:
    metadata:
      labels:
        name: hollow-node
    spec:
      initContainers:
      - name: init-inotify-limit
        image: busybox
        command: ['sysctl', '-w', 'fs.inotify.max_user_instances=1000']
        securityContext:
          privileged: true
      volumes:
      - name: kubeconfig-volume
        secret:
          secretName: kubeconfig
      - name: kernelmonitorconfig-volume
        configMap:
          name: node-configmap
      - name: logs-volume
        hostPath:
          path: /var/log
      - name: no-serviceaccount-access-to-real-master
        emptyDir: {}
      containers:
      - name: hollow-kubelet
        image: test.cargo.io/release/kubemark:latest
        ports:
        - containerPort: 4194
        - containerPort: 10250
        - containerPort: 10255
        env:
        - name: CONTENT_TYPE
          valueFrom:
            configMapKeyRef:
              name: node-configmap
              key: content.type
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /kubemark --morph=kubelet --name=$(NODE_NAME) --kubeconfig=/kubeconfig/kubelet.kubeconfig $(CONTENT_TYPE) --alsologtostderr 1>>/var/log/kubelet-$(NODE_NAME).log 2>&1
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        resources:
          requests:
            cpu: 20m
            memory: 50M
        securityContext:
          privileged: true
      - name: hollow-proxy
        image: test.cargo.io/release/kubemark:latest
        env:
        - name: CONTENT_TYPE
          valueFrom:
            configMapKeyRef:
              name: node-configmap
              key: content.type
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /kubemark --morph=proxy --name=$(NODE_NAME) --kubeconfig=/kubeconfig/kubeproxy.kubeconfig $(CONTENT_TYPE) --alsologtostderr 1>>/var/log/kubeproxy-$(NODE_NAME).log 2>&1
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        resources:
          requests:
            cpu: 20m
            memory: 50M
      - name: hollow-node-problem-detector
        image: test.cargo.io/release/node-problem-detector:v0.8.0
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /node-problem-detector --system-log-monitors=/config/kernel.monitor --apiserver-override="https://192.168.0.16:6443?inClusterConfig=false&auth=/kubeconfig/npd.kubeconfig" --alsologtostderr 1>>/var/log/npd-$(NODE_NAME).log 2>&1
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: kernelmonitorconfig-volume
          mountPath: /config
          readOnly: true
        - name: no-serviceaccount-access-to-real-master
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        resources:
          requests:
            cpu: 20m
            memory: 50M
        securityContext:
          privileged: true
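
After applying the manifest, a quick sanity check that the hollow nodes registered (kubemark.kubeconfig is the cluster B kubeconfig prepared earlier):

# on cluster A: all hollow-node pods should reach Running
kubectl get pods -n kubemark -o wide

# against cluster B: each replica should show up as a Ready node
kubectl --kubeconfig=kubemark.kubeconfig get nodes | grep hollow-node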

  7. Run the e2e performance test cases

There is a big pitfall here: most of the articles you can find online tell you to build the e2e test binary and run it directly:

make WHAT="test/e2e/e2e.test"

./e2e.test --kube-master=192.168.0.16 --host=https://192.168.0.16:6443 --ginkgo.focus="\[Performance\]" --provider=local --kubeconfig=kubemark.kubeconfig --num-nodes=10 --v=3 --ginkgo.failFast --e2e-output-dir=. --report-dir=.

In reality, however, the e2e performance test cases have been moved out of the main repository (https://github.com/kubernetes/kubernetes/pull/83322), so on versions released after 2019-10-01 the command above can no longer run the performance tests.

  8. Run the performance tests with the perf-tests repo (run the following on cluster B's master node)

  9. Download the perf-tests source matching your kubernetes version: https://github.com/kubernetes/perf-tests

  10. Run the test command (the cluster needs at least 100 nodes; otherwise adjust the parameters in job.yaml)

./run-e2e.sh --testconfig=job.yaml --kubeconfig=config.yaml --provider=local --masterip=192.168.0.16,192.168.0.23,192.168.0.40 --mastername=kube-master-1,kube-master-2,kube-master-3 --master-internal-ip=192.168.0.16,192.168.0.23,192.168.0.40 --enable-prometheus-server --tear-down-prometheus-server=false
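With --enable-prometheus-server the tool also deploys a Prometheus stack. A quick check that it came up (assuming it lands in the monitoring namespace, which is where the prometheus-k8s ServiceAccount referenced below lives):

kubectl get pods -n monitoring
kubectl get servicemonitors -n monitoring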

Troubleshooting:

  • If a target's port needs to be changed, edit the Service directly and update the corresponding Endpoints object (a sketch follows below).
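    A hedged sketch; <target-service> is a placeholder for whichever target you need to repoint:

    kubectl -n monitoring edit service <target-service>      # change the port
    kubectl -n monitoring edit endpoints <target-service>    # point it at the new port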

  • When collecting etcd metrics you may hit the error "etcdmetrics: failed to collect etcd database size". This happens because the script cannot pull the data directly over port 2379, so the source needs to be changed:

    In perf-tests/clusterloader2/pkg/measurement/common/etcd_metrics.go, replace "curl http://localhost:2379/metrics" with "curl -L https://localhost:2379/metrics --key /etc/kubernetes/etcd/etcd.key --cert /etc/kubernetes/etcd/etcd.crt --insecure"

  • A few endpoints in the Prometheus target list may fail to scrape and return 401 authorization errors. The fixes are as follows:

    Edit the corresponding ServiceMonitor and add bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token to the endpoints field:

spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 5s
    port: apiserver
    scheme: https
    tlsConfig:
      insecureSkipVerify: true

  Add --authentication-token-webhook=true and --authorization-mode=Webhook to the kubelet's startup flags and restart it (a sketch follows below).
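  A minimal sketch for a kubeadm-provisioned CentOS 7 node, where extra kubelet flags go into /etc/sysconfig/kubelet (verify the file location for your install):

  # add the two flags to KUBELET_EXTRA_ARGS, keeping any flags that are already there
  vi /etc/sysconfig/kubelet    # KUBELET_EXTRA_ARGS=--authentication-token-webhook=true --authorization-mode=Webhook
  systemctl restart kubelet
  systemctl status kubelet     # confirm the kubelet came back up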
  Bind the system:kubelet-api-admin ClusterRole to the prometheus-k8s ServiceAccount:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-k8s-1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kubelet-api-admin
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring
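
Save the manifest under any file name (prometheus-kubelet-api-admin.yaml here is just an example) and apply it:

kubectl apply -f prometheus-kubelet-api-admin.yaml
kubectl get clusterrolebinding prometheus-k8s-1   # confirm the binding exists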
