1. Overview
The overall design is largely inspired by the Ant Financial article 深度 | 蚂蚁金服自动化运维大规模 Kubernetes 集群的实践之路 (an in-depth look at how Ant Financial runs large-scale Kubernetes clusters through automated operations).
The Operators developed in this implementation are built with operator-sdk.
Goal: use a metadata cluster to quickly bring business clusters online and offline and to manage their lifecycle, for example rapidly creating and reclaiming clusters for tenants on public cloud.
2. Overall Architecture
3. Main Workflow
- Metadata cluster setup (platform deployment)
As the supporting system for the entire container cloud, the metadata cluster sits in the support zone; it is built in the usual way (see the kubeadm-based and manual Kubernetes installation posts). Register the Master-Operator and Node-Operator into the metadata cluster. Master covers Etcd, Apiserver, scheduler, and controller-manager; Node covers the kubelet and docker (or another container runtime).
At the same time, write the node certificates into a ConfigMap, so that when the Node-Operator joins a node it can fetch the certificates from the cluster and distribute them to the node being added (a configuration center would work just as well). The CA certificate is in fact already in the cluster by default, for example in the ConfigMap extension-apiserver-authentication.
kubectl -n kube-system create configmap ca --from-file=/etc/kubernetes/ca.pem
kubectl -n kube-system create configmap bootstrap --from-file=/etc/kubernetes/bootstrap.conf
- Host provisioning or configuration (tenant operation)
If an underlying IaaS cloud-management layer is available, hosts can be created through the IaaS API according to the configuration; without one, you prepare the hosts yourself.
Certificate generation:
Based on the supplied cluster information (hosts, POD_CIDR, SERVICE_CIDR, and so on), generate the cluster certificates and write them to the Nacos configuration center or to a ConfigMap in the metadata cluster. Note that a certificate bundle can exceed 1 MB and values stored in etcd should stay under 1 MB, but certificates are usually small enough that this rarely matters. The certificate-generation tool here is registered in the metadata cluster as a serverless function built on kubeless; adapt this to your own needs (see the serverless development post for details).
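The kubeless function itself is beyond the scope of this post, but the write-back step is straightforward. Here is a minimal sketch using client-go, assuming a hypothetical certs map (file name to PEM content) produced by the generation step; the ConfigMap naming scheme is also illustrative:
import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// storeCerts saves the generated certificates into a ConfigMap in the
// metadata cluster, keyed by tenant and cluster name.
func storeCerts(client kubernetes.Interface, tenant, cluster string, certs map[string]string) error {
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      tenant + "-" + cluster + "-certs", // hypothetical naming scheme
			Namespace: "kube-system",
		},
		Data: certs, // e.g. "ca.pem", "bootstrap.conf"
	}
	_, err := client.CoreV1().ConfigMaps("kube-system").Create(cm)
	return err
}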
- Join the control-plane nodes into the metadata cluster as its worker nodes, via the metadata cluster's Node-Operator
When joining these nodes into the metadata cluster, give the kubelet default labels so that deployments can later select nodes by label, configured as follows:
--node-labels=kubelet.kubernetes.io/tenant=<tenant-id>,kubelet.kubernetes.io/cluster=<cluster-name> \
In addition, the certificates the kubelet needs were placed into ConfigMaps when the metadata cluster was set up; at this point they are distributed to the kubelet (inside the Node-Operator this is done mainly through Ansible):
// fetch the bootstrap kubeconfig and CA stored at metadata-cluster setup time
bootstrap := &corev1.ConfigMap{}
err = r.client.Get(context.TODO(), types.NamespacedName{Name: "bootstrap", Namespace: "kube-system"}, bootstrap)
if err == nil {
	ioutil.WriteFile("/etc/kubernetes/bootstrap.conf", []byte(bootstrap.Data["bootstrap.conf"]), os.ModePerm)
}
ca := &corev1.ConfigMap{}
err = r.client.Get(context.TODO(), types.NamespacedName{Name: "ca", Namespace: "kube-system"}, ca)
if err == nil {
	ioutil.WriteFile("/etc/kubernetes/ca.pem", []byte(ca.Data["ca.pem"]), os.ModePerm)
}
Deploy the remaining containerized components, such as kube-proxy, monitoring, and collection agents.
- Deploy the Master through the Master-Operator onto the nodes carrying the matching labels (tenant and cluster name)
Etcd: implemented as a Deployment with label-based node selection, running with HostNetwork. An extra container in InitContainers fetches the generated cluster certificates from the metadata cluster (or the configuration center) based on tenant, cluster name, and so on, including the business-cluster node certificates bootstrap.conf and ca.pem, and mounts them into the host directory /etc/kubernetes/pki:
func initContainers() []corev1.Container {
	return []corev1.Container{
		{
			Name:            "certs",
			Image:           "cloud.org/cert:latest",
			ImagePullPolicy: corev1.PullIfNotPresent,
			VolumeMounts: []corev1.VolumeMount{
				{
					Name:      "cert",
					MountPath: "/etc/kubernetes/pki",
				},
			},
		},
	}
}
Later on, for disaster recovery, Etcd can itself be managed by an operator; CoreOS already provides one (etcd-operator).
Apiserver: implemented as a Deployment with label-based node selection, running with HostNetwork (a sketch of such a container spec follows this list)
scheduler: implemented as a Deployment with label-based node selection, running with HostNetwork
controller-manager: implemented as a Deployment with label-based node selection, running with HostNetwork
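For reference, here is a simplified sketch of the kind of container spec that assemblyContainers (shown in 4.2) produces for the Apiserver. The image name and the flag set are illustrative assumptions, not the exact implementation; the flags themselves are standard kube-apiserver options:
func apiserverContainer(etcdServers, serviceCIDR string) corev1.Container {
	return corev1.Container{
		Name:            "apiserver",
		Image:           "cloud.org/kube-apiserver:v1.15.3", // illustrative image
		ImagePullPolicy: corev1.PullIfNotPresent,
		Command: []string{
			"kube-apiserver",
			"--etcd-servers=" + etcdServers, // comma-separated list built from the etcd nodes
			"--service-cluster-ip-range=" + serviceCIDR,
			"--client-ca-file=/etc/kubernetes/pki/ca.pem", // fetched by the init container
		},
		VolumeMounts: []corev1.VolumeMount{
			{Name: "cert", MountPath: "/etc/kubernetes/pki"},
		},
	}
}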
Once the Master components are up, the following actions are still needed:
① Write the certificates that the business cluster's nodes will need into a ConfigMap in the business cluster (a minimal sketch follows below)
② Register a Node-Operator with the business cluster
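A minimal sketch of step ①, under the assumption that the Master-Operator already holds a kubeconfig for the new business cluster (the helper name and path are hypothetical):
import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// writeNodeCerts connects to the freshly started business cluster and stores
// the node certificates there, mirroring what was done in the metadata cluster.
func writeNodeCerts(kubeconfigPath string, data map[string]string) error {
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "bootstrap", Namespace: "kube-system"},
		Data:       data, // bootstrap.conf, ca.pem, ...
	}
	_, err = client.CoreV1().ConfigMaps("kube-system").Create(cm)
	return err
}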
- Join nodes into the business cluster through the business cluster's Node-Operator
4. Implementation Process
4.1. Development Environment
- Environment setup (go, git)
export GOROOT=/data/go
export GOPATH=/data/work/go
export PATH=$PATH:$GOROOT/bin:$GOPATH/bin
# enable Go module support
export GO111MODULE=on
# use a proxy for faster module downloads
export GOPROXY=https://goproxy.cn
- Install operator-sdk
# in /data/work/go/src
[root@node1 src]# wget https://github.com/operator-framework/operator-sdk/releases/download/v0.11.0/operator-sdk-v0.11.0-x86_64-linux-gnu
[root@node1 src]# chmod +x operator-sdk-v0.11.0-x86_64-linux-gnu
[root@node1 src]# mv operator-sdk-v0.11.0-x86_64-linux-gnu operator-sdk
[root@node1 src]# mv operator-sdk /usr/local/bin
4.2. Master-Operator
[root@node operator]# operator-sdk new master --repo cloud.org/operator/master
INFO[0000] Creating new Go operator 'master'.
INFO[0000] Created go.mod
# ...(output omitted)......
INFO[0014] Project validation successful.
INFO[0014] Project creation complete.
[root@node master]# operator-sdk add api --api-version=crd.cloud.org/v1alpha1 --kind=KubeMaster
INFO[0000] Generating api version crd.cloud.org/v1alpha1 for kind KubeMaster.
INFO[0000] Created pkg/apis/crd/group.go
INFO[0012] Created pkg/apis/crd/v1alpha1/kubemaster_types.go
INFO[0012] Created pkg/apis/addtoscheme_crd_v1alpha1.go
INFO[0012] Created pkg/apis/crd/v1alpha1/register.go
INFO[0012] Created pkg/apis/crd/v1alpha1/doc.go
# ...(output omitted)......
deploy/crds/crd.cloud.org_kubemasters_crd.yaml
INFO[0026] Code-generation complete.
INFO[0026] API generation complete.
[root@node master]# operator-sdk add controller --api-version=crd.cloud.org/v1alpha1 --kind=KubeMaster
INFO[0000] Generating controller version crd.cloud.org/v1alpha1 for kind KubeMaster.
INFO[0000] Created pkg/controller/kubemaster/kubemaster_controller.go
INFO[0000] Created pkg/controller/add_kubemaster.go
INFO[0000] Controller generation complete.
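Next, kubemaster_types.go gets the spec fields the controller relies on. The exact definitions are not reproduced here; below is a sketch inferred from the usages in the code that follows (cr.Spec.Masters, cr.Spec.Etcds, cr.Spec.ServiceCIDR, cr.Spec.PodCIDR, and cr.Selector; note that the snippet below refers to the CR type as MasterOperator):
// Node identifies a host that will run master components.
type Node struct {
	IP       string `json:"ip"`
	Hostname string `json:"hostname,omitempty"`
}

// KubeMasterSpec describes the desired business-cluster control plane.
type KubeMasterSpec struct {
	Masters     []Node `json:"masters"` // hosts for apiserver/scheduler/controller-manager
	Etcds       []Node `json:"etcds"`   // hosts for the etcd members
	ServiceCIDR string `json:"serviceCIDR"`
	PodCIDR     string `json:"podCIDR"`
}

type KubeMaster struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Optional selector, accessed as cr.Selector in the Deployment builder below.
	Selector *metav1.LabelSelector `json:"selector,omitempty"`
	Spec     KubeMasterSpec        `json:"spec"`
}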
Create the Deployment
// newDeploymentForCR returns a master Deployment with the same name/namespace as the CR
func newDeploymentForCR(cr *cloudv1alpha1.MasterOperator) *appv1.Deployment {
	labels := map[string]string{
		"app": cr.Name,
	}
	nodeSelector := map[string]string{
		"kubelet.kubernetes.io/tenant": cr.Name, // TODO: tenant + cluster name
	}
	// build the comma-separated etcd endpoint list for the apiserver
	buffer := bytes.NewBufferString("")
	etcdSize := len(cr.Spec.Etcds)
	for index, node := range cr.Spec.Etcds {
		buffer.WriteString("https://" + node.IP + ":2379")
		if etcdSize-1 != index {
			buffer.WriteString(",")
		}
	}
	matchLabels := map[string]string{
		"app": cr.Name + "-master",
	}
	if cr.Selector == nil {
		cr.Selector = &metav1.LabelSelector{
			MatchLabels: matchLabels,
		}
	} else if cr.Selector.MatchLabels == nil {
		cr.Selector.MatchLabels = matchLabels
	} else {
		cr.Selector.MatchLabels["app"] = cr.Name + "-master"
	}
	// TODO: Etcd, Apiserver, Controller-Manager, and Scheduler are deployed together
	// at Pod granularity here; in production, prefer one Deployment per component.
	size := int32(len(cr.Spec.Masters))
	return &appv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      cr.Name + "-master",
			Namespace: cr.Namespace,
			Labels:    labels,
		},
		Spec: appv1.DeploymentSpec{
			Selector: cr.Selector,
			Replicas: &size,
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: cr.Selector.MatchLabels,
					Name:   cr.Name + "-master",
				},
				Spec: corev1.PodSpec{
					NodeSelector: nodeSelector,
					HostNetwork:  true,
					DNSPolicy:    corev1.DNSClusterFirstWithHostNet,
					//Priority: 2000000000,
					PriorityClassName: "system-cluster-critical",
					// responsible for downloading the certificates
					InitContainers: initContainers(),
					Containers:     assemblyContainers(buffer.String(), cr.Spec.ServiceCIDR, cr.Spec.PodCIDR),
					Volumes: []corev1.Volume{
						certVolume(),
						localtimeVolume("localtime", "/etc/localtime", "file"),
					},
				},
			},
		},
	}
}
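Wiring this into the generated Reconcile loop follows the standard operator-sdk pattern; a condensed sketch:
	// Build the desired Deployment and make the CR its owner, so the
	// Deployment is garbage-collected when the CR is deleted.
	deployment := newDeploymentForCR(instance)
	if err := controllerutil.SetControllerReference(instance, deployment, r.scheme); err != nil {
		return reconcile.Result{}, err
	}
	found := &appv1.Deployment{}
	err = r.client.Get(context.TODO(), types.NamespacedName{Name: deployment.Name, Namespace: deployment.Namespace}, found)
	if err != nil && errors.IsNotFound(err) {
		// the master Deployment does not exist yet: create it
		return reconcile.Result{}, r.client.Create(context.TODO(), deployment)
	}
	return reconcile.Result{}, err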
4.3. Node-Operator
[root@node operator]# operator-sdk new node --repo cloud.org/operator/node
# ...(output omitted)......
INFO[0006] Project validation successful.
INFO[0006] Project creation complete.
[root@node node]# operator-sdk add api --api-version=crd.cloud.org/v1alpha1 --kind=KubeNode
# ...(output omitted)......
INFO[0011] Code-generation complete.
INFO[0011] API generation complete.
[root@node node]# operator-sdk add controller --api-version=crd.cloud.org/v1alpha1 --kind=KubeNode
INFO[0000] Generating controller version crd.cloud.org/v1alpha1 for kind KubeNode.
INFO[0000] Created pkg/controller/kubenode/kubenode_controller.go
INFO[0000] Created pkg/controller/add_kubenode.go
INFO[0000] Controller generation complete.
Initialize the Ansible configuration file
// 1. initialize the ansible hosts file
err = initAnsibleHosts(instance.Spec.Nodes)
if err != nil {
	fmt.Println("failed to initialize the Ansible hosts file")
	return reconcile.Result{}, err
}
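initAnsibleHosts is not reproduced in this post; assuming it simply renders the node list into an inventory file under a [kubernetes] group (matching hosts: kubernetes in the playbook below, with a Node type like the one sketched in 4.2), it could look like this:
// initAnsibleHosts writes the nodes from the CR spec into an Ansible
// inventory file, one IP per line under the [kubernetes] group.
func initAnsibleHosts(nodes []crdv1alpha1.Node) error {
	buf := bytes.NewBufferString("[kubernetes]\n")
	for _, node := range nodes {
		buf.WriteString(node.IP + "\n")
	}
	return ioutil.WriteFile("/etc/ansible/hosts", buf.Bytes(), os.ModePerm)
}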
Fetch the certificates the node needs
bootstrap := &corev1.ConfigMap{}
err = r.client.Get(context.TODO(), types.NamespacedName{Name: "bootstrap", Namespace: "kube-system"}, bootstrap)
if err == nil {
	ioutil.WriteFile("/etc/kubernetes/bootstrap.conf", []byte(bootstrap.Data["bootstrap.conf"]), os.ModePerm)
}
clientCA := &corev1.ConfigMap{}
err = r.client.Get(context.TODO(), types.NamespacedName{Name: "extension-apiserver-authentication", Namespace: "kube-system"}, clientCA)
if err == nil {
	ioutil.WriteFile("/etc/kubernetes/ca.pem", []byte(clientCA.Data["client-ca-file"]), os.ModePerm)
}
The YAML that Ansible executes
- hosts: kubernetes
  remote_user: root
  tasks:
    - name: "1. Create the working directories"
      file:
        path: "{{ item.path }}"
        state: "{{ item.state }}"
      with_items:
        - { path: "{{WORK_PATH}}/", state: "directory" }
        - { path: "/etc/kubernetes/pki", state: "directory" }
        - { path: "/etc/kubernetes/manifests", state: "directory" }
    - name: "2. Copy the installation files (could later be served from a download center; TODO: certificates)"
      copy:
        src: "{{ item.src }}"
        dest: "{{ item.dest }}"
      with_items:
        - { src: "/assets/systemd/docker.service", dest: "/usr/lib/systemd/system/" }
        - { src: "/assets/systemd/kubelet.service", dest: "/usr/lib/systemd/system/" }
        - { src: "/assets/pki/ca.pem", dest: "/etc/kubernetes/pki" }
        - { src: "/assets/pki/bootstrap.conf", dest: "/etc/kubernetes/" }
        - { src: "/assets/script/kubelet.sh", dest: "/data/" }
    - name: "3. Install the node"
      shell: sh /data/kubelet.sh {{WORK_PATH}} {{TENANT}}
    - name: "4. Clean up the installation files"
      shell: rm -rf /data/kubelet.sh
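How the operator actually triggers this playbook is not shown above; one straightforward option is to shell out to the ansible-playbook CLI (the playbook path here is illustrative):
import (
	"fmt"
	"os/exec"
)

// runPlaybook executes the node-join playbook, passing WORK_PATH and TENANT
// through --extra-vars so the templated values above resolve.
func runPlaybook(workPath, tenant string) error {
	cmd := exec.Command("ansible-playbook", "/assets/playbook/node.yaml",
		"--extra-vars", fmt.Sprintf("WORK_PATH=%s TENANT=%s", workPath, tenant))
	out, err := cmd.CombinedOutput()
	fmt.Println(string(out))
	return err
}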
The script being executed
#!/bin/bash
# ----------------------------------------------------------------------
# name: kubelet.sh
# version: 1.0
# createTime: 2019-06-25
# description: node initialization
# author: doublegao
# email: [email protected]
# params: work path, tenant, dockerVersion, kubeletVersion
# example: kubelet.sh path tenant 18.09.9 v1.15.3
# ----------------------------------------------------------------------
WORK_PATH=$1
# strip the trailing slash "/"
WORK_PATH=${WORK_PATH%*/}
DOCKER_PATH=$WORK_PATH/docker
KUBELET_PATH=$WORK_PATH/kubelet
mkdir -p $DOCKER_PATH
mkdir -p $KUBELET_PATH
# versions can be passed as the 3rd and 4th params (see header), with defaults
DOCKER_VERSION=${3:-18.09.9}
KUBELET_VERSION=${4:-v1.15.3}
echo "############# 1.系统初始化"
setenforce 0
sed -i "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
systemctl disable firewalld
systemctl stop firewalld
swapoff -a
sysctl -p
sed -i 's/.*swap.*/#&/' /etc/fstab
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl -p /etc/sysctl.d/k8s.conf
echo "############# 2. Configure the kubernetes yum repo"
cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
echo "############# 3.检查yum可用性"
yum repolist
echo "############# 4.安装启用kubelet 和 docker"
yum install -y docker-ce-$DOCKER_VERSION-3.el7.x86_64 kubelet-$KUBELET_VERSION-0.x86_64
systemctl enable docker.service && systemctl enable kubelet.service
echo "############# 5.修改配置文件"
sed -i "s#_DATA_ROOT_#$DOCKER_PATH#g" /usr/lib/systemd/system/docker.service
sed -i "s#_HOSTNAME_#$HOSTNAME#g" /usr/lib/systemd/system/kubelet.service
sed -i "s#_KUBELET_PATH_#$KUBELET_PATH#g" /usr/lib/systemd/system/kubelet.service
sed -i "s#_TENANT_#$2#g" /usr/lib/systemd/system/kubelet.service
echo "############# 6.启动docker,拉去pause镜像"
systemctl daemon-reload && systemctl start docker.service
docker pull k8s.gcr.io/pause-amd64:3.1
echo "############# 7.启动kubelet"
systemctl daemon-reload && systemctl start kubelet.service