Overview
CloudProvider gives Kubernetes the ability to integrate with a cloud vendor's basic infrastructure services (load balancers, cloud disks, security groups, and so on), and is implemented by the cloud-controller-manager component.
aliyun-cloud-provider is the integration plugin for the Alibaba Cloud platform. When a user creates a Kubernetes Service of type LoadBalancer, it automatically creates an Alibaba Cloud SLB instance, dynamically attaches and detaches SLB backends, and offers a rich set of configuration options for customizing the generated load balancer.
Other cloud providers, AWS for example, expose ELB, EBS, security groups, and more. The Alibaba Cloud implementation only includes the service_controller and route_controller, so for now it only integrates SLB resources; other resources require separate plugins built on top of cloud-provider. For example, cluster-autoscaler depends on cloud-provider to complete node initialization.
In other words, cloud-provider merely bridges a vendor's basic cloud services into the Kubernetes cluster. We use aliyun-cloud-provider mainly to integrate Alibaba Cloud ESS resources, using cluster-autoscaler to scale Kubernetes nodes dynamically or on a schedule.
Deploying aliyun-cloud-provider the right way
Prerequisites
- Kubernetes version > 1.7.2
- CloudNetwork: only the Alibaba Cloud VPC network is supported. Cloud route configuration is enabled by default and can be disabled manually with:
--configure-cloud-routes=false
- kube-apiserver and kube-controller-manager MUST NOT set the --cloud-provider flag. The cloud provider logic has been split out into the standalone cloud-controller-manager component, and this flag will be deprecated and removed in future releases.
- kubelet MUST set --cloud-provider=external. This ensures that a node has to be initialized by cloud-controller-manager before it can be scheduled normally.
Cluster environment:
- kubernetes version: v1.12.3
- network plugin: flannel vxlan
Kubernetes cluster configuration
Add the --cloud-provider=external, --hostname-override, and --provider-id flags to the kubelet startup arguments and restart kubelet; both values take the <REGION_ID>.<INSTANCE_ID> format:
KUBELET_CLOUD_PROVIDER_ARGS="--cloud-provider=external --hostname-override=${REGION_ID}.${INSTANCE_ID} --provider-id=${REGION_ID}.${INSTANCE_ID}"
REGION_ID and INSTANCE_ID can be found with the following commands:
$ META_EP=http://100.100.100.200/latest/meta-data
$ echo `curl -s $META_EP/region-id`.`curl -s $META_EP/instance-id`
Then restart kubelet and run kubectl get node to verify that the new hostname has taken effect; you can also delete the node and let it re-register with the cluster.
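A minimal verification sketch (the node name below is a placeholder in the <REGION_ID>.<INSTANCE_ID> form; substitute your own):
$ systemctl restart kubelet
$ kubectl get nodes -o wide
# spec.providerID should match what was passed via --provider-id
$ kubectl get node cn-beijing.i-xxxxxxxxxxxx -o jsonpath='{.spec.providerID}'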
Deploy aliyun-cloud-provider
cloud-provider needs certain permissions to access Alibaba Cloud. You can either create RAM policies for the ECS instances, or use an AccessKey ID & Secret directly; since we run cloud-provider in a container, we use an AK here.
1. Configure the aliyun-cloud-provider AK (AccessKey) permissions
Create a custom policy, bind it to the k8s-cloud-provider user, and create that user's AK for aliyun-cloud-provider to use, so that it can only access the resources we authorize. See also: https://github.com/kubernetes/cloud-provider-alibaba-cloud/blob/master/docs/examples/master.policy
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "ecs:Describe*",
        "ecs:AttachDisk",
        "ecs:CreateDisk",
        "ecs:CreateSnapshot",
        "ecs:CreateRouteEntry",
        "ecs:DeleteDisk",
        "ecs:DeleteSnapshot",
        "ecs:DeleteRouteEntry",
        "ecs:DetachDisk",
        "ecs:ModifyAutoSnapshotPolicyEx",
        "ecs:ModifyDiskAttribute",
        "ecs:CreateNetworkInterface",
        "ecs:DescribeNetworkInterfaces",
        "ecs:AttachNetworkInterface",
        "ecs:DetachNetworkInterface",
        "ecs:DeleteNetworkInterface",
        "ecs:DescribeInstanceAttribute"
      ],
      "Resource": ["*"],
      "Effect": "Allow"
    },
    {
      "Action": ["cr:Get*", "cr:List*", "cr:PullRepository"],
      "Resource": ["*"],
      "Effect": "Allow"
    },
    {
      "Action": ["slb:*"],
      "Resource": ["*"],
      "Effect": "Allow"
    },
    {
      "Action": ["cms:*"],
      "Resource": ["*"],
      "Effect": "Allow"
    },
    {
      "Action": ["vpc:*"],
      "Resource": ["*"],
      "Effect": "Allow"
    },
    {
      "Action": ["log:*"],
      "Resource": ["*"],
      "Effect": "Allow"
    },
    {
      "Action": ["nas:*"],
      "Resource": ["*"],
      "Effect": "Allow"
    }
  ]
}
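If you prefer the CLI over the console, the policy, user, and AK can be created roughly as follows. This is a hedged sketch using the Alibaba Cloud aliyun CLI: the policy.json file name is an assumption (save the policy above into it), and parameter spellings may differ across CLI versions, so double-check against aliyun ram help.
$ aliyun ram CreatePolicy --PolicyName k8s-cloud-provider --PolicyDocument "$(cat policy.json)"
$ aliyun ram CreateUser --UserName k8s-cloud-provider
$ aliyun ram AttachPolicyToUser --PolicyType Custom --PolicyName k8s-cloud-provider --UserName k8s-cloud-provider
$ aliyun ram CreateAccessKey --UserName k8s-cloud-provider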
Create a cloud-config ConfigMap that supplies configuration, mainly the AK, to cloud-provider:
---
apiVersion: v1
data:
  # Fill in the AK of the k8s-cloud-provider user here
  special.keyid: xxxxxx
  special.keysecret: xxxxx
kind: ConfigMap
metadata:
  name: cloud-config
  namespace: kube-system
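The same ConfigMap can also be created imperatively (values are placeholders). Since an AK is sensitive data, a Secret would arguably be a better fit, but this walkthrough keeps the ConfigMap the upstream examples use:
$ kubectl -n kube-system create configmap cloud-config \
    --from-literal=special.keyid=xxxxxx \
    --from-literal=special.keysecret=xxxxx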
2. ServiceAccount system:cloud-controller-manager
With RBAC enabled, cloud-provider uses the system:cloud-controller-manager service account to authorize its access to the Kubernetes cluster. Therefore:
Certain RBAC roles and bindings must be created; see cloud-controller-manager.yml for details.
A kubeconfig file must be provided to cloud-provider, saved as /etc/kubernetes/cloud-controller-manager.conf.
Replace $CA_DATA with the output of cat /etc/kubernetes/pki/ca.crt|base64 -w 0 (the CA data can also be viewed with kubectl config view --flatten=true), and replace the server field with your own Kubernetes apiserver address; an envsubst sketch follows the full manifest below. We mount this kubeconfig into the cloud-provider container via a ConfigMap.
# RBAC
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-controller-manager
rules:
- apiGroups:
  - ""
  resources:
  - persistentvolumes
  - services
  - secrets
  - endpoints
  - serviceaccounts
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
  - delete
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - services/status
  verbs:
  - update
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - events
  - endpoints
  verbs:
  - create
  - patch
  - update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-controller-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: cloud-controller-manager
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:shared-informers
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: shared-informers
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-node-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: cloud-node-controller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:pvl-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: pvl-controller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:route-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: route-controller
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-controller-manager
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloud-controller-manager
  namespace: kube-system
data:
  cloud-controller-manager.conf: |-
    kind: Config
    contexts:
    - context:
        cluster: kubernetes
        user: system:cloud-controller-manager
      name: system:cloud-controller-manager@kubernetes
    current-context: system:cloud-controller-manager@kubernetes
    users:
    - name: system:cloud-controller-manager
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: $CA_DATA
        server: https://10.100.254.78:6443
      name: kubernetes
3. Deploy aliyun-cloud-provider via DaemonSet
The complete YAML is as follows:
# cloud-controller-manager.yml
---
apiVersion: v1
data:
  # Fill in the AK of the k8s-cloud-provider user here
  special.keyid: xxxxxx
  special.keysecret: xxxxxx
kind: ConfigMap
metadata:
  name: cloud-config
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloud-controller-manager
  namespace: kube-system
data:
  cloud-controller-manager.conf: |-
    kind: Config
    contexts:
    - context:
        cluster: kubernetes
        user: system:cloud-controller-manager
      name: system:cloud-controller-manager@kubernetes
    current-context: system:cloud-controller-manager@kubernetes
    users:
    - name: system:cloud-controller-manager
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: ${CA_DATA}
        server: https://${APISERVER_ADDRESS}
      name: kubernetes
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-controller-manager
rules:
- apiGroups:
  - ""
  resources:
  - persistentvolumes
  - services
  - secrets
  - endpoints
  - serviceaccounts
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
  - delete
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - services/status
  verbs:
  - update
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - events
  - endpoints
  verbs:
  - create
  - patch
  - update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-controller-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: cloud-controller-manager
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:shared-informers
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: shared-informers
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-node-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: cloud-node-controller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:pvl-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: pvl-controller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:route-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: route-controller
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-controller-manager
  namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: cloud-controller-manager
    tier: control-plane
  name: cloud-controller-manager
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cloud-controller-manager
      tier: control-plane
  template:
    metadata:
      labels:
        app: cloud-controller-manager
        tier: control-plane
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccountName: cloud-controller-manager
      tolerations:
      - effect: NoSchedule
        operator: Exists
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        operator: Exists
        key: node.cloudprovider.kubernetes.io/uninitialized
      nodeSelector:
        node-role.kubernetes.io/master: "true"
      containers:
      - command:
        - /cloud-controller-manager
        - --kubeconfig=/etc/kubernetes/cloud-controller-manager.conf
        - --address=127.0.0.1
        - --allow-untagged-cloud=true
        - --leader-elect=true
        - --cloud-provider=alicloud
        - --cluster-name="${cluster_name}"
        # consider enabling these for the network plugin / CloudNetwork case
        - --allocate-node-cidrs=false
        - --cluster-cidr=172.30.0.0/16
        # must stay disabled on non-CloudNetwork clusters, otherwise the Alibaba Cloud route tables get rewritten
        - --configure-cloud-routes=false
        - --use-service-account-credentials=true
        - --route-reconciliation-period=30s
        - --v=5
        image: registry.cn-hangzhou.aliyuncs.com/acs/cloud-controller-manager-amd64:v1.9.3.112-g93c7140-aliyun
        env:
        - name: ACCESS_KEY_ID
          valueFrom:
            configMapKeyRef:
              name: cloud-config
              key: special.keyid
        - name: ACCESS_KEY_SECRET
          valueFrom:
            configMapKeyRef:
              name: cloud-config
              key: special.keysecret
        livenessProbe:
          failureThreshold: 8
          httpGet:
            host: 127.0.0.1
            path: /healthz
            port: 10252
            scheme: HTTP
          initialDelaySeconds: 15
          timeoutSeconds: 15
        name: cloud-controller-manager
        resources:
          requests:
            cpu: 200m
        volumeMounts:
        - mountPath: /etc/kubernetes/
          name: k8s
          readOnly: true
        - mountPath: /etc/ssl/certs
          name: certs
        - mountPath: /etc/pki
          name: pki
      hostNetwork: true
      volumes:
      #- hostPath:
      #    path: /etc/kubernetes
      - configMap:
          name: cloud-controller-manager
        name: k8s
      - hostPath:
          path: /etc/ssl/certs
        name: certs
      - hostPath:
          path: /etc/pki
        name: pki
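Before applying, render the ${CA_DATA} and ${APISERVER_ADDRESS} placeholders. A minimal sketch, assuming the manifest above is saved as cloud-controller-manager.yml.tpl (an assumed file name) and envsubst from GNU gettext is available; the apiserver address is this document's example value:
$ export CA_DATA=$(base64 -w 0 /etc/kubernetes/pki/ca.crt)
$ export APISERVER_ADDRESS=10.100.254.78:6443
$ envsubst '${CA_DATA} ${APISERVER_ADDRESS}' < cloud-controller-manager.yml.tpl > cloud-controller-manager.yml
Restricting envsubst to those two variables keeps the literal "${cluster_name}" argument in the manifest untouched.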
4. Start aliyun-cloud-provider
$ kubectl apply -f cloud-controller-manager.yml
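To confirm the controller actually came up, check the DaemonSet and its logs (the label selector matches the manifest above):
$ kubectl -n kube-system get ds cloud-controller-manager
$ kubectl -n kube-system logs -l app=cloud-controller-manager --tail=20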
Testing SLB creation with cloud-provider
Once cloud-controller-manager is up and running, create a sample nginx Deployment:
$ cat <<EOF > nginx.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-example
spec:
  replicas: 1
  revisionHistoryLimit: 2
  template:
    metadata:
      labels:
        app: nginx-example
    spec:
      containers:
      - image: nginx:latest
        name: nginx
        ports:
        - containerPort: 80
EOF
$ kubectl apply -f nginx.yaml
Then expose it as a Service of type LoadBalancer:
$ kubectl expose deployment nginx-example --name=nginx-example --type=LoadBalancer --port=80
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx-example LoadBalancer 10.254.13.75 10.100.255.76 80:11435/TCP 25h
For other SLB-related annotations, see: https://github.com/kubernetes/cloud-provider-alibaba-cloud/blob/master/docs/zh/usage.md
Notes on cloud-provider:
- With --configure-cloud-routes=true, the code forcibly reconciles the route tables. Check whether your pod CIDR conflicts with existing route entries, or entries may be flushed and deleted, impacting production traffic.
- Default behavior: SLB creation picks an availability zone at random, and the range of zones that can be specified via annotation is limited.
- The SLB's vSwitch and its availability zone may not match. For example, kubernetes-g maps to zone beijing-g, yet the SLB may land in beijing-a with its backup zone in beijing-e, which is inconvenient to manage.
- aliyun-cloud-provider only implements route_controller and service_controller and therefore only integrates SLB resources; stateful storage and cluster-autoscaler are each served by separate providers.
Best practices for dynamically managing SLB via cloud-provider:
- Specify the SLB instance spec via annotation
- Select the SLB backend servers via annotation, filtering node instances by label
- Pin the SLB master/backup availability zones, preferably the same zones the nodes live in (test in the console beforehand which AZs support master/backup); a sketch combining these follows the list
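A hedged Service sketch carrying the three annotations above. The annotation names follow the older alicloud-loadbalancer-* releases of this provider (newer releases use the alibaba-cloud-loadbalancer-* prefix), and the spec, label, and zone values are illustrative; verify against the usage.md linked earlier for your image version:
apiVersion: v1
kind: Service
metadata:
  name: nginx-example
  annotations:
    # SLB instance spec (illustrative value)
    service.beta.kubernetes.io/alicloud-loadbalancer-spec: "slb.s1.small"
    # only nodes with this label are attached as SLB backends (illustrative label)
    service.beta.kubernetes.io/alicloud-loadbalancer-backend-label: "group=slb-backend"
    # master/backup availability zones (illustrative zones)
    service.beta.kubernetes.io/alicloud-loadbalancer-master-zoneid: "cn-beijing-a"
    service.beta.kubernetes.io/alicloud-loadbalancer-slave-zoneid: "cn-beijing-e"
spec:
  type: LoadBalancer
  selector:
    app: nginx-example
  ports:
  - port: 80
    targetPort: 80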
FAQ:
- Nodes get the node.kubernetes.io/network-unavailable taint
After deploying cloud-provider per the community GitHub docs, every newly created pod stayed Pending.
Investigation showed each node had been tainted with node.kubernetes.io/network-unavailable. The cause is that --configure-cloud-routes=true (the default) makes cloud-provider create route tables on Alibaba Cloud; since we use flannel-vxlan, we disabled the flag and restarted cloud-provider. Then clear the condition on each node, or delete the node so it re-registers with the cluster, for example:
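A sketch of the re-registration route (the node name is a placeholder; run the kubelet restart on the affected node itself):
$ kubectl describe node cn-beijing.i-xxxxxxxxxxxx | grep -i taint
$ kubectl delete node cn-beijing.i-xxxxxxxxxxxx
# on the node itself, restart kubelet so it re-registers
$ systemctl restart kubelet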
- Nodes get the node.cloudprovider.kubernetes.io/uninitialized taint
The cloud-provider component has a problem and is not running properly, so the node cannot be initialized by cloud-provider; the taint prevents pods from being scheduled onto that node. See https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/
References:
- https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/
- https://github.com/kubernetes/cloud-provider-alibaba-cloud/blob/master/docs/getting-started.md