kubernetes cloud-provider for aliyun

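Overview

CloudProvider connects Kubernetes to a cloud vendor's infrastructure services (load balancers, cloud disks, security groups, and so on). It is implemented by the cloud-controller-manager component.

aliyun-cloud-provider is the integration plugin for the Aliyun (Alibaba Cloud) platform. When a user creates a Kubernetes Service of type LoadBalancer, it automatically creates an Alibaba Cloud SLB, dynamically binds and unbinds the SLB backends, and offers a rich set of options for customizing the generated LoadBalancer.

Other cloud providers, such as AWS, offer ELB, EBS, security groups, and more. The Aliyun implementation covers only service_controller and route_controller, so for now it integrates only SLB resources. Other resources need separate plugins built on top of cloud-provider; cluster-autoscaler, for example, relies on cloud-provider to complete node initialization.

In short, cloud-provider only exposes the cloud vendor's infrastructure services to the Kubernetes cluster. We use aliyun-cloud-provider mainly to integrate Aliyun ESS resources, so that cluster-autoscaler can scale Kubernetes nodes dynamically or on a schedule.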

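Setting up aliyun-cloud-provider the right way

Prerequisites

  • kubernetes version > 1.7.2
  • CloudNetwork: only Alibaba Cloud VPC networks are supported. Cloud route configuration is enabled by default and can be turned off manually:
    --configure-cloud-routes=false
    
  • kube-apiserver and kube-controller-manager must not specify the --cloud-provider flag. The cloud-provider logic has been split out into the separate cloud-controller-manager component, and the flag will be deprecated and removed in a later release.
  • kubelet must specify --cloud-provider=external. This ensures that cloud-controller-manager initializes the node before any workload can be scheduled onto it.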
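
The kubelet prerequisite is easy to get wrong on a single node, so a quick check can help. The following is a minimal sketch, not part of the upstream tooling: the function name kubelet_uses_external_provider is made up for illustration, and it only inspects an argument string that you pass in.

```shell
# Returns 0 when the given kubelet argument string contains
# --cloud-provider=external, and 1 otherwise.
# (Hypothetical helper name; it only does string matching.)
kubelet_uses_external_provider() {
  # Pad with spaces so the flag also matches at the start or end of the string.
  case " $1 " in
    *" --cloud-provider=external "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage on a Linux node (assumes procps `ps` is available):
#   kubelet_uses_external_provider "$(ps -o args= -C kubelet)" \
#     && echo "kubelet is using the external cloud provider"
```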

Cluster environment:

  • kubernetes version: v1.12.3
  • network plugin: flannel vxlan

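Kubernetes cluster configuration

Add --cloud-provider=external, --hostname-override=<instance id>, and --provider-id=<instance id> to the kubelet startup arguments, then restart kubelet. The instance id takes the form ${REGION_ID}.${INSTANCE_ID}: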

KUBELET_CLOUD_PROVIDER_ARGS="--cloud-provider=external --hostname-override=${REGION_ID}.${INSTANCE_ID} --provider-id=${REGION_ID}.${INSTANCE_ID}"

The following commands print REGION_ID and INSTANCE_ID:

$ META_EP=http://100.100.100.200/latest/meta-data
$ echo `curl -s $META_EP/region-id`.`curl -s $META_EP/instance-id`

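After restarting kubelet, run kubectl get node to check that the new hostname has taken effect. Alternatively, delete the node and let it re-register with the cluster.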
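
To tie the steps above together, here is a small sketch that builds the kubelet argument line from a region id and an instance id. The function name build_kubelet_cloud_args is an assumption made for this example; where the resulting line is written (for instance a kubelet environment file) depends on how kubelet was installed.

```shell
# Prints the KUBELET_CLOUD_PROVIDER_ARGS line for a region id and instance id.
# (Hypothetical helper; it only formats a string and does not touch kubelet.)
build_kubelet_cloud_args() {
  local region_id="$1" instance_id="$2"
  local node_id="${region_id}.${instance_id}"
  printf 'KUBELET_CLOUD_PROVIDER_ARGS="--cloud-provider=external --hostname-override=%s --provider-id=%s"\n' \
    "$node_id" "$node_id"
}

# Possible usage on an ECS instance, using the metadata endpoint shown above:
#   META_EP=http://100.100.100.200/latest/meta-data
#   build_kubelet_cloud_args "$(curl -s $META_EP/region-id)" "$(curl -s $META_EP/instance-id)"
```

After the restart, kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID shows both values side by side, which makes a mismatch easy to spot.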

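Deploying aliyun-cloud-provider

cloud-provider needs certain permissions to access Alibaba Cloud. You can either create RAM policies for the ECS instances or use an AccessKey id and secret directly. Because we run cloud-provider in a container, we use an AK here.

1. Configure AK (access key) permissions for aliyun-cloud-provider
Create a custom policy, bind it to the k8s-cloud-provider user, and create an AK for that user for aliyun-cloud-provider to use, so that it can access only the resources we have authorized. For reference, see: https://github.com/kubernetes/cloud-provider-alibaba-cloud/blob/master/docs/examples/master.policy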

{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "ecs:Describe*",
        "ecs:AttachDisk",
        "ecs:CreateDisk",
        "ecs:CreateSnapshot",
        "ecs:CreateRouteEntry",
        "ecs:DeleteDisk",
        "ecs:DeleteSnapshot",
        "ecs:DeleteRouteEntry",
        "ecs:DetachDisk",
        "ecs:ModifyAutoSnapshotPolicyEx",
        "ecs:ModifyDiskAttribute",
        "ecs:CreateNetworkInterface",
        "ecs:DescribeNetworkInterfaces",
        "ecs:AttachNetworkInterface",
        "ecs:DetachNetworkInterface",
        "ecs:DeleteNetworkInterface",
        "ecs:DescribeInstanceAttribute"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "cr:Get*",
        "cr:List*",
        "cr:PullRepository"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "slb:*"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "cms:*"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "vpc:*"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "log:*"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "nas:*"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    }
  ]
}

Create the cloud-config ConfigMap to supply configuration to cloud-provider; it mainly holds the AK:

---
apiVersion: v1
data:
  # Fill in the AK of the k8s-cloud-provider user here
  special.keyid: xxxxxx
  special.keysecret: xxxxx
kind: ConfigMap
metadata:
  name: cloud-config
  namespace: kube-system
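
2. ServiceAccount system:cloud-controller-manager
When RBAC is enabled, cloud-provider uses the system:cloud-controller-manager service account to authorize its access to the Kubernetes cluster. Therefore:
  • Certain RBAC roles and bindings must be created. For details, see cloud-controller-manager.yml.

  • A kubeconfig file must be provided to cloud-provider; it is saved at /etc/kubernetes/cloud-controller-manager.conf.

Replace $CA_DATA with the output of cat /etc/kubernetes/pki/ca.crt|base64 -w 0 (kubectl config view --flatten=true also shows the CA data), and replace the apiserver address with that of your own Kubernetes apiserver. We mount the file into the cloud-provider container through a ConfigMap.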
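
The substitution described above can be scripted rather than done by hand. The sketch below assumes the manifest uses the ${CA_DATA} and ${APISERVER_ADDRESS} placeholders, as the complete cloud-controller-manager.yml later in this post does; the function name render_ccm_manifest is invented for illustration.

```shell
# Reads a manifest on stdin and replaces the literal placeholders
# ${CA_DATA} and ${APISERVER_ADDRESS} with the given values.
# (Hypothetical helper; "|" is used as the sed delimiter because
# base64 output and host:port addresses never contain it.)
render_ccm_manifest() {
  local ca_data="$1" apiserver="$2"
  sed -e "s|[\$]{CA_DATA}|${ca_data}|g" \
      -e "s|[\$]{APISERVER_ADDRESS}|${apiserver}|g"
}

# Possible usage (the address is the example apiserver from this post):
#   render_ccm_manifest "$(base64 -w 0 < /etc/kubernetes/pki/ca.crt)" "10.100.254.78:6443" \
#     < cloud-controller-manager.yml | kubectl apply -f -
```

Piping the rendered output straight into kubectl keeps the CA data out of the file on disk, though writing it to a separate file first makes it easier to review.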

# RBAC
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-controller-manager
rules:
  - apiGroups:
      - ""
    resources:
      - persistentvolumes
      - services
      - secrets
      - endpoints
      - serviceaccounts
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
      - delete
      - patch
      - update
  - apiGroups:
      - ""
    resources:
      - services/status
    verbs:
      - update
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
      - update
  - apiGroups:
      - ""
    resources:
      - events
      - endpoints
    verbs:
      - create
      - patch
      - update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-controller-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: cloud-controller-manager
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:shared-informers
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: shared-informers
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-node-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: cloud-node-controller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:pvl-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: pvl-controller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:route-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: route-controller
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-controller-manager
  namespace: kube-system
 
 
# kubeconfig for cloud-controller-manager, delivered as a ConfigMap
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloud-controller-manager
  namespace: kube-system
data:
  cloud-controller-manager.conf: |-
    kind: Config
    contexts:
    - context:
        cluster: kubernetes
        user: system:cloud-controller-manager
      name: system:cloud-controller-manager@kubernetes
    current-context: system:cloud-controller-manager@kubernetes
    users:
    - name: system:cloud-controller-manager
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: $CA_DATA
        server: https://10.100.254.78:6443
      name: kubernetes

3. Deploy aliyun-cloud-provider as a DaemonSet
The complete YAML is as follows:

# cloud-controller-manager.yml
---
apiVersion: v1
data:
  # Fill in the AK of the k8s-cloud-provider user here
  special.keyid: xxxxxx
  special.keysecret: xxxxx
kind: ConfigMap
metadata:
  name: cloud-config
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloud-controller-manager
  namespace: kube-system
data:
  cloud-controller-manager.conf: |-
    kind: Config
    contexts:
    - context:
        cluster: kubernetes
        user: system:cloud-controller-manager
      name: system:cloud-controller-manager@kubernetes
    current-context: system:cloud-controller-manager@kubernetes
    users:
    - name: system:cloud-controller-manager
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: ${CA_DATA}
        server: https://${APISERVER_ADDRESS}
      name: kubernetes
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-controller-manager
rules:
  - apiGroups:
      - ""
    resources:
      - persistentvolumes
      - services
      - secrets
      - endpoints
      - serviceaccounts
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
      - delete
      - patch
      - update
  - apiGroups:
      - ""
    resources:
      - services/status
    verbs:
      - update
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
      - update
  - apiGroups:
      - ""
    resources:
      - events
      - endpoints
    verbs:
      - create
      - patch
      - update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-controller-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: cloud-controller-manager
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:shared-informers
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: shared-informers
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:cloud-node-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: cloud-node-controller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:pvl-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: pvl-controller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: system:route-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
  name: route-controller
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-controller-manager
  namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: cloud-controller-manager
    tier: control-plane
  name: cloud-controller-manager
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cloud-controller-manager
      tier: control-plane
  template:
    metadata:
      labels:
        app: cloud-controller-manager
        tier: control-plane
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccountName: cloud-controller-manager
      tolerations:
      - effect: NoSchedule
        operator: Exists
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        operator: Exists
        key: node.cloudprovider.kubernetes.io/uninitialized
      nodeSelector:
         node-role.kubernetes.io/master: "true"
      containers:
      - command:
        -  /cloud-controller-manager
        - --kubeconfig=/etc/kubernetes/cloud-controller-manager.conf
        - --address=127.0.0.1
        - --allow-untagged-cloud=true
        - --leader-elect=true
        - --cloud-provider=alicloud
        - --cluster-name="${cluster_name}"
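        # Consider enabling this when using a network plugin or CloudNetwork
        - --allocate-node-cidrs=false
        - --cluster-cidr=172.30.0.0/16
        # Must be disabled unless CloudNetwork is used; otherwise the Alibaba Cloud route table gets configured
        - --configure-cloud-routes=false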
        - --use-service-account-credentials=true
        - --route-reconciliation-period=30s
        - --v=5
        image: registry.cn-hangzhou.aliyuncs.com/acs/cloud-controller-manager-amd64:v1.9.3.112-g93c7140-aliyun
        env:
        - name: ACCESS_KEY_ID
          valueFrom:
            configMapKeyRef:
              name: cloud-config
              key: special.keyid
        - name: ACCESS_KEY_SECRET
          valueFrom:
            configMapKeyRef:
              name: cloud-config
              key: special.keysecret
        livenessProbe:
          failureThreshold: 8
          httpGet:
            host: 127.0.0.1
            path: /healthz
            port: 10252
            scheme: HTTP
          initialDelaySeconds: 15
          timeoutSeconds: 15
        name: cloud-controller-manager
        resources:
          requests:
            cpu: 200m
        volumeMounts:
        - mountPath: /etc/kubernetes/
          name: k8s
          readOnly: true
        - mountPath: /etc/ssl/certs
          name: certs
        - mountPath: /etc/pki
          name: pki
      hostNetwork: true
      volumes:
      #- hostPath:
      #    path: /etc/kubernetes
      - configMap:
           name: cloud-controller-manager
        name: k8s
      - hostPath:
          path: /etc/ssl/certs
        name: certs
      - hostPath:
          path: /etc/pki
        name: pki

4. Start aliyun-cloud-provider
$ kubectl apply -f cloud-controller-manager.yml

Testing SLB creation through cloud-provider

Once cloud-controller-manager is up and running, deploy a sample nginx Deployment:

$ cat <<EOF > nginx.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-example
spec:
  replicas: 1
  revisionHistoryLimit: 2
  template:
    metadata:
      labels:
        app: nginx-example
    spec:
      containers:
      - image: nginx:latest
        name: nginx
        ports:
          - containerPort: 80
EOF
$ kubectl apply -f nginx.yaml

Then create a Service of type LoadBalancer:

$ kubectl expose deployment nginx-example --name=nginx-example --type=LoadBalancer --port=80
$ kubectl get svc
NAME            TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
nginx-example   LoadBalancer   10.254.13.75   10.100.255.76   80:11435/TCP   25h
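
When scripting this test, it helps to wait until the SLB address has actually been assigned. As a rough sketch, the helper below pulls the EXTERNAL-IP column out of the table that kubectl get svc prints; the name service_external_ip is an assumption for this example, and the parsing relies on the default column layout shown above.

```shell
# Reads `kubectl get svc` table output on stdin and prints the EXTERNAL-IP
# column of the named service. Prints nothing while the address is <pending>.
# (Hypothetical helper; assumes the default 6-column table layout.)
service_external_ip() {
  awk -v name="$1" 'NR > 1 && $1 == name && $4 != "<pending>" { print $4 }'
}

# Possible usage:
#   kubectl get svc | service_external_ip nginx-example
```

If parsing a table feels fragile, kubectl get svc nginx-example -o jsonpath='{.status.loadBalancer.ingress[0].ip}' should return the same address directly.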

For other SLB-related annotations, see: https://github.com/kubernetes/cloud-provider-alibaba-cloud/blob/master/docs/zh/usage.md

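Notes on cloud-provider:

  • With --configure-cloud-routes=true, the code forcibly refreshes the route table. Check whether the pod CIDR could conflict, so that a refresh does not delete routes and affect production traffic.
  • Default behaviour: SLB creation picks an availability zone at random, and the range of zones that can be specified through annotations is limited.
  • The SLB's vSwitch and its availability zone can be inconsistent. For example, kubernetes-g corresponds to zone beijing-g, yet the SLB sits in beijing-a with its backup zone in beijing-e, which is awkward to manage.
  • aliyun-cloud-provider implements only route_controller and service_controller and supports only SLB resources; stateful storage and cluster-autoscaler are each separate providers.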

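Best practices for managing SLB dynamically through cloud-provider:

  1. Specify the SLB instance spec type through an annotation.
  2. Specify the SLB backend server instances through an annotation, filtering by node labels.
  3. Specify the SLB primary and backup availability zones, ideally the same zones as the nodes' primary and backup zones (test in the console beforehand which AZs support primary/backup).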
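
The three practices above might look roughly like the sketch below, which prints a Service manifest carrying the corresponding annotations. Treat every specific as an assumption: the annotation keys are written from memory of the usage document linked earlier and may differ between cloud-provider versions (older releases used a different prefix), and the function name, spec value, label, and zone ids are placeholders chosen for illustration. Check the keys against the usage document for the version you deploy.

```shell
# Prints a LoadBalancer Service manifest with SLB annotations.
# Arguments: service name, SLB spec, backend node label, primary zone, backup zone.
# (Hypothetical helper; the annotation keys are assumptions to verify
# against the usage document of the deployed cloud-provider version.)
render_slb_service() {
  local prefix="service.beta.kubernetes.io/alibaba-cloud-loadbalancer"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: $1
  annotations:
    ${prefix}-spec: "$2"
    ${prefix}-backend-label: "$3"
    ${prefix}-master-zoneid: "$4"
    ${prefix}-slave-zoneid: "$5"
spec:
  type: LoadBalancer
  selector:
    app: $1
  ports:
  - port: 80
    targetPort: 80
EOF
}

# Possible usage (all values are placeholders):
#   render_slb_service nginx-example slb.s1.small "role=web" cn-beijing-g cn-beijing-e \
#     | kubectl apply -f -
```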

FAQ:

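  1. Nodes carry the node.kubernetes.io/network-unavailable taint
    After deploying cloud-provider following the community GitHub instructions, all newly created pods stayed Pending.
    Investigation showed that every node had been given the node.kubernetes.io/network-unavailable taint. The cause is that --configure-cloud-routes defaults to true, which makes cloud-provider try to create route table entries in Aliyun. Because we use flannel vxlan, we disabled the flag and restarted cloud-provider, then removed the condition from each node (or deleted the nodes and let them re-register with the cluster).
  2. Nodes carry the node.cloudprovider.kubernetes.io/uninitialized taint
    The cloud-provider component is faulty or not running, so nodes cannot be initialized by cloud-provider; the taint keeps pods from being scheduled onto such nodes. See https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/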
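
Both FAQ cases start with finding out which nodes carry the taint. A minimal sketch for that is shown below; the function name nodes_with_taint is made up for this example, and it expects one line per node in the form "name<TAB>taint keys", which the kubectl jsonpath query in the usage comment should produce.

```shell
# Reads lines of "<node-name><TAB><space-separated taint keys>" on stdin
# and prints the names of the nodes that carry the given taint key.
# (Hypothetical helper; it only filters text and changes nothing in the cluster.)
nodes_with_taint() {
  awk -F '\t' -v key="$1" '{
    n = split($2, taints, " ")
    for (i = 1; i <= n; i++) {
      if (taints[i] == key) { print $1; break }
    }
  }'
}

# Possible usage:
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}' \
#     | nodes_with_taint node.kubernetes.io/network-unavailable
```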

References:

  • https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/
  • https://github.com/kubernetes/cloud-provider-alibaba-cloud/blob/master/docs/getting-started.md
