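kube-apiserver is one of the most important control-plane components in Kubernetes (k8s). It mainly provides the following functions:
API Server request flow overview
Every request to the k8s API must pass several stages of access control before it is accepted, including authentication, authorization, and admission control:
API Server access control in detail
The details of access control are fairly involved:
decoding: the JSON request body is deserialized into a Go struct; every resource object in k8s is a struct.
request conversion: the request object is converted into an internal object for further processing. k8s objects come in external versions and an internal version.
Through decoding and conversion, the request JSON becomes (for example) a Pod resource object that the server works on. When processing finishes, conversion and encoding turn it back into an external object for the response.
admission: admission control first mutates the request object into the shape that later logic expects, then validates it.
Both mutating and validating webhooks are provided, so users can implement these webhooks for custom admission control.
REST logic: the actual operation on the resource object. Some operations have pre-processing logic, for example creating or updating a Pod.
storage conversion: the object is converted into the etcd storage format and persisted.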
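The stages above can be sketched in Go as follows. This is only a toy model: the `externalPod` and `internalPod` types, the defaulting rule, and the validation rule are hypothetical stand-ins, not the real apiserver types or logic.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// externalPod is a hypothetical, heavily simplified external version of a Pod,
// i.e. the shape a client sends over the wire.
type externalPod struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Name       string `json:"name"`
	Image      string `json:"image"`
	PullPolicy string `json:"imagePullPolicy,omitempty"`
}

// internalPod is the hypothetical internal version the handlers work on.
type internalPod struct {
	Name       string
	Image      string
	PullPolicy string
}

// decode turns the JSON request body into the external Go struct.
func decode(body []byte) (externalPod, error) {
	var p externalPod
	err := json.Unmarshal(body, &p)
	return p, err
}

// toInternal is the request conversion step (external to internal).
func toInternal(p externalPod) internalPod {
	return internalPod{Name: p.Name, Image: p.Image, PullPolicy: p.PullPolicy}
}

// mutate fills in defaults so later stages can rely on them.
func mutate(p *internalPod) {
	if p.PullPolicy == "" {
		p.PullPolicy = "IfNotPresent"
	}
}

// validate rejects objects that are still invalid after mutation.
func validate(p internalPod) error {
	if p.Name == "" || p.Image == "" {
		return errors.New("name and image are required")
	}
	return nil
}

// handle runs decoding, conversion, mutation and validation in order.
func handle(body []byte) (internalPod, error) {
	ext, err := decode(body)
	if err != nil {
		return internalPod{}, err
	}
	in := toInternal(ext)
	mutate(&in)
	if err := validate(in); err != nil {
		return internalPod{}, err
	}
	return in, nil
}

func main() {
	pod, err := handle([]byte(`{"apiVersion":"v1","kind":"Pod","name":"demo","image":"busybox"}`))
	fmt.Println(pod, err)
}
```

Mutation runs before validation so that defaults filled in by the first step are also checked by the second.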
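k8s requests arrive in one of two modes:
Insecure mode (insecure-port). Note that this port is deprecated and has been removed from recent Kubernetes releases.
Secure mode (secure-port)
X.509 certificates
Static token file
Bootstrap tokens
Static password file
ServiceAccount
A ServiceAccount (sa) is one of the most common authentication mechanisms in k8s. Every namespace gets a default sa when it is created, and each sa has a corresponding token.
The sa token is generated by k8s and automatically mounted into containers under /var/run/secrets/kubernetes.io/serviceaccount.
OpenID
Webhook token authentication
Anonymous requests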
All of the following operations are performed as the **root user**.
Create a directory on the master node to hold the static token file
mkdir -p /etc/kubernetes/auth
vim /etc/kubernetes/auth/static-token
admin-token,admin,1005
Back up the existing apiserver manifest
cp /etc/kubernetes/manifests/kube-apiserver.yaml ~/kube-apiserver.yaml.$(date +%F)
Edit the apiserver manifest and add the static token file path: vim /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 192.168.146.189:6443
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    ......
    # Configure the static token file
    - --token-auth-file=/etc/kubernetes/auth/static-token
    image: registry.k8s.io/kube-apiserver:v1.25.4
    imagePullPolicy: IfNotPresent
    ......
    # Mount the static token directory into the container
    - mountPath: /etc/kubernetes/auth
      name: auth-files
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  ......
  # Host directory that holds the static token file
  - hostPath:
      path: /etc/kubernetes/auth
      type: DirectoryOrCreate
    name: auth-files
status: {}
Send a curl request to verify that static token authentication works
[admin@ali-jkt-dc-bnc-airflow-test02 ~]$ curl https://192.168.146.189:6443/api/v1/namespaces/default -H "Authorization: Bearer admin-token" -k
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "namespaces \"default\" is forbidden: User \"admin\" cannot get resource \"namespaces\" in API group \"\" in the namespace \"default\"",
"reason": "Forbidden",
"details": {
"name": "default",
"kind": "namespaces"
},
"code": 403
}
The response is a 403 authorization failure, which means authentication itself has already succeeded.
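To make the file format concrete, here is a minimal Go sketch of how a static token file could be parsed and matched against a bearer token. It assumes the `token,user,uid` layout used above plus an optional quoted group column; `loadTokens` and `authenticate` are hypothetical helpers, not the apiserver's implementation.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// userInfo is the identity attached to a token.
type userInfo struct {
	Name   string
	UID    string
	Groups []string
}

// loadTokens parses lines of the form token,user,uid[,"group1,group2"].
func loadTokens(data string) (map[string]userInfo, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = -1 // the group column is optional
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	tokens := make(map[string]userInfo)
	for _, rec := range records {
		if len(rec) < 3 {
			return nil, fmt.Errorf("expected at least 3 columns, got %d", len(rec))
		}
		u := userInfo{Name: rec[1], UID: rec[2]}
		if len(rec) > 3 && rec[3] != "" {
			u.Groups = strings.Split(rec[3], ",")
		}
		tokens[rec[0]] = u
	}
	return tokens, nil
}

// authenticate resolves an Authorization header value to a user.
func authenticate(tokens map[string]userInfo, header string) (userInfo, bool) {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return userInfo{}, false
	}
	u, ok := tokens[strings.TrimPrefix(header, prefix)]
	return u, ok
}

func main() {
	tokens, err := loadTokens("admin-token,admin,1005\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(authenticate(tokens, "Bearer admin-token"))
}
```

A known token only proves who the caller is. Whether that user may read the namespace is decided later by authorization, which is why the curl request above is authenticated and still gets a 403.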
Create a private key and a certificate signing request (CSR)
openssl genrsa -out myuser.key 2048
openssl req -new -key myuser.key -out myuser.csr
Base64-encode the CSR
cat myuser.csr | base64 | tr -d "\n"
Create the k8s CSR object
cat <<EOF | kubectl apply -f -
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
name: myuser
spec:
request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQ3ZqQ0NBYVlDQVFBd2VURUxNQWtHQTFVRUJoTUNRMDR4RGpBTUJnTlZCQWdNQlhkMWFHRnVNUTR3REFZRApWUVFIREFWM2RXaGhiakVQTUEwR0ExVUVDZ3dHZW1odmRYcDVNUTh3RFFZRFZRUUxEQVo2YUc5MWVua3hEekFOCkJnTlZCQU1NQm5wb2IzVjZlVEVYTUJVR0NTcUdTSWIzRFFFSkFSWUlZVUJ4Y1M1amIyMHdnZ0VpTUEwR0NTcUcKU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLQW9JQkFRRGVmOFZpL3d0MGY1TmczejRyOEdMK0x5bHNnMGhhQnVoMAphTjhFMkRNcVhURGZSc3hweVRoc25Iak5CeW9zZUhtRDEybnI0WHVXTlJUWjVCYjQ1czNidnlSMWV6ZnJ5VzgyCklRTlRxWXJKczhoNEhuQnVmSnRHSTIwbkJSR0Y5eDI0a2JWL0lCMm9lano5WFp3bE9wbXpKVjkxUjZrQnZ1ZjYKRWliL3FCVk5SUzFwUGVzYi90bW0zekZnYkY4NUR4ZkFGOU9SdWgveWdSNHlNNHR1a01pekZLMHBEVzA4UDFYeAprVmpSUG1iTnY2K3ptcE1IZmJxNzBvKzlwRlhVcVd3dSsxUG5DS0E3M2F2RExEQkdkVHl4TXMyYlpaanU3N1gzClI2Q21HcytQWkdSKzZtTkVFbllBYmJBbGNLdHJLYzJPWDRtUElEazhHNlpVQnFRS2lmVGhBZ01CQUFHZ0FEQU4KQmdrcWhraUc5dzBCQVFzRkFBT0NBUUVBVEZjSkN5Mm95TmtMUTZUUjhHbVk1UXJwUDZEY01IRHNEL2NDTk9kNQpuNVZJSzlBSWVtVlgzWVZsTENGeTdwSHRoaXd0Vm9EUHJ2enJhMnpSU296bDF1UTU1dXU2U3R0K0VRdU9OSzFVCm5SRzh2Rysydi9ZM3ZlQVBWZzU1WTRjMUNKYXZHWTAxSjlTbGFhN3pwVGpmN3hjc1NFRUpTYzFkTFpzNHBoTkIKQUtBT0IxYVRrdWZVaHBCdk9rT2plRGE3K0JxdktVZmdOR1NEaHNuNXF3OXhYT0ZzT1JNaHpBRUVUaGg1QTFKWgpKVVNuMWo4YVdZRHFnODREY3hQWTdvbjB5cUpnR0FFNzl2UmtrNmV3Qk9ITi9WdUhqdVdPREVsWlRvaGkvZVhYCkI1WEo2SkJlTllGc0tKUTZyOXN6R2xqOGtDWVpJMkt1UnF4QktkbzRLT2VSVXc9PQotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K
signerName: kubernetes.io/kube-apiserver-client
expirationSeconds: 86400 # one day
usages:
- client auth
EOF
Approve the CSR
kubectl certificate approve myuser
Export the certificate
kubectl get csr myuser -o jsonpath='{.status.certificate}'| base64 -d > myuser.crt
Add a myuser credential entry to kubeconfig, backed by the certificate
kubectl config set-credentials myuser --client-key=myuser.key --client-certificate=myuser.crt --embed-certs=true
kubectl config set-context myuser --cluster=kubernetes --user=myuser
Test whether authentication succeeds
[admin@ali-jkt-dc-bnc-airflow-test02 ~]$ kubectl get pod --user=myuser
Error from server (Forbidden): pods is forbidden: User "zhouzy" cannot list resource "pods" in API group "" in the namespace "default"
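The response shows that authentication succeeded but authorization did not. Note that the API server identifies the user as zhouzy, the CN in the certificate's subject, not as the kubeconfig entry name myuser.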
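The following Go sketch illustrates why the error names "zhouzy": with client-certificate authentication the user name comes from the subject's Common Name and the groups from its Organization fields. The helper `selfSigned` is hypothetical and only builds a throwaway certificate so the example is self-contained; a real client certificate would be signed by the cluster CA.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// identityFromCert derives the identity from a client certificate:
// Subject CN becomes the user name, Subject O entries become the groups.
func identityFromCert(cert *x509.Certificate) (string, []string) {
	return cert.Subject.CommonName, cert.Subject.Organization
}

// selfSigned builds a short-lived self-signed certificate for the demo only.
func selfSigned(cn string, orgs []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: cn, Organization: orgs},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := selfSigned("zhouzy", []string{"zhouzy"})
	if err != nil {
		panic(err)
	}
	user, groups := identityFromCert(cert)
	fmt.Println("user:", user, "groups:", groups)
}
```

The name passed to `kubectl config set-credentials` is only a local label in kubeconfig and never reaches the server.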
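Create a role and bind it to the user. The binding must name the user the API server authenticated (zhouzy, from the certificate CN), not the kubeconfig entry myuser:
kubectl create role developer --verb=create --verb=get --verb=list --verb=update --verb=delete --resource=pods
kubectl create rolebinding developer-binding-myuser --role=developer --user=zhouzy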
Custom authentication service specification
URL: https://authn.example.com/authenticate
Method: POST
Input:
k8s sends the token to the custom authentication service wrapped in a TokenReview object
{
"apiVersion": "authentication.k8s.io/v1beta1",
"kind": "TokenReview",
"spec": {
"token": "(BEARERTOKEN)"
}
}
Output:
{
"apiVersion": "authentication.k8s.io/v1beta1",
"kind": "TokenReview",
"status": {
"authenticated": true,
"user": {
"username": "[email protected]",
"uid": "42",
"groups": ["dev", "qa"]
}
}
}
Custom authentication service code
package main

import (
	"context"
	"encoding/json"
	"log"
	"net/http"

	"github.com/google/go-github/github"
	"golang.org/x/oauth2"
	authentication "k8s.io/api/authentication/v1beta1"
)

func main() {
	http.HandleFunc("/authenticate", func(w http.ResponseWriter, r *http.Request) {
		// Decode the TokenReview object sent by the API server.
		decoder := json.NewDecoder(r.Body)
		var tr authentication.TokenReview
		err := decoder.Decode(&tr)
		if err != nil {
			// Malformed request body: answer with authenticated=false.
			log.Println("[Error]", err.Error())
			w.WriteHeader(http.StatusBadRequest)
			json.NewEncoder(w).Encode(map[string]interface{}{
				"apiVersion": "authentication.k8s.io/v1beta1",
				"kind":       "TokenReview",
				"status": authentication.TokenReviewStatus{
					Authenticated: false,
				},
			})
			return
		}
		log.Print("receving request")
		// Check the user: call the GitHub API with the bearer token as an OAuth2 access token.
		ts := oauth2.StaticTokenSource(
			&oauth2.Token{AccessToken: tr.Spec.Token},
		)
		tc := oauth2.NewClient(context.Background(), ts)
		client := github.NewClient(tc)
		// An empty user name asks GitHub for the user that owns the token.
		user, _, err := client.Users.Get(context.Background(), "")
		if err != nil {
			// GitHub rejected the token: answer with authenticated=false.
			log.Println("[Error]", err.Error())
			w.WriteHeader(http.StatusUnauthorized)
			json.NewEncoder(w).Encode(map[string]interface{}{
				"apiVersion": "authentication.k8s.io/v1beta1",
				"kind":       "TokenReview",
				"status": authentication.TokenReviewStatus{
					Authenticated: false,
				},
			})
			return
		}
		log.Printf("[Success] login as %s", *user.Login)
		w.WriteHeader(http.StatusOK)
		// Report the GitHub login as both the k8s user name and the UID.
		trs := authentication.TokenReviewStatus{
			Authenticated: true,
			User: authentication.UserInfo{
				Username: *user.Login,
				UID:      *user.Login,
			},
		}
		json.NewEncoder(w).Encode(map[string]interface{}{
			"apiVersion": "authentication.k8s.io/v1beta1",
			"kind":       "TokenReview",
			"status":     trs,
		})
	})
	log.Println(http.ListenAndServe(":3000", nil))
}
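Run the authentication service
Build and run the authentication service
go mod init <name of the current directory>
go mod tidy
go run main.go
Create the webhook config
vim /etc/kubernetes/webhook/webhook-config.json
The configuration file is shown below. Replace the IP with the IP of the machine where the authentication service runs (preferably not the master).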
{
"kind": "Config",
"apiVersion": "v1",
"preferences": {},
"clusters": [
{
"name": "github-authn",
"cluster": {
"server": "http://192.168.146.188:3000/authenticate"
}
}
],
"users": [
{
"name": "authn-apiserver",
"user": {
"token": "secret"
}
}
],
"contexts": [
{
"name": "webhook",
"context": {
"cluster": "github-authn",
"user": "authn-apiserver"
}
}
],
"current-context": "webhook"
}
Back up the old apiserver YAML file
sudo cp /etc/kubernetes/manifests/kube-apiserver.yaml ~/kube-apiserver.yaml.$(date +%F)
Edit the apiserver YAML file
sudo vim /etc/kubernetes/manifests/kube-apiserver.yaml
Add the webhook flag to the startup command and mount the config file into the container. Below is the apiserver manifest from my cluster; edit your own existing manifest rather than copying this one verbatim.
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 192.168.146.189:6443
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    ......
    # Enable webhook token authentication
    - --authentication-token-webhook-config-file=/etc/kubernetes/webhook/webhook-config.json
    image: registry.k8s.io/kube-apiserver:v1.25.4
    imagePullPolicy: IfNotPresent
    ......
    # Mount the webhook config directory into the container
    - name: webhook-config
      mountPath: /etc/kubernetes/webhook
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  ......
  # Host directory that holds the webhook config file
  - hostPath:
      path: /etc/kubernetes/webhook
      type: DirectoryOrCreate
    name: webhook-config
status: {}
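Generate a token on GitHub
Configure kubeconfig and add a user
# cat ~/.kube/config
apiVersion: v1
......
users:
- name: zhouzy-token
  user:
    token: ghp_jevHquU4g43m46nczWS0ojxxxxxxxxx
Verify whether authentication succeeds
Authentication succeeds, but the user has no permission to view the resources
Checking the authentication service log shows a successful GitHub login, which confirms that authentication succeeded: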
[admin@ali-jkt-dc-bnc-airflow-test01 auth]$ go run main.go
2023/02/03 02:34:25 receving request
2023/02/03 02:34:25 [Success] login as itnoobzzy
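k8s supports multiple authorization mechanisms and allows several authorization plugins to be enabled at once (a request is authorized as soon as any one of them approves it).
Requests that pass authorization are sent on to the admission module for further validation; requests that fail authorization get a 403.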
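The "any one plugin is enough" rule can be sketched as a chain of authorizers. This is a simplified model: the `decision` values, the `request` type and the two example authorizers are hypothetical, and the sketch denies by default when no authorizer has an opinion.

```go
package main

import "fmt"

type decision int

const (
	noOpinion decision = iota // this authorizer has nothing to say
	allowed
	denied
)

// request carries the attributes an authorizer looks at.
type request struct{ User, Verb, Resource string }

// authorizer is one authorization plugin.
type authorizer func(request) decision

// authorize walks the chain; the first explicit decision wins, and a
// request nobody approves is rejected (the caller would return 403).
func authorize(chain []authorizer, r request) bool {
	for _, a := range chain {
		switch a(r) {
		case allowed:
			return true
		case denied:
			return false
		}
	}
	return false
}

// exampleChain returns two hypothetical plugins: one that only knows about
// node users, and one that lets "admin" read pods.
func exampleChain() []authorizer {
	nodeOnly := func(r request) decision {
		if r.User == "system:node:worker-1" && r.Resource == "nodes" {
			return allowed
		}
		return noOpinion
	}
	podReader := func(r request) decision {
		if r.User == "admin" && r.Verb == "get" && r.Resource == "pods" {
			return allowed
		}
		return noOpinion
	}
	return []authorizer{nodeOnly, podReader}
}

func main() {
	chain := exampleChain()
	fmt.Println(authorize(chain, request{"admin", "get", "pods"}))
	fmt.Println(authorize(chain, request{"alice", "get", "pods"}))
}
```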
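Authorization subjects
Authorization plugins
RBAC (role-based access control) is currently the most widely used authorization mode. Permissions are bound to roles and roles are assigned to users, which lets users manage authorization themselves.
casbin is a popular general-purpose authorization framework for Go.
A Role applies only within a single namespace, whereas a ClusterRole can apply across namespaces and to cluster-scoped resources.
A role binding grants the permissions defined in a role to one or more subjects. A subject can be a user, a group, or a ServiceAccount.
e.g. bind the "pod-reader" role in the "default" namespace and grant it to "jane"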
apiVersion: rbac.authorization.k8s.io/v1
# This role binding allows "jane" to read pods in the "default" namespace.
# You need to already have a Role named "pod-reader" in that namespace.
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
# You can specify more than one "subject"
- kind: User
  name: jane # "name" is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  # "roleRef" specifies the binding to a Role / ClusterRole
  kind: Role #this must be Role or ClusterRole
  name: pod-reader # this must match the name of the Role or ClusterRole you wish to bind to
  apiGroup: rbac.authorization.k8s.io
A ClusterRoleBinding grants permissions across the whole cluster
e.g. grant every user in the "manager" group the secret-reader permissions in all namespaces:
apiVersion: rbac.authorization.k8s.io/v1
# This cluster role binding allows anyone in the "manager" group to read secrets in any namespace.
kind: ClusterRoleBinding
metadata:
  name: read-secrets-global
subjects:
- kind: Group
  name: manager # Name is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: secret-reader
  apiGroup: rbac.authorization.k8s.io
When granting permissions to groups, the subject name follows prescribed prefixes. When naming your own objects, take care to avoid clashing with the names built into k8s.
Grant to all ServiceAccounts in the qa namespace
subjects:
- kind: Group
  name: system:serviceaccounts:qa
  apiGroup: rbac.authorization.k8s.io
Grant to all ServiceAccounts in every namespace
subjects:
- kind: Group
  name: system:serviceaccounts
  apiGroup: rbac.authorization.k8s.io
Grant to all authenticated users
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io
Grant to all unauthenticated users
subjects:
- kind: Group
  name: system:unauthenticated
  apiGroup: rbac.authorization.k8s.io
Grant to all users
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: system:unauthenticated
  apiGroup: rbac.authorization.k8s.io
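The group names above work because a ServiceAccount authenticates with a user name of the form `system:serviceaccount:<namespace>:<name>` and is placed in matching built-in groups. A minimal Go sketch of that mapping follows; `groupsForServiceAccount` and `bindingApplies` are hypothetical helpers written for illustration, not Kubernetes library functions.

```go
package main

import (
	"fmt"
	"strings"
)

// groupsForServiceAccount returns the built-in groups for a user name of the
// form system:serviceaccount:<namespace>:<name>.
func groupsForServiceAccount(username string) ([]string, bool) {
	parts := strings.Split(username, ":")
	if len(parts) != 4 || parts[0] != "system" || parts[1] != "serviceaccount" {
		return nil, false
	}
	return []string{
		"system:serviceaccounts",             // every ServiceAccount
		"system:serviceaccounts:" + parts[2], // every ServiceAccount in this namespace
		"system:authenticated",               // every authenticated caller
	}, true
}

// bindingApplies reports whether a Group subject matches one of the caller's groups.
func bindingApplies(subjectGroup string, callerGroups []string) bool {
	for _, g := range callerGroups {
		if g == subjectGroup {
			return true
		}
	}
	return false
}

func main() {
	groups, _ := groupsForServiceAccount("system:serviceaccount:qa:default")
	fmt.Println(groups)
	fmt.Println(bindingApplies("system:serviceaccounts:qa", groups))
	fmt.Println(bindingApplies("system:serviceaccounts:dev", groups))
}
```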
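Above we used a webhook to integrate third-party GitHub authentication, but that user has not been granted any permissions yet. Let's authorize it next.
First look at the permissions of the cluster's default administrator role: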
[admin@ali-jkt-dc-bnc-airflow-test02 ~]$ kubectl get clusterrole cluster-admin -oyaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
annotations:
rbac.authorization.kubernetes.io/autoupdate: "true"
creationTimestamp: "2022-11-15T05:29:14Z"
labels:
kubernetes.io/bootstrapping: rbac-defaults
name: cluster-admin
resourceVersion: "72"
uid: 9b2bce8b-ef75-4714-a65f-72276e7c480e
rules:
- apiGroups:
- '*'
resources:
- '*'
verbs:
- '*'
- nonResourceURLs:
- '*'
verbs:
- '*'
Next, bind this role to the authenticated user behind the zhouzy-token credential created above:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: default-admin
  namespace: default
subjects:
- kind: User
  # The user authenticated through the webhook. Note that the name here is the GitHub login.
  name: itnoobzzy
  apiGroup: rbac.authorization.k8s.io
# - kind: ServiceAccount
#   name: default
#   namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
Verify:
[admin@ali-jkt-dc-bnc-airflow-test02 auth]$ kubectl get pod --user=zhouzy-token
NAME READY STATUS RESTARTS AGE
host-pvc-pod 1/1 Running 0 81d
nfs-static-pod 1/1 Running 0 81d
ngx-dep-6b9d9dd879-hwt6j 1/1 Running 0 77d
ngx-dep-6b9d9dd879-jn9bc 1/1 Running 0 77d
ngx-dep-6b9d9dd879-pm4vh 1/1 Running 0 77d
ngx-hpa-dep-75b9d99c9b-djcwh 1/1 Running 0 77d
ngx-hpa-dep-75b9d99c9b-nvbwk 1/1 Running 0 77d
test 1/1 Running 0 77d
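After authentication and authorization, the last step for a request is admission. Having permission does not make a request valid. For example, when resources are limited, how should quotas be enforced?
Admission control further validates the request, or adds parameters to it, after authorization.
Admission control supports enabling multiple plugins at once. They are called in order, and only requests that pass every plugin are let into the system.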
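The "called in order, all must pass" behavior can be sketched as a simple chain. The `plugin` type and the two example plugins are hypothetical; the object is modeled as a plain string map to keep the sketch short.

```go
package main

import (
	"errors"
	"fmt"
)

// plugin is one admission plugin; Admit may mutate obj or reject it.
type plugin struct {
	Name  string
	Admit func(obj map[string]string) error
}

// runChain calls every plugin in order and stops at the first rejection.
func runChain(plugins []plugin, obj map[string]string) error {
	for _, p := range plugins {
		if err := p.Admit(obj); err != nil {
			return fmt.Errorf("%s: %w", p.Name, err)
		}
	}
	return nil
}

// demoPlugins returns one mutating and one validating plugin (both hypothetical).
func demoPlugins() []plugin {
	return []plugin{
		{Name: "default-toleration", Admit: func(obj map[string]string) error {
			if obj["tolerationSeconds"] == "" {
				obj["tolerationSeconds"] = "300" // fill in a default
			}
			return nil
		}},
		{Name: "namespace-required", Admit: func(obj map[string]string) error {
			if obj["namespace"] == "" {
				return errors.New("namespace must be set")
			}
			return nil
		}},
	}
}

func main() {
	obj := map[string]string{"namespace": "default"}
	fmt.Println(runChain(demoPlugins(), obj), obj)
	fmt.Println(runChain(demoPlugins(), map[string]string{}))
}
```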
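Adding custom attributes to resources
Admission control usually mutates the request first, adding custom attributes to it. For example, user permissions can be bound automatically when a namespace is created only if the namespace carries valid user information, and only then is the namespace usable.
Quota management
Resources are finite, so how do we cap how much a given user may consume?
Limit the default namespace to at most 3 ConfigMaps
Create quota.yaml and create the corresponding ResourceQuota (RQ) object from it:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-counts
  namespace: default
spec:
  hard:
    configmaps: "3"
The default ns already contains two ConfigMaps: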
[admin@ali-jkt-dc-bnc-airflow-test02 k8s]$ kubectl get configmap
NAME DATA AGE
kube-root-ca.crt 1 84d
ngx-conf 1 84d
Prepare two new ConfigMap YAML files and create the ConfigMap objects from them:
apiVersion: v1
data:
default.conf: |
server {
listen 80;
location / {
default_type text/plain;
return 200
'srv : $server_addr:$server_port\nhost: $hostname\nuri : $request_method $host $request_uri\ndate: $time_iso8601\n';
}
}
kind: ConfigMap
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","data":{"default.conf":"server {\n listen 80;\n location / {\n default_type text/plain;\n return 200\n 'srv : $server_addr:$server_port\\nhost: $hostname\\nuri : $request_method $host $request_uri\\ndate: $time_iso8601\\n';\n }\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"ngx-conf","namespace":"default"}}
creationTimestamp: "2022-11-15T15:33:05Z"
name: ngx-conf1
namespace: default
resourceVersion: "47125"
uid: 3e3e39f0-d1ca-47d7-9a48-734e46d75ccb
apiVersion: v1
data:
default.conf: |
server {
listen 80;
location / {
default_type text/plain;
return 200
'srv : $server_addr:$server_port\nhost: $hostname\nuri : $request_method $host $request_uri\ndate: $time_iso8601\n';
}
}
kind: ConfigMap
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","data":{"default.conf":"server {\n listen 80;\n location / {\n default_type text/plain;\n return 200\n 'srv : $server_addr:$server_port\\nhost: $hostname\\nuri : $request_method $host $request_uri\\ndate: $time_iso8601\\n';\n }\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"ngx-conf","namespace":"default"}}
creationTimestamp: "2022-11-15T15:33:05Z"
name: ngx-conf2
namespace: default
resourceVersion: "47125"
uid: 3e3e39f0-d1ca-47d7-9a48-734e46d75ccb
Created one after another, only one more should succeed and the other should fail. As expected, creating the 4th ConfigMap fails:
Now inspect the RQ object:
[admin@ali-jkt-dc-bnc-airflow-test02 k8s]$ kubectl get resourcequota -o yaml
apiVersion: v1
items:
- apiVersion: v1
kind: ResourceQuota
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","kind":"ResourceQuota","metadata":{"annotations":{},"name":"object-counts","namespace":"default"},"spec":{"hard":{"configmaps":"3"}}}
creationTimestamp: "2023-02-08T00:54:11Z"
name: object-counts
namespace: default
resourceVersion: "9976427"
uid: e1217bcf-8317-498b-8ac1-f1e90bf055b2
spec:
hard:
configmaps: "3"
status:
hard:
configmaps: "3"
used:
configmaps: "3"
kind: List
metadata:
resourceVersion: ""
spec is the quota we configured; status shows the resource usage in the corresponding ns.
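The hard/used bookkeeping can be sketched as below. This is a simplified model of an object-count quota check, assuming a hypothetical `quota` type; the real ResourceQuota plugin handles many more cases, and the error text here is illustrative only.

```go
package main

import "fmt"

// quota mirrors the hard limits (spec) and current usage (status) per resource.
type quota struct {
	Hard map[string]int64
	Used map[string]int64
}

// admit checks whether creating delta more objects stays within the hard
// limit and, if so, records the new usage.
func (q *quota) admit(resource string, delta int64) error {
	hard, limited := q.Hard[resource]
	if !limited {
		return nil // no limit configured for this resource
	}
	if q.Used[resource]+delta > hard {
		return fmt.Errorf("exceeded quota for %s: requested %d, used %d, limited %d",
			resource, delta, q.Used[resource], hard)
	}
	q.Used[resource] += delta
	return nil
}

func main() {
	// Two ConfigMaps already exist and the limit is three, as in the example above.
	q := &quota{
		Hard: map[string]int64{"configmaps": 3},
		Used: map[string]int64{"configmaps": 2},
	}
	fmt.Println(q.admit("configmaps", 1)) // third ConfigMap
	fmt.Println(q.admit("configmaps", 1)) // fourth ConfigMap
}
```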
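The community already provides many admission plugins. Some commonly used ones are:
AlwaysAdmit: accepts all requests.
AlwaysPullImages: always pulls the image, which is very useful in multi-tenant scenarios.
ImagePolicyWebhook: decides the image policy through a webhook; requires --admission-control-config-file to be configured as well.
ServiceAccount: automatically creates the default ServiceAccount and ensures that the ServiceAccount referenced by a Pod exists.
SecurityContextDeny: rejects containers with a disallowed SecurityContext configuration.
ResourceQuota: ensures Pod requests do not exceed the quota; requires a ResourceQuota object in the namespace.
LimitRanger: sets default resource requests and limits for Pods; requires a LimitRange object in the namespace.
InitialResource: sets default resource requests and limits for containers based on the image's historical usage.
NamespaceLifecycle: ensures that a namespace in the terminating state accepts no new object creation requests, and rejects requests for namespaces that do not exist.
DefaultStorageClass: sets the default StorageClass for PVCs.
DefaultTolerationSeconds: sets the default forgiveness toleration of Pods to 5 minutes.
View the default plugin configuration: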
[admin@ali-jkt-dc-bnc-airflow-test02 ~]$ kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-565d847f94-46b8q 1/1 Running 0 84d
coredns-565d847f94-slw2m 1/1 Running 0 84d
etcd-ali-jkt-dc-bnc-airflow-test02 1/1 Running 0 84d
kube-apiserver-ali-jkt-dc-bnc-airflow-test02 1/1 Running 0 4d23h
kube-controller-manager-ali-jkt-dc-bnc-airflow-test02 1/1 Running 14 (4d23h ago) 84d
kube-proxy-48k6j 1/1 Running 0 84d
kube-proxy-grdwp 1/1 Running 0 21d
kube-proxy-m5m24 1/1 Running 0 84d
kube-scheduler-ali-jkt-dc-bnc-airflow-test02 1/1 Running 13 (4d23h ago) 84d
metrics-server-d55786594-7wr2c 1/1 Running 0 78d
[admin@ali-jkt-dc-bnc-airflow-test02 ~]$ kubectl -n kube-system exec -it kube-apiserver-ali-jkt-dc-bnc-airflow-test02 -- kube-apiserver -h
The Kubernetes API server validates and configures data
for the api objects which include pods, services, replicationcontrollers, and
......
--disable-admission-plugins strings
admission plugins that should be disabled although they are in the default enabled plugins list (NamespaceLifecycle, LimitRanger, ServiceAccount, TaintNodesByCondition, PodSecurity, Priority, DefaultTolerationSeconds,
DefaultStorageClass, StorageObjectInUseProtection, PersistentVolumeClaimResize, RuntimeClass, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, MutatingAdmissionWebhook,
ValidatingAdmissionWebhook, ResourceQuota). Comma-delimited list of admission plugins: AlwaysAdmit, AlwaysDeny, AlwaysPullImages, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass,
DefaultStorageClass, DefaultTolerationSeconds, DenyServiceExternalIPs, EventRateLimit, ExtendedResourceToleration, ImagePolicyWebhook, LimitPodHardAntiAffinityTopology, LimitRanger, MutatingAdmissionWebhook,
NamespaceAutoProvision, NamespaceExists, NamespaceLifecycle, NodeRestriction, OwnerReferencesPermissionEnforcement, PersistentVolumeClaimResize, PersistentVolumeLabel, PodNodeSelector, PodSecurity, PodTolerationRestriction,
Priority, ResourceQuota, RuntimeClass, SecurityContextDeny, ServiceAccount, StorageObjectInUseProtection, TaintNodesByCondition, ValidatingAdmissionWebhook. The order of plugins in this flag does not matter.
--enable-admission-plugins strings
admission plugins that should be enabled in addition to default enabled ones (NamespaceLifecycle, LimitRanger, ServiceAccount, TaintNodesByCondition, PodSecurity, Priority, DefaultTolerationSeconds, DefaultStorageClass,
StorageObjectInUseProtection, PersistentVolumeClaimResize, RuntimeClass, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, MutatingAdmissionWebhook, ValidatingAdmissionWebhook,
ResourceQuota). Comma-delimited list of admission plugins: AlwaysAdmit, AlwaysDeny, AlwaysPullImages, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, DefaultStorageClass,
DefaultTolerationSeconds, DenyServiceExternalIPs, EventRateLimit, ExtendedResourceToleration, ImagePolicyWebhook, LimitPodHardAntiAffinityTopology, LimitRanger, MutatingAdmissionWebhook, NamespaceAutoProvision,
NamespaceExists, NamespaceLifecycle, NodeRestriction, OwnerReferencesPermissionEnforcement, PersistentVolumeClaimResize, PersistentVolumeLabel, PodNodeSelector, PodSecurity, PodTolerationRestriction, Priority, ResourceQuota,
RuntimeClass, SecurityContextDeny, ServiceAccount, StorageObjectInUseProtection, TaintNodesByCondition, ValidatingAdmissionWebhook. The order of plugins in this flag does not matter.
......
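Besides the built-in admission plugins, k8s provides extension points so that users can implement custom admission logic.
MutatingWebhookConfiguration: mutating plugin; it may modify the object under admission.
ValidatingWebhookConfiguration: validating plugin; it can only check the object's validity and cannot modify it.
A custom webhook and the apiserver exchange admission decisions through AdmissionReview objects.
Once a MutatingWebhookConfiguration or ValidatingWebhookConfiguration is configured, k8s wraps the whole request object (for example a Pod) in an AdmissionReview and sends it to the corresponding webhook for mutation or validation.
Note that every admission webhook server must be served over HTTPS.
About the custom plugin
Purpose: when a Pod is created, the webhook checks whether runAsNonRoot is set. If it is not set, the webhook defaults it to true and sets runAsUser to 1234.
When runAsNonRoot is true, the container's user ID must not be 0.
Code: https://github.com/cncamp/admission-controller-webhook-demo
Deployment steps
# Clone the code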
[admin@ali-jkt-dc-bnc-airflow-test02 k8s]$ git clone https://github.com/cncamp/admission-controller-webhook-demo.git
Cloning into 'admission-controller-webhook-demo'...
remote: Enumerating objects: 641, done.
remote: Counting objects: 100% (63/63), done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 641 (delta 33), reused 26 (delta 26), pack-reused 578
Receiving objects: 100% (641/641), 2.43 MiB | 0 bytes/s, done.
Resolving deltas: 100% (140/140), done.
# Run the deployment script
[admin@ali-jkt-dc-bnc-airflow-test02 admission-controller-webhook-demo]$ ./deploy.sh
Generating TLS keys ...
Generating a 2048 bit RSA private key
.............................................................................+++
..+++
writing new private key to 'ca.key'
-----
Generating RSA private key, 2048 bit long modulus
............................................................................+++
.......+++
e is 65537 (0x10001)
Signature ok
subject=/CN=webhook-server.webhook-demo.svc
Getting CA Private Key
Creating Kubernetes objects ...
namespace/webhook-demo created
secret/webhook-server-tls created
deployment.apps/webhook-server created
service/webhook-server created
mutatingwebhookconfiguration.admissionregistration.k8s.io/demo-webhook created
The webhook server has been deployed and configured!
# Verify that the webhook server started successfully
[admin@ali-jkt-dc-bnc-airflow-test02 admission-controller-webhook-demo]$ kubectl -n webhook-demo get pods
NAME READY STATUS RESTARTS AGE
webhook-server-88ccccd9f-pbh56 1/1 Running 0 4m32s
Deploy a pod that sets neither runAsNonRoot nor runAsUser to verify:
# A pod with no securityContext specified.
# Without the webhook, it would run as user root (0). The webhook mutates it
# to run as the non-root user with uid 1234.
apiVersion: v1
kind: Pod
metadata:
name: pod-with-defaults
labels:
app: pod-with-defaults
spec:
restartPolicy: OnFailure
containers:
- name: busybox
image: busybox
command: ["sh", "-c", "echo I am running as user $(id -u)"]
[admin@ali-jkt-dc-bnc-airflow-test02 admission-controller-webhook-demo]$ kubectl create -f examples/pod-with-defaults.yaml
pod/pod-with-defaults created
# The pod's spec/securityContext now contains runAsNonRoot and runAsUser
# These fields were added by the custom mutating webhook
[admin@ali-jkt-dc-bnc-airflow-test02 admission-controller-webhook-demo]$ kubectl get pod/pod-with-defaults -o yaml
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2023-02-09T01:03:27Z"
labels:
app: pod-with-defaults
name: pod-with-defaults
namespace: default
resourceVersion: "10106854"
uid: 8b83e295-c36c-48c6-ae67-9ad3573cc826
spec:
......
securityContext:
runAsNonRoot: true
runAsUser: 1234
......
Deploy a pod that is allowed to run as root:
# A pod with a securityContext explicitly allowing it to run as root.
# The effect of deploying this with and without the webhook is the same. The
# explicit setting however prevents the webhook from applying more secure
# defaults.
apiVersion: v1
kind: Pod
metadata:
name: pod-with-override
labels:
app: pod-with-override
spec:
restartPolicy: OnFailure
securityContext:
runAsNonRoot: false
containers:
- name: busybox
image: busybox
command: ["sh", "-c", "echo I am running as user $(id -u)"]
$ kubectl create -f examples/pod-with-override.yaml
$ kubectl get pod/pod-with-override -o yaml
...
securityContext:
runAsNonRoot: false
...
$ kubectl logs pod-with-override
I am running as user 0
Deploy a conflicting pod that must run as non-root but requests user ID 0:
# A pod with a conflicting securityContext setting: it has to run as a non-root
# user, but we explicitly request a user id of 0 (root).
# Without the webhook, the pod could be created, but would be unable to launch
# due to an unenforceable security context leading to it being stuck in a
# `CreateContainerConfigError` status. With the webhook, the creation of
# the pod is outright rejected.
apiVersion: v1
kind: Pod
metadata:
name: pod-with-conflict
labels:
app: pod-with-conflict
spec:
restartPolicy: OnFailure
securityContext:
runAsNonRoot: true
runAsUser: 0
containers:
- name: busybox
image: busybox
command: ["sh", "-c", "echo I am running as user $(id -u)"]
The request is rejected with an error:
[admin@ali-jkt-dc-bnc-airflow-test02 admission-controller-webhook-demo]$ kubectl create -f examples/pod-with-conflict.yaml
Error from server: error when creating "examples/pod-with-conflict.yaml": admission webhook "webhook-server.webhook-demo.svc" denied the request: runAsNonRoot specified, but runAsUser set to 0 (the root user)
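The three outcomes above (defaults applied, explicit override kept, conflict rejected) can be reproduced with a few lines of Go. This is a sketch of the decision logic only, using a hypothetical `securityContext` type; it is not the demo repository's code and leaves out the AdmissionReview and JSON patch handling a real webhook needs.

```go
package main

import (
	"errors"
	"fmt"
)

// securityContext holds the two pod-level fields the webhook cares about.
// Pointers distinguish "not set" from an explicit value.
type securityContext struct {
	RunAsNonRoot *bool
	RunAsUser    *int64
}

// applySecurityDefaults mutates first (fill in safer defaults), then validates.
func applySecurityDefaults(sc securityContext) (securityContext, error) {
	if sc.RunAsNonRoot == nil {
		nonRoot := true
		sc.RunAsNonRoot = &nonRoot
		if sc.RunAsUser == nil {
			uid := int64(1234) // the default uid shown in the example output
			sc.RunAsUser = &uid
		}
	}
	if *sc.RunAsNonRoot && sc.RunAsUser != nil && *sc.RunAsUser == 0 {
		return sc, errors.New("runAsNonRoot specified, but runAsUser set to 0 (the root user)")
	}
	return sc, nil
}

func main() {
	sc, err := applySecurityDefaults(securityContext{})
	fmt.Println(*sc.RunAsNonRoot, *sc.RunAsUser, err)

	nonRoot, root := true, int64(0)
	_, err = applySecurityDefaults(securityContext{RunAsNonRoot: &nonRoot, RunAsUser: &root})
	fmt.Println(err)
}
```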
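Common rate-limiting algorithms
Rate limiting in the API Server
Each priority level has its own configuration, which sets the number of concurrent requests it may dispatch.
Limitations of traditional rate limiting
Each time a request is taken from a QueueSet for execution, the fair queuing algorithm first selects one queue from the QueueSet, and then the oldest request in that queue is dequeued and executed.
So even within a single priority level (PL), requests from one Flow cannot monopolize resources unfairly.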
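The two-step dispatch (pick a queue, then take its oldest request) can be illustrated with a toy QueueSet. Note the substitution: this sketch picks queues by plain round-robin as a stand-in, whereas the real APF implementation uses a more elaborate fair queuing algorithm. The `queueSet` type is hypothetical.

```go
package main

import "fmt"

// queueSet holds several FIFO queues, one or more per flow.
type queueSet struct {
	queues [][]string // each inner slice is ordered oldest first
	next   int        // index of the queue to try first on the next dispatch
}

// dispatch selects the next non-empty queue in round-robin order and
// removes its oldest request.
func (qs *queueSet) dispatch() (string, bool) {
	for i := 0; i < len(qs.queues); i++ {
		idx := (qs.next + i) % len(qs.queues)
		if len(qs.queues[idx]) > 0 {
			req := qs.queues[idx][0]
			qs.queues[idx] = qs.queues[idx][1:]
			qs.next = (idx + 1) % len(qs.queues)
			return req, true
		}
	}
	return "", false
}

func main() {
	// Flow A floods its queue; flow B sends a single request.
	qs := &queueSet{queues: [][]string{{"a1", "a2", "a3"}, {"b1"}}}
	for {
		req, ok := qs.dispatch()
		if !ok {
			break
		}
		fmt.Println(req)
	}
}
```

Even though flow A queued three requests before flow B's one, B's request is served second instead of last, which is the fairness property described above.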
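When many requests of the same priority enter the same queue, a single misbehaving source can block the queue and cause every other request to fail. For example, a faulty client that floods kube-apiserver with requests fills the queue, and other clients can no longer get through.
APF handles requests of the same priority with a fair queuing algorithm:
The flow distinguisher here can be the requesting user, the namespace of the target resource, or nothing at all.
Exempt requests
Some especially important requests bypass APF queuing entirely and jump the line. These exemptions prevent a misconfigured flow-control setup from completely disabling the API server.
The default FlowSchema configuration is as follows: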
[admin@ali-jkt-dc-bnc-airflow-test02 ~]$ kubectl get flowschema
NAME PRIORITYLEVEL MATCHINGPRECEDENCE DISTINGUISHERMETHOD AGE MISSINGPL
exempt exempt 1 <none> 87d False
probes exempt 2 <none> 87d False
system-leader-election leader-election 100 ByUser 87d False
endpoint-controller workload-high 150 ByUser 87d False
workload-leader-election leader-election 200 ByUser 87d False
system-node-high node-high 400 ByUser 87d False
system-nodes system 500 ByUser 87d False
kube-controller-manager workload-high 800 ByNamespace 87d False
kube-scheduler workload-high 800 ByNamespace 87d False
kube-system-service-accounts workload-high 900 ByNamespace 87d False
service-accounts workload-low 9000 ByUser 87d False
global-default global-default 9900 ByUser 87d False
catch-all catch-all 10000 ByUser 87d False
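The first two entries, whose priority level is exempt, are the exempt requests. The rest are FlowSchemas for leader election, nodes, control-plane components, the scheduler, service accounts, and so on. Precedence decreases as MATCHINGPRECEDENCE goes from 1 to 10000, and DISTINGUISHERMETHOD decides whether requests are split into Flows by user or by namespace.
A FlowSchema matches some inbound requests and assigns them to a priority level.
Every inbound request is tested against the FlowSchemas in order, starting with the lowest matchingPrecedence value (logically the highest precedence) and continuing until the first match.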
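The matching order can be sketched as "sort by matchingPrecedence, return the first hit". The `flowSchema` type and its `Matches` predicate are hypothetical simplifications; real FlowSchemas match on subjects and resource or non-resource rules.

```go
package main

import (
	"fmt"
	"sort"
)

// flowSchema is a simplified FlowSchema: a name, the priority level it
// assigns, its precedence, and a predicate on the requesting user.
type flowSchema struct {
	Name               string
	PriorityLevel      string
	MatchingPrecedence int
	Matches            func(user string) bool
}

// classify tests schemas from the lowest matchingPrecedence upward and
// returns the first one that matches.
func classify(schemas []flowSchema, user string) (flowSchema, bool) {
	sorted := append([]flowSchema(nil), schemas...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].MatchingPrecedence < sorted[j].MatchingPrecedence
	})
	for _, fs := range sorted {
		if fs.Matches(user) {
			return fs, true
		}
	}
	return flowSchema{}, false
}

// demoSchemas returns two entries modeled on the default list above.
func demoSchemas() []flowSchema {
	return []flowSchema{
		{Name: "catch-all", PriorityLevel: "catch-all", MatchingPrecedence: 10000,
			Matches: func(string) bool { return true }},
		{Name: "kube-scheduler", PriorityLevel: "workload-high", MatchingPrecedence: 800,
			Matches: func(u string) bool { return u == "system:kube-scheduler" }},
	}
}

func main() {
	fs, _ := classify(demoSchemas(), "system:kube-scheduler")
	fmt.Println(fs.Name, fs.PriorityLevel)
	fs, _ = classify(demoSchemas(), "alice")
	fmt.Println(fs.Name, fs.PriorityLevel)
}
```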
View the YAML of a specific FlowSchema:
The default priority level configurations (PLC) are as follows:
system
leader-election
workload-high
workload-low
global-default
exempt
catch-all
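All of the FlowSchema and PLC settings above are defaults. When rolling k8s out to production, load-test the cluster against its actual size and request volume, adjust these settings accordingly, and tune them until they perform best.
View all priority levels and their current state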
[admin@ali-jkt-dc-bnc-airflow-test02 ~]$ kubectl get --raw /debug/api_priority_and_fairness/dump_priority_levels
PriorityLevelName, ActiveQueues, IsIdle, IsQuiescing, WaitingRequests, ExecutingRequests
leader-election, 0, true, false, 0, 0
node-high, 0, true, false, 0, 0
system, 0, true, false, 0, 0
workload-high, 0, true, false, 0, 0
workload-low, 0, true, false, 0, 0
catch-all, 0, true, false, 0, 0
exempt, <none>, <none>, <none>, <none>, <none>
global-default, 0, true, false, 0, 0
View the list of all queues and their current state
[admin@ali-jkt-dc-bnc-airflow-test02 ~]$ kubectl get --raw /debug/api_priority_and_fairness/dump_queues
PriorityLevelName, Index, PendingRequests, ExecutingRequests, SeatsInUse, NextDispatchR, InitialSeatsSum, MaxSeatsSum, TotalWorkSum
workload-high, 0, 0, 0, 0, 796.18416512ss, 0, 0, 0.00000000ss
workload-high, 1, 0, 0, 0, 591.03703740ss, 0, 0, 0.00000000ss
workload-high, 2, 0, 0, 0, 0.00000000ss, 0, 0, 0.00000000ss
workload-high, 3, 0, 0, 0, 796.20406911ss, 0, 0, 0.00000000ss
workload-high, 4, 0, 0, 0, 0.00000000ss, 0, 0, 0.00000000ss
workload-high, 5, 0, 0, 0, 796.19769904ss, 0, 0, 0.00000000ss
......
View all requests currently waiting in the queues
[admin@ali-jkt-dc-bnc-airflow-test02 ~]$ kubectl get --raw /debug/api_priority_and_fairness/dump_requests
PriorityLevelName, FlowSchemaName, QueueIndex, RequestIndexInQueue, FlowDistingsher, ArriveTime, InitialSeats, FinalSeats, AdditionalLatency
All of the commands above were run against my test k8s cluster, which has no traffic, so the output is empty.