```yaml
# test-quota.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: gateway-quota
spec:
  template:
    spec:
      containers:
        - name: test-container
          image: nginx
```
```
$ kubectl get pods
NAME                            READY     STATUS    RESTARTS   AGE
gateway-quota-551394438-pix5d   1/1       Running   0          16s
```
```
$ kubectl get pods
NAME                            READY     STATUS    RESTARTS   AGE
gateway-quota-551394438-pix5d   1/1       Running   0          9m
```
```
$ kubectl describe deploy/gateway-quota
Name:                   gateway-quota
Namespace:              fail
CreationTimestamp:      Sat, 11 Feb 2017 16:33:16 -0500
Labels:                 app=gateway
Selector:               app=gateway
Replicas:               1 updated | 3 total | 1 available | 2 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
OldReplicaSets:
NewReplicaSet:          gateway-quota-551394438 (1/3 replicas created)
Events:
  FirstSeen  LastSeen  Count  From                     SubObjectPath  Type    Reason             Message
  ---------  --------  -----  ----                     -------------  ------  ------             -------
  9m         9m        1      {deployment-controller}                 Normal  ScalingReplicaSet  Scaled up replica set gateway-quota-551394438 to 1
  5m         5m        1      {deployment-controller}                 Normal  ScalingReplicaSet  Scaled up replica set gateway-quota-551394438 to 3
```
```
$ kubectl describe replicaset gateway-quota-551394438
Name:         gateway-quota-551394438
Namespace:    fail
Image(s):     nginx
Selector:     app=gateway,pod-template-hash=551394438
Labels:       app=gateway
              pod-template-hash=551394438
Replicas:     1 current / 3 desired
Pods Status:  1 Running / 0 Waiting / 0 Succeeded / 0 Failed
No volumes.
Events:
  FirstSeen  LastSeen  Count  From                     SubObjectPath  Type     Reason            Message
  ---------  --------  -----  ----                     -------------  ------   ------            -------
  11m        11m       1      {replicaset-controller}                 Normal   SuccessfulCreate  Created pod: gateway-quota-551394438-pix5d
  11m        30s       33     {replicaset-controller}                 Warning  FailedCreate      Error creating: pods "gateway-quota-551394438-" is forbidden: exceeded quota: compute-resources, requested: pods=1, used: pods=1, limited: pods=1
```
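The `compute-resources` object named in the error is a namespace-scoped ResourceQuota. A minimal sketch of what it might look like in this namespace — the name and the `pods: "1"` limit match the error message, everything else is an assumption:

```yaml
# resource-quota.yaml (illustrative sketch, not from the original cluster)
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
  namespace: fail
spec:
  hard:
    pods: "1"   # matches "limited: pods=1" in the FailedCreate event
```

Raising `spec.hard.pods` (or asking whoever administers the namespace to do so) lets the ReplicaSet controller create the remaining pods; it retries automatically, so no redeploy is needed.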
```yaml
# cpu-scale.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cpu-scale
spec:
  template:
    metadata:
      labels:
        app: cpu-scale
    spec:
      containers:
        - name: test-container
          image: nginx
          resources:
            requests:
              cpu: 1
```
```
$ kubectl get pods
NAME                        READY     STATUS    RESTARTS   AGE
cpu-scale-908056305-xstti   1/1       Running   0          5m
```
```
$ kubectl scale deploy/cpu-scale --replicas=2
deployment "cpu-scale" scaled

$ kubectl get pods
NAME                        READY     STATUS    RESTARTS   AGE
cpu-scale-908056305-phb4j   0/1       Pending   0          4m
cpu-scale-908056305-xstti   1/1       Running   0          5m
```
```
$ kubectl describe pod cpu-scale-908056305-phb4j
Name:         cpu-scale-908056305-phb4j
Namespace:    fail
Node:         gke-ctm-1-sysdig2-35e99c16-qwds/10.128.0.4
Start Time:   Sun, 12 Feb 2017 08:57:51 -0500
Labels:       app=cpu-scale
              pod-template-hash=908056305
Status:       Pending
IP:
Controllers:  ReplicaSet/cpu-scale-908056305
[...]
Events:
  FirstSeen  LastSeen  Count  From                SubObjectPath  Type     Reason            Message
  ---------  --------  -----  ----                -------------  ------   ------            -------
  3m         3m        1      {default-scheduler}                Warning  FailedScheduling  pod (cpu-scale-908056305-phb4j) failed to fit in any node
fit failure on node (gke-ctm-1-sysdig2-35e99c16-wx0s): Insufficient cpu
fit failure on node (gke-ctm-1-sysdig2-35e99c16-tgfm): Insufficient cpu
fit failure on node (gke-ctm-1-sysdig2-35e99c16-qwds): Insufficient cpu
```
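Every node rejects the pod because no node has a full unreserved CPU left. Assuming the application does not actually need a whole core, one way out is to shrink the request so the scheduler can fit the second replica; the `100m` figure below is an illustrative value, not one from the original article:

```yaml
# cpu-scale.yaml (sketch of the fix: request 100 millicores instead of 1 full CPU)
resources:
  requests:
    cpu: 100m
```

The alternatives are to add nodes (or enable cluster autoscaling) or to free capacity by scaling down other workloads; `kubectl describe nodes` shows each node's allocatable CPU and what is already requested on it.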
```yaml
# volume-test.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: volume-test
spec:
  template:
    metadata:
      labels:
        app: volume-test
    spec:
      containers:
        - name: test-container
          image: nginx
          volumeMounts:
            - mountPath: /test
              name: test-volume
      volumes:
        - name: test-volume
          # This GCE PD must already exist (oops!)
          gcePersistentDisk:
            pdName: my-data-disk
            fsType: ext4
```
```
$ kubectl get pods
NAME                           READY     STATUS              RESTARTS   AGE
volume-test-3922807804-33nux   0/1       ContainerCreating   0          3m
```
```
$ kubectl describe pod volume-test-3922807804-33nux
Name:         volume-test-3922807804-33nux
Namespace:    fail
Node:         gke-ctm-1-sysdig2-35e99c16-qwds/10.128.0.4
Start Time:   Sun, 12 Feb 2017 09:24:50 -0500
Labels:       app=volume-test
              pod-template-hash=3922807804
Status:       Pending
IP:
Controllers:  ReplicaSet/volume-test-3922807804
[...]
Volumes:
  test-volume:
    Type:       GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName:     my-data-disk
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
[...]
Events:
  FirstSeen  LastSeen  Count  From                                             SubObjectPath  Type     Reason       Message
  ---------  --------  -----  ----                                             -------------  ------   ------       -------
  4m         4m        1     {default-scheduler}                                              Normal   Scheduled    Successfully assigned volume-test-3922807804-33nux to gke-ctm-1-sysdig2-35e99c16-qwds
  1m         1m        1     {kubelet gke-ctm-1-sysdig2-35e99c16-qwds}                        Warning  FailedMount  Unable to mount volumes for pod "volume-test-3922807804-33nux_fail(e2180d94-f12e-11e6-bd01-42010af0012c)": timeout expired waiting for volumes to attach/mount for pod "volume-test-3922807804-33nux"/"fail". list of unattached/unmounted volumes=[test-volume]
  1m         1m        1     {kubelet gke-ctm-1-sysdig2-35e99c16-qwds}                        Warning  FailedSync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "volume-test-3922807804-33nux"/"fail". list of unattached/unmounted volumes=[test-volume]
  3m         50s       3     {controller-manager}                                             Warning  FailedMount  Failed to attach volume "test-volume" on node "gke-ctm-1-sysdig2-35e99c16-qwds" with: GCE persistent disk not found: diskName="my-data-disk" zone="us-central1-a"
```
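The controller-manager event is explicit: no persistent disk named `my-data-disk` exists in zone `us-central1-a`. Assuming you have the `gcloud` CLI authenticated against the cluster's project, creating a matching disk unblocks the mount (the size and disk type below are assumptions; only the name and zone come from the error):

```
gcloud compute disks create my-data-disk \
    --zone us-central1-a \
    --size 10GB \
    --type pd-standard
```

Once the disk exists, the kubelet's mount retries succeed without redeploying; note that a `gcePersistentDisk` volume referenced directly like this must live in the same zone as the node.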
```
$ kubectl create -f test-application.deploy.yaml
error: error validating "test-application.deploy.yaml": error validating data: found invalid field resources for v1.PodSpec; if you choose to ignore these errors, turn validation off with --validate=false
```
```yaml
# test-application.deploy.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-app
spec:
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
        - image: nginx
          name: nginx
      # Mis-indented: "resources" sits at the PodSpec level,
      # where no such field exists -- hence the validation error.
      resources:
        limits:
          cpu: 100m
          memory: 200Mi
        requests:
          cpu: 100m
          memory: 100Mi
```
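The fix is purely a matter of indentation: `resources` is a per-container field, so it belongs inside the container entry, not on the PodSpec. A corrected manifest:

```yaml
# test-application.deploy.yaml (fixed: resources nested under the container)
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-app
spec:
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
        - image: nginx
          name: nginx
          resources:
            limits:
              cpu: 100m
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 100Mi
```

With the field in the right place, `kubectl create -f test-application.deploy.yaml` passes validation.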
The Kubernetes documentation explains the default:

> Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always if :latest tag is specified, or IfNotPresent otherwise.

Because we tagged our image as `:v1`, the default image pull policy is IfNotPresent. The kubelet already has a local copy of `rosskukulinski/myapplication:v1`, so it never does another `docker pull`. When the new pods come up, they are still running the old, broken image.
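If you must re-push under the same tag, one workaround is to force the kubelet to pull on every pod start, at the cost of slower startups and a hard dependency on the registry. A sketch of the override (the container name here is an assumption for illustration):

```yaml
containers:
  - name: myapplication
    image: rosskukulinski/myapplication:v1
    imagePullPolicy: Always   # override the IfNotPresent default for non-:latest tags
```

The better practice, though, is to treat tags as immutable and push the fix as a new tag such as `:v2`; changing the image field then triggers a normal rolling update with no pull-policy tricks.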
Original article: 10 Most Common Reasons Kubernetes Deployments Fail (Part 2) (translated by 池剑锋)

Originally published: 2017-04-20

Translation by: 池剑锋

This article comes from Dockerone.io, a partner of the Alibaba Cloud Yunqi Community (云栖社区); follow Dockerone.io for related updates.

Translated title: Kubernetes 部署失败的 10 个最普遍原因(Part 2)