Reference (official docs):
https://kubernetes.io/docs/setup/independent/install-kubeadm/#verify-the-mac-address-and-product-uuid-are-unique-for-every-node
kubeadm init configuration file parameter reference:
https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/
5 hosts running an up-to-date CentOS 7
The etcd cluster runs on the 3 master nodes
Calico is used as the network plugin
Hostname | IP | Description | Components |
---|---|---|---|
k8s-company01-master01 ~ 03 | 172.16.4.201 ~ 203 | 3 master nodes | keepalived, haproxy, etcd, kubelet, kube-apiserver |
k8s-company01-worker001 ~ 002 | 172.16.4.204 ~ 205 | 2 worker nodes | kubelet |
k8s-company01-lb | 172.16.4.200 | keepalived virtual IP | |
1. Make sure each VM has a unique MAC address and product UUID.
(check the UUID with: cat /sys/class/dmi/id/product_uuid)
2. Swap disabled.
(run: swapoff -a; sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab)
3. Disable SELinux as usual, and set the timezone: timedatectl set-timezone Asia/Shanghai (optionally: echo "Asia/Shanghai" > /etc/timezone)
4. Sync the clock (etcd is sensitive to clock skew): ntpdate asia.pool.ntp.org
(add to crontab: 8 * * * * /usr/sbin/ntpdate asia.pool.ntp.org && /sbin/hwclock --systohc )
5. yum update to the latest packages and reboot so the new kernel takes effect.
Note: disable SELinux
setenforce 0
sed -i --follow-symlinks "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
sed -i --follow-symlinks "s/^SELINUX=permissive/SELINUX=disabled/g" /etc/selinux/config
Disable firewalld. If it stays on, many components outside Kubernetes end up with broken connectivity that is tedious to debug one by one; since this cluster runs on an internal network, we simply turn it off.
systemctl stop firewalld.service
systemctl disable firewalld.service
Configure the hostnames (adjust them to your actual environment):
Set the hostname on each of the 5 hosts:
hostnamectl set-hostname k8s-company01-master01
hostnamectl set-hostname k8s-company01-master02
hostnamectl set-hostname k8s-company01-master03
hostnamectl set-hostname k8s-company01-worker001
hostnamectl set-hostname k8s-company01-worker002
Append the following (the host/IP mapping from the table above) to /etc/hosts on all 5 hosts:
cat >> /etc/hosts << EOF
172.16.4.200 k8s-company01-lb
172.16.4.201 k8s-company01-master01
172.16.4.202 k8s-company01-master02
172.16.4.203 k8s-company01-master03
172.16.4.204 k8s-company01-worker001
172.16.4.205 k8s-company01-worker002
EOF
Installation reference: https://kubernetes.io/docs/setup/cri/
## Install prerequisites.
yum install -y yum-utils device-mapper-persistent-data lvm2
## Add docker repository.
yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
## List every available docker-ce version: yum list docker-ce --showduplicates | sort -r
## Install docker. A plain 'yum install docker-ce' would install the latest version; here we pin a specific version:
yum install -y docker-ce-18.09.5 docker-ce-cli-18.09.5
# Setup daemon.
mkdir /etc/docker
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF
mkdir -p /etc/systemd/system/docker.service.d
# Restart docker.
systemctl daemon-reload
systemctl enable docker.service
systemctl restart docker
yum -y install yum-plugin-versionlock
yum versionlock docker-ce docker-ce-cli
yum versionlock list
# Note:
# To remove the lock later:
# yum versionlock delete docker-ce docker-ce-cli
## Some users on RHEL/CentOS 7 have reported issues with traffic being routed incorrectly due to iptables being bypassed. You should ensure net.bridge.bridge-nf-call-iptables is set to 1 in your sysctl config, e.g.
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.swappiness=0
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.may_detach_mounts = 1
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.netfilter.nf_conntrack_max=2310720
EOF
modprobe br_netfilter
sysctl --system
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kube*
EOF
Also install ipvsadm, since kube-proxy will be switched to IPVS mode later (cri-tools-1.12.0 and kubernetes-cni-0.7.5 are two dependent packages):
yum install -y kubelet-1.14.1 kubeadm-1.14.1 kubectl-1.14.1 cri-tools-1.12.0 kubernetes-cni-0.7.5 ipvsadm --disableexcludes=kubernetes
# Load the IPVS-related kernel modules
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack_ipv4
modprobe br_netfilter
# Also load them at boot
cat <<EOF >>/etc/rc.d/rc.local
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack_ipv4
modprobe br_netfilter
EOF
## The file that the rc.local symlink points to is not executable by default; add the executable bit
chmod +x /etc/rc.d/rc.local
lsmod | grep ip_vs
# Point kubelet at a pause image hosted on a domestic registry
# Configure kubelet's cgroup driver
# Read the cgroup driver docker is using
DOCKER_CGROUPS=$(docker info | grep 'Cgroup' | cut -d' ' -f3)
echo $DOCKER_CGROUPS
cat > /etc/sysconfig/kubelet <<EOF
KUBELET_EXTRA_ARGS="--cgroup-driver=$DOCKER_CGROUPS --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.1"
EOF
# Enable kubelet at boot and start it now. kubelet will keep failing and restarting every few seconds until kubeadm tells it what to do; that is expected (a quick way to watch this is sketched below).
systemctl enable --now kubelet
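To watch that loop while you wait (purely optional), a quick check, assuming kubelet logs go to the systemd journal:
# kubelet exits repeatedly until kubeadm generates its config and bootstrap kubeconfig
systemctl status kubelet
journalctl -u kubelet -f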
# Enable tab completion for kubectl
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc
Run the following on all three master nodes. HAProxy listens on port 16443 and proxies to the Kubernetes apiservers on port 6443; adjust the hostnames and IPs at the end of the config to your environment, e.g. server k8s-company01-master01 172.16.4.201:6443.
# Pull the haproxy image (the alpine variant keeps it small)
docker pull reg01.sky-mobi.com/k8s/haproxy:1.9.1-alpine
mkdir /etc/haproxy
# Write /etc/haproxy/haproxy.cfg and start the haproxy container, as sketched below.
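A minimal sketch of the config and the container start, assuming a plain TCP proxy on 16443 in front of the three apiservers; the container name k8s-haproxy and the mount path /usr/local/etc/haproxy (the default config location of the official haproxy image) are assumptions, adjust them to your image.
cat > /etc/haproxy/haproxy.cfg << EOF
global
    log 127.0.0.1 local0
    maxconn 4096
defaults
    mode tcp
    log global
    timeout connect 5s
    timeout client 1h
    timeout server 1h
frontend kube-apiserver
    bind *:16443
    default_backend kube-masters
backend kube-masters
    balance roundrobin
    # adjust hostnames and IPs to your environment
    server k8s-company01-master01 172.16.4.201:6443 check
    server k8s-company01-master02 172.16.4.202:6443 check
    server k8s-company01-master03 172.16.4.203:6443 check
EOF
# host networking so haproxy can bind 16443 directly on the node
docker run -d --name k8s-haproxy --net=host --restart=always \
    -v /etc/haproxy:/usr/local/etc/haproxy:ro \
    reg01.sky-mobi.com/k8s/haproxy:1.9.1-alpine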
# Pull the keepalived image
docker pull reg01.sky-mobi.com/k8s/keepalived:2.0.10
# Start keepalived; adjust the interface name and IPs to your environment
# eth0 is the interface on the 172.16.4.0/24 network in this setup (if yours differs, change it; usage reference: https://github.com/osixia/docker-keepalived/tree/v2.0.10)
# Keep the password at 8 characters or fewer; with a password like skymobik8s only the first 8 characters end up in the VRRP packets: addrs: k8s-master-lb auth "skymobik"
# Set KEEPALIVED_PRIORITY to 200 on the master node and 150 on the backups
docker run --net=host --cap-add=NET_ADMIN \
-e KEEPALIVED_ROUTER_ID=55 \
-e KEEPALIVED_INTERFACE=eth0 \
-e KEEPALIVED_VIRTUAL_IPS="#PYTHON2BASH:['172.16.4.200']" \
-e KEEPALIVED_UNICAST_PEERS="#PYTHON2BASH:['172.16.4.201','172.16.4.202','172.16.4.203']" \
-e KEEPALIVED_PASSWORD=skyk8stx \
-e KEEPALIVED_PRIORITY=150 \
--name k8s-keepalived \
--restart always \
-d reg01.sky-mobi.com/k8s/keepalived:2.0.10
# Check the logs
# One node should become MASTER and the other two BACKUP
docker logs k8s-keepalived
# If the log shows "received an invalid passwd!", something else on the network is using the same ROUTER_ID; change KEEPALIVED_ROUTER_ID.
# Ping the VIP from any master to test
ping -c 4 172.16.4.200
# If the setup above failed, clean up before retrying:
docker stop k8s-keepalived
docker rm k8s-keepalived
https://kubernetes.io/docs/setup/independent/high-availability/
Run the following on the first master node (k8s-company01-master01):
# Note: change the VIP hostname in controlPlaneEndpoint: "k8s-company01-lb:16443" to match your environment
cat << EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.14.1
# use an image repository that is reachable from China
imageRepository: reg01.sky-mobi.com/k8s/k8s.gcr.io
controlPlaneEndpoint: "k8s-company01-lb:16443"
networking:
  podSubnet: "10.254.0.0/16"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
ipvs:
  minSyncPeriod: 1s
  syncPeriod: 10s
mode: ipvs
EOF
kubeadm-config parameter reference:
https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-config/
Pre-pull the images:
kubeadm config images pull --config kubeadm-config.yaml
Initialize master01:
kubeadm init --config=kubeadm-config.yaml --experimental-upload-certs
Watch the messages printed at the beginning and resolve every WARNING they report.
To start over, run kubeadm reset, clear the iptables and IPVS state as its output suggests, then restart the docker service; a minimal cleanup sketch follows.
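A minimal cleanup sketch (it assumes the node is dedicated to this cluster, since the flush below wipes all iptables rules):
kubeadm reset
# flush iptables rules and clear IPVS virtual servers, as kubeadm reset suggests
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
ipvsadm --clear
systemctl restart docker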
Once it reports success, record all the join parameters printed at the end; they are needed to join the remaining nodes (valid for two hours; one command is for joining master nodes, the other for worker nodes):
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join k8s-company01-lb:16443 --token fp0x6g.cwuzedvtwlu1zg1f \
--discovery-token-ca-cert-hash sha256:5d4095bc9e4e4b5300abe5a25afe1064f32c1ddcecc02a1f9b0aeee7710c3383 \
--experimental-control-plane --certificate-key b56be86f65e73d844bb60783c7bd5d877fe20929296a3e254854d3b623bb86f7
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --experimental-upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join k8s-company01-lb:16443 --token fp0x6g.cwuzedvtwlu1zg1f \
--discovery-token-ca-cert-hash sha256:5d4095bc9e4e4b5300abe5a25afe1064f32c1ddcecc02a1f9b0aeee7710c3383
Remember to run the following so that kubectl can talk to the cluster:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
# If you skip this, you will see the following error:
# [root@k8s-master01 ~]# kubectl -n kube-system get pod
# The connection to the server localhost:8080 was refused - did you specify the right host or port?
When checking cluster status it is fine for coredns to be Pending; the network plugin is not installed yet.
# Sample output for reference
[root@k8s-master01 ~]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-56c9dc7946-5c5z2 0/1 Pending 0 34m
coredns-56c9dc7946-thqwd 0/1 Pending 0 34m
etcd-k8s-master01 1/1 Running 2 34m
kube-apiserver-k8s-master01 1/1 Running 2 34m
kube-controller-manager-k8s-master01 1/1 Running 1 33m
kube-proxy-bl9c6 1/1 Running 2 34m
kube-scheduler-k8s-master01 1/1 Running 1 34m
# Use the join command recorded earlier to add master02 and master03 to the cluster (--experimental-control-plane joins them as control-plane nodes)
kubeadm join k8s-company01-lb:16443 --token fp0x6g.cwuzedvtwlu1zg1f \
--discovery-token-ca-cert-hash sha256:5d4095bc9e4e4b5300abe5a25afe1064f32c1ddcecc02a1f9b0aeee7710c3383 \
--experimental-control-plane --certificate-key b56be86f65e73d844bb60783c7bd5d877fe20929296a3e254854d3b623bb86f7
# If the join parameters were not written down, or have already expired, see the page below, or regenerate them as sketched after it:
http://wiki.sky-mobi.com:8090/pages/viewpage.action?pageId=9079715
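They can also be regenerated on master01; a short sketch (the upload-certs phase is the same one kubeadm's own output above refers to):
# print a fresh worker join command (creates a new token)
kubeadm token create --print-join-command
# re-upload the control-plane certificates and print a new certificate key (valid for two hours)
kubeadm init phase upload-certs --experimental-upload-certs
# for a master, append to the printed join command:
#   --experimental-control-plane --certificate-key <key printed above>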
# After a successful join, set up kubectl access on these nodes as well
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
Install the calico network plugin (on master01)
Reference:
https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/calico
Download the yaml file (version v3.6.1 here; it is the official manifest from https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/typha/calico.yaml with the pod network, replicas and image addresses modified)
# For use outside the data center (access-restricted; the company's own public address)
curl http://111.1.17.135/yum/scripts/k8s/calico_v3.6.1.yaml -O
# For use inside the data center
curl http://192.168.160.200/yum/scripts/k8s/calico_v3.6.1.yaml -O
## Edit the yaml file so the pod network matches podSubnet in kubeadm-config.yaml:
##
## export POD_CIDR="10.254.0.0/16" ; sed -i -e "s?192.168.0.0/16?$POD_CIDR?g" calico.yaml
## Set replicas to 3 for production use (the default is 1)
## The image addresses were also changed to point at reg01.sky-mobi.com; a quick check of the edits is sketched below
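A small verification sketch, assuming the manifest keeps the standard calico.yaml layout (a CALICO_IPV4POOL_CIDR env var and a calico-typha Deployment):
# the pod CIDR must match podSubnet in kubeadm-config.yaml
grep -A1 CALICO_IPV4POOL_CIDR calico_v3.6.1.yaml
# check the typha replica count and the mirrored image addresses
grep -E 'replicas:|image:' calico_v3.6.1.yaml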
# Allow pods to be scheduled onto the master nodes (running this on master01 is enough)
[root@k8s-company01-master01 ~]# kubectl taint nodes --all node-role.kubernetes.io/master-
node/k8s-company01-master01 untainted
node/k8s-company01-master02 untainted
node/k8s-company01-master03 untainted
# Install calico (to uninstall: kubectl delete -f calico_v3.6.1.yaml)
[root@k8s-company01-master01 ~]# kubectl apply -f calico_v3.6.1.yaml
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
service/calico-typha created
deployment.apps/calico-typha created
poddisruptionbudget.policy/calico-typha created
daemonset.extensions/calico-node created
serviceaccount/calico-node created
deployment.extensions/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
# At this point all the pods are running normally
[root@k8s-company01-master01 ~]# kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-749f7c8df8-knlx4 0/1 Running 0 20s
calico-kube-controllers-749f7c8df8-ndf55 0/1 Running 0 20s
calico-kube-controllers-749f7c8df8-pqxlx 0/1 Running 0 20s
calico-node-4txj7 0/1 Running 0 21s
calico-node-9t2l9 0/1 Running 0 21s
calico-node-rtxlj 0/1 Running 0 21s
calico-typha-646cdc958c-7j948 0/1 Pending 0 21s
coredns-56c9dc7946-944nt 0/1 Running 0 4m9s
coredns-56c9dc7946-nh2sk 0/1 Running 0 4m9s
etcd-k8s-company01-master01 1/1 Running 0 3m26s
etcd-k8s-company01-master02 1/1 Running 0 2m52s
etcd-k8s-company01-master03 1/1 Running 0 110s
kube-apiserver-k8s-company01-master01 1/1 Running 0 3m23s
kube-apiserver-k8s-company01-master02 1/1 Running 0 2m53s
kube-apiserver-k8s-company01-master03 1/1 Running 1 111s
kube-controller-manager-k8s-company01-master01 1/1 Running 1 3m28s
kube-controller-manager-k8s-company01-master02 1/1 Running 0 2m52s
kube-controller-manager-k8s-company01-master03 1/1 Running 0 56s
kube-proxy-8wm4v 1/1 Running 0 4m9s
kube-proxy-vvdrl 1/1 Running 0 2m53s
kube-proxy-wnctx 1/1 Running 0 2m2s
kube-scheduler-k8s-company01-master01 1/1 Running 1 3m18s
kube-scheduler-k8s-company01-master02 1/1 Running 0 2m52s
kube-scheduler-k8s-company01-master03 1/1 Running 0 55s
# All master nodes are in Ready state
[root@k8s-company01-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-company01-master01 Ready master 4m48s v1.14.1
k8s-company01-master02 Ready master 3m12s v1.14.1
k8s-company01-master03 Ready master 2m21s v1.14.1
# We ran into coredns restarting endlessly; it recovered after stopping firewalld, and stayed healthy even after firewalld was turned back on... A sketch of the workaround is below.
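If you hit the same issue, a small sketch of that workaround (k8s-app=kube-dns is the label kubeadm's coredns Deployment uses):
systemctl stop firewalld
# delete the crash-looping coredns pods; the Deployment recreates them
kubectl -n kube-system delete pod -l k8s-app=kube-dns
kubectl -n kube-system get pod -l k8s-app=kube-dns -w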
# The only difference from joining a master is that the --experimental-control-plane flag is dropped
kubeadm join k8s-company01-lb:16443 --token fp0x6g.cwuzedvtwlu1zg1f \
--discovery-token-ca-cert-hash sha256:5d4095bc9e4e4b5300abe5a25afe1064f32c1ddcecc02a1f9b0aeee7710c3383
# If the join parameters were not written down, or have already expired, see:
http://wiki.sky-mobi.com:8090/pages/viewpage.action?pageId=9079715
# A successful join prints:
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
### Run the kubectl get nodes command on any master node.
[root@k8s-company01-master01 ~]# kubectl get pod -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-749f7c8df8-knlx4 1/1 Running 1 5m2s 10.254.28.66 k8s-company01-master02
calico-kube-controllers-749f7c8df8-ndf55 1/1 Running 4 5m2s 10.254.31.67 k8s-company01-master03
calico-kube-controllers-749f7c8df8-pqxlx 1/1 Running 4 5m2s 10.254.31.66 k8s-company01-master03
calico-node-4txj7 1/1 Running 0 5m3s 172.16.4.203 k8s-company01-master03
calico-node-7fqwh 1/1 Running 0 68s 172.16.4.205 k8s-company01-worker002
calico-node-9t2l9 1/1 Running 0 5m3s 172.16.4.201 k8s-company01-master01
calico-node-rkfxj 1/1 Running 0 86s 172.16.4.204 k8s-company01-worker001
calico-node-rtxlj 1/1 Running 0 5m3s 172.16.4.202 k8s-company01-master02
calico-typha-646cdc958c-7j948 1/1 Running 0 5m3s 172.16.4.204 k8s-company01-worker001
coredns-56c9dc7946-944nt 0/1 CrashLoopBackOff 4 8m51s 10.254.28.65 k8s-company01-master02
coredns-56c9dc7946-nh2sk 0/1 CrashLoopBackOff 4 8m51s 10.254.31.65 k8s-company01-master03
etcd-k8s-company01-master01 1/1 Running 0 8m8s 172.16.4.201 k8s-company01-master01
etcd-k8s-company01-master02 1/1 Running 0 7m34s 172.16.4.202 k8s-company01-master02
etcd-k8s-company01-master03 1/1 Running 0 6m32s 172.16.4.203 k8s-company01-master03
kube-apiserver-k8s-company01-master01 1/1 Running 0 8m5s 172.16.4.201 k8s-company01-master01
kube-apiserver-k8s-company01-master02 1/1 Running 0 7m35s 172.16.4.202 k8s-company01-master02
kube-apiserver-k8s-company01-master03 1/1 Running 1 6m33s 172.16.4.203 k8s-company01-master03
kube-controller-manager-k8s-company01-master01 1/1 Running 1 8m10s 172.16.4.201 k8s-company01-master01
kube-controller-manager-k8s-company01-master02 1/1 Running 0 7m34s 172.16.4.202 k8s-company01-master02
kube-controller-manager-k8s-company01-master03 1/1 Running 0 5m38s 172.16.4.203 k8s-company01-master03
kube-proxy-8wm4v 1/1 Running 0 8m51s 172.16.4.201 k8s-company01-master01
kube-proxy-k8rng 1/1 Running 0 68s 172.16.4.205 k8s-company01-worker002
kube-proxy-rqnkv 1/1 Running 0 86s 172.16.4.204 k8s-company01-worker001
kube-proxy-vvdrl 1/1 Running 0 7m35s 172.16.4.202 k8s-company01-master02
kube-proxy-wnctx 1/1 Running 0 6m44s 172.16.4.203 k8s-company01-master03
kube-scheduler-k8s-company01-master01 1/1 Running 1 8m 172.16.4.201 k8s-company01-master01
kube-scheduler-k8s-company01-master02 1/1 Running 0 7m34s 172.16.4.202 k8s-company01-master02
kube-scheduler-k8s-company01-master03 1/1 Running 0 5m37s 172.16.4.203 k8s-company01-master03
[root@k8s-company01-master01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-company01-master01 Ready master 9m51s v1.14.1
k8s-company01-master02 Ready master 8m15s v1.14.1
k8s-company01-master03 Ready master 7m24s v1.14.1
k8s-company01-worker001 Ready <none> 2m6s v1.14.1
k8s-company01-worker002 Ready <none> 108s v1.14.1
[root@k8s-company01-master01 ~]# kubectl get csr
NAME AGE REQUESTOR CONDITION
csr-94f5v 8m27s system:bootstrap:fp0x6g Approved,Issued
csr-g9tbg 2m19s system:bootstrap:fp0x6g Approved,Issued
csr-pqr6l 7m49s system:bootstrap:fp0x6g Approved,Issued
csr-vwtqq 2m system:bootstrap:fp0x6g Approved,Issued
csr-w486d 10m system:node:k8s-company01-master01 Approved,Issued
[root@k8s-company01-master01 ~]# kubectl get componentstatuses
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true"}
# Without metrics-server installed:
[root@k8s-master03 ~]# kubectl top nodes
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
Here we install it with helm:
Install helm (run on master01):
wget http://192.168.160.200/yum/scripts/k8s/helm-v2.13.1-linux-amd64.tar.gz
or: wget http://111.1.17.135/yum/scripts/k8s/helm-v2.13.1-linux-amd64.tar.gz
tar xvzf helm-v2.13.1-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/helm
# Verify
helm help
Run on every node:
yum install -y socat
Use the Microsoft (Azure China) mirror (the Alibaba mirror has not been updated in a long time!):
# helm init --client-only --stable-repo-url https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts/
# helm repo add incubator https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts-incubator/
helm init --client-only --stable-repo-url http://mirror.azure.cn/kubernetes/charts/
helm repo add incubator http://mirror.azure.cn/kubernetes/charts-incubator/
helm repo update
# Install the Tiller service into Kubernetes. The official image cannot be pulled here, so -i points at our own copy; an alternative is registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.9.1 (Aliyun). The image tag must match the helm client version, which you can check with helm version.
helm init --service-account tiller --upgrade -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.13.1 --tiller-tls-cert /etc/kubernetes/ssl/tiller001.pem --tiller-tls-key /etc/kubernetes/ssl/tiller001-key.pem --tls-ca-cert /etc/kubernetes/ssl/ca.pem --tiller-namespace kube-system --stable-repo-url http://mirror.azure.cn/kubernetes/charts/ --history-max 200
# Helm's server side, Tiller, runs as a Deployment in the kube-system namespace and talks to the API server to create and delete applications in Kubernetes.
# Since Kubernetes 1.6 the API server enforces RBAC. The default Tiller deployment has no authorized ServiceAccount, so its API requests get rejected; we therefore grant it one explicitly.
# Create the Kubernetes service account and bind the cluster role
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
# Check that the authorization took effect
[root@k8s-company01-master01 ~]# kubectl -n kube-system get pods|grep tiller
tiller-deploy-7bf47568d4-42wf5 1/1 Running 0 17s
[root@k8s-company01-master01 ~]# helm version
Client: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
[root@k8s-company01-master01 ~]# helm repo list
NAME URL
stable http://mirror.azure.cn/kubernetes/charts/
local http://127.0.0.1:8879/charts
incubator http://mirror.azure.cn/kubernetes/charts-incubator/
## To switch repositories, remove the old one first
#helm repo remove stable
## Then add the new repository addresses
#helm repo add stable http://mirror.azure.cn/kubernetes/charts/
#helm repo add incubator http://mirror.azure.cn/kubernetes/charts-incubator/
#helm repo update
# Create metrics-server-custom.yaml with the chart values to override, then install the chart with it; a sketch follows.
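The exact values are not reproduced here; below is a hypothetical sketch for the stable/metrics-server chart, assuming a mirrored image (the repository path and tag are assumptions) plus the usual kubelet TLS/address flags:
cat > metrics-server-custom.yaml << EOF
image:
  # assumption: a mirror of gcr.io/google_containers/metrics-server-amd64
  repository: reg01.sky-mobi.com/k8s/gcr.io/google_containers/metrics-server-amd64
  tag: v0.3.1
args:
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP
EOF
helm install stable/metrics-server --name metrics-server --namespace kube-system -f metrics-server-custom.yaml
# once the pod is up, kubectl top nodes should start returning data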
# For easier management, create a dedicated namespace, monitoring; all Prometheus Operator components will be deployed into it.
kubectl create namespace monitoring
## Customize the prometheus-operator values
# helm fetch stable/prometheus-operator --version=5.0.3 --untar
# cat prometheus-operator/values.yaml | grep -v '#' | grep -v ^$ > prometheus-operator-custom.yaml
# Keep only the parts where we override the image, plus the https settings for scraping etcd, for example:
Reference: https://fengxsong.github.io/2018/05/30/Using-helm-to-manage-prometheus-operator/
cat >> prometheus-operator-custom.yaml << EOF
## prometheus-operator/values.yaml
alertmanager:
  service:
    nodePort: 30503
    type: NodePort
  alertmanagerSpec:
    image:
      repository: reg01.sky-mobi.com/k8s/quay.io/prometheus/alertmanager
      tag: v0.16.1
prometheusOperator:
  image:
    repository: reg01.sky-mobi.com/k8s/quay.io/coreos/prometheus-operator
    tag: v0.29.0
    pullPolicy: IfNotPresent
  configmapReloadImage:
    repository: reg01.sky-mobi.com/k8s/quay.io/coreos/configmap-reload
    tag: v0.0.1
  prometheusConfigReloaderImage:
    repository: reg01.sky-mobi.com/k8s/quay.io/coreos/prometheus-config-reloader
    tag: v0.29.0
  hyperkubeImage:
    repository: reg01.sky-mobi.com/k8s/k8s.gcr.io/hyperkube
    tag: v1.12.1
    pullPolicy: IfNotPresent
prometheus:
  service:
    nodePort: 30504
    type: NodePort
  prometheusSpec:
    image:
      repository: reg01.sky-mobi.com/k8s/quay.io/prometheus/prometheus
      tag: v2.7.1
    secrets: [etcd-client-cert]
kubeEtcd:
  serviceMonitor:
    scheme: https
    insecureSkipVerify: false
    serverName: ""
    caFile: /etc/prometheus/secrets/etcd-client-cert/ca.crt
    certFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.crt
    keyFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.key
## prometheus-operator/charts/grafana/values.yaml
grafana:
  service:
    nodePort: 30505
    type: NodePort
  image:
    repository: reg01.sky-mobi.com/k8s/grafana/grafana
    tag: 6.0.2
  sidecar:
    image: reg01.sky-mobi.com/k8s/kiwigrid/k8s-sidecar:0.0.13
## prometheus-operator/charts/kube-state-metrics/values.yaml
kube-state-metrics:
  image:
    repository: reg01.sky-mobi.com/k8s/k8s.gcr.io/kube-state-metrics
    tag: v1.5.0
## prometheus-operator/charts/prometheus-node-exporter/values.yaml
prometheus-node-exporter:
  image:
    repository: reg01.sky-mobi.com/k8s/quay.io/prometheus/node-exporter
    tag: v0.17.0
EOF
## Note: the grafana, kube-state-metrics and prometheus-node-exporter sections above go under the keys of the corresponding sub-charts (named after the charts directory):
#[root@k8s-master01 ~]# ll prometheus-operator/charts/
#total 0
#drwxr-xr-x 4 root root 114 Apr 1 00:48 grafana
#drwxr-xr-x 3 root root 96 Apr 1 00:18 kube-state-metrics
#drwxr-xr-x 3 root root 110 Apr 1 00:20 prometheus-node-exporter
# Create the secret holding the etcd client certificates:
kubectl -n monitoring create secret generic etcd-client-cert --from-file=/etc/kubernetes/pki/etcd/ca.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key
helm install stable/prometheus-operator --version=5.0.3 --name=monitoring --namespace=monitoring -f prometheus-operator-custom.yaml
## To delete everything and start over, remove the helm release by its name, monitoring
#helm del --purge monitoring
#kubectl delete crd prometheusrules.monitoring.coreos.com
#kubectl delete crd servicemonitors.monitoring.coreos.com
#kubectl delete crd alertmanagers.monitoring.coreos.com
To reinstall, do not delete the previous release first (deleting and reinstalling can fail); just use upgrade:
helm upgrade monitoring stable/prometheus-operator --version=5.0.3 --namespace=monitoring -f prometheus-operator-custom.yaml
[root@k8s-company01-master01 ~]# kubectl -n monitoring get pod
NAME READY STATUS RESTARTS AGE
alertmanager-monitoring-prometheus-oper-alertmanager-0 2/2 Running 0 29m
monitoring-grafana-7dd5cf9dd7-wx8mz 2/2 Running 0 29m
monitoring-kube-state-metrics-7d98487cfc-t6qqw 1/1 Running 0 29m
monitoring-prometheus-node-exporter-fnvp9 1/1 Running 0 29m
monitoring-prometheus-node-exporter-kczcq 1/1 Running 0 29m
monitoring-prometheus-node-exporter-m8kf6 1/1 Running 0 29m
monitoring-prometheus-node-exporter-mwc4g 1/1 Running 0 29m
monitoring-prometheus-node-exporter-wxmt8 1/1 Running 0 29m
monitoring-prometheus-oper-operator-7f96b488f6-2j7h5 1/1 Running 0 29m
prometheus-monitoring-prometheus-oper-prometheus-0 3/3 Running 1 28m
[root@k8s-company01-master01 ~]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 31m
monitoring-grafana NodePort 10.109.159.105 <none> 80:30579/TCP 32m
monitoring-kube-state-metrics ClusterIP 10.100.31.235 <none> 8080/TCP 32m
monitoring-prometheus-node-exporter ClusterIP 10.109.119.13 <none> 9100/TCP 32m
monitoring-prometheus-oper-alertmanager NodePort 10.105.171.135 <none> 9093:31309/TCP 32m
monitoring-prometheus-oper-operator ClusterIP 10.98.135.170 <none> 8080/TCP 32m
monitoring-prometheus-oper-prometheus NodePort 10.96.15.36 <none> 9090:32489/TCP 32m
prometheus-operated ClusterIP None <none> 9090/TCP 31m
# Check for unexpected alerts via the URLs below; the Watchdog alert at the top of the alerts page is expected, it exists to verify that the alerting pipeline works. A command-line probe is sketched after them.
http://172.16.4.200:32489/alerts
http://172.16.4.200:32489/targets
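A quick probe of the same services, assuming the NodePorts from the service listing above are reachable through the VIP (Prometheus and Alertmanager both expose a /-/healthy endpoint):
curl -s http://172.16.4.200:32489/-/healthy
curl -s http://172.16.4.200:31309/-/healthy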
# The following would install kubernetes-dashboard; it is of limited use and we do not install it in production for now
#helm install --name=kubernetes-dashboard stable/kubernetes-dashboard --version=1.4.0 --namespace=kube-system --set image.repository=reg01.sky-mobi.com/k8s/k8s.gcr.io/kubernetes-dashboard-amd64,image.tag=v1.10.1,rbac.clusterAdminRole=true
# Heapster was removed in Kubernetes 1.13 (https://github.com/kubernetes/heapster/blob/master/docs/deprecation.md); metrics-server and Prometheus are recommended instead.