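Contents
istio-tools
Download the installation package
Modify the official setup_istio.sh script
Creating certificates
Generate the certificates
Create the certificate secret
Integrate the certificates into Prometheus
Prometheus is not collecting Istio metrics
Check the namespace labels
istio-injection turned out to be disabled, so enable it
Integrate Alertmanager with prometheus-operator and take over the alerting rules
About Envoy proxy
Some alert expressions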
https://github.com/istio/tools/tree/master/perf/istio-install
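The official documentation first creates a cluster on GCP, then installs Istio and prometheus-operator.
Here the same steps are run directly on an existing cluster.
Downloading through the official setup_istio.sh is very slow, because it fetches the release from storage.googleapis.com.
So download the Istio package first: istio-1.6.1-linux-amd64.tar.gz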
# Download the latest Istio release (the default)
curl -L https://istio.io/downloadIstio | sh -
# Download Istio 1.6.1
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.6.1 sh -
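The modified script later in this post unpacks istio-1.6.1-linux-amd64.tar.gz from the current directory, so it helps to confirm the archive is there before running it. The following is a minimal sketch of such a pre-flight step. The function names are hypothetical, and the GitHub releases URL pattern is an assumption that should be checked against the release page for your version.

```shell
# Hypothetical helpers; the names are illustrative and not part of istio/tools.

# Print the archive name for an Istio version (OS and arch default to linux/amd64).
istio_tarball_name() {
  local version="$1" os="${2:-linux}" arch="${3:-amd64}"
  printf 'istio-%s-%s-%s.tar.gz\n' "$version" "$os" "$arch"
}

# Print the assumed GitHub release URL for that archive.
istio_release_url() {
  local version="$1"
  printf 'https://github.com/istio/istio/releases/download/%s/%s\n' \
    "$version" "$(istio_tarball_name "$@")"
}

# Fetch the archive only when it is not already in the current directory.
ensure_istio_tarball() {
  local name
  name="$(istio_tarball_name "$@")"
  if [ -f "$name" ]; then
    echo "found ${name}"
  else
    curl -fL -o "$name" "$(istio_release_url "$@")"
  fi
}
```

Calling `ensure_istio_tarball 1.6.1` before the install script would skip the download when the archive is already present.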
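GO111MODULE=on was already in the script by default, and I left it in. You could try removing it to see whether it makes any difference.
DNS_DOMAIN="istio-test.local" is my own addition. Without this value, the later installation of istio-gateway.yaml fails.
The script installs the Istio components as well as prometheus-operator.
I installed prometheus-operator in the istio-system namespace.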
#!/usr/bin/bash
# Copyright Istio Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
set -ex
WD=$PWD
DIRNAME="${WD}"
mkdir -p "${DIRNAME}"
export GO111MODULE=on
export DNS_DOMAIN="istio-test.local"
# Extract Istio into the current directory
tar xvf istio-1.6.1-linux-amd64.tar.gz
function install_istioctl() {
  istioctl manifest apply --skip-confirmation -d ./istio-1.6.1/manifests
}
function install_extras() {
  local domain=${DNS_DOMAIN:-"DNS_DOMAIN like v104.qualistio.org"}
  kubectl create namespace istio-system || true
  # Deploy the gateways and prometheus operator.
  # We install the prometheus operator first, then deploy the CR, to wait for the CRDs to get created
  helm template --set domain="${domain}" --set prometheus.deploy=false "${WD}/base" | kubectl apply -f -
  # Check CRD
  CMDs_ARR=('kubectl get crds/prometheuses.monitoring.coreos.com' 'kubectl get crds/alertmanagers.monitoring.coreos.com'
            'kubectl get crds/podmonitors.monitoring.coreos.com' 'kubectl get crds/prometheusrules.monitoring.coreos.com'
            'kubectl get crds/servicemonitors.monitoring.coreos.com')
  for CMD in "${CMDs_ARR[@]}"
  do
    MAXRETRIES=0
    until $CMD || [ $MAXRETRIES -eq 60 ]
    do
      MAXRETRIES=$((MAXRETRIES + 1))
      sleep 5
    done
    if [[ $MAXRETRIES -eq 60 ]]; then
      echo "crds were not created successfully"
      exit 1
    fi
  done
  # Redeploy, this time with the Prometheus resource created
  helm template --set domain="${domain}" "${WD}/base" | kubectl apply -f -
  # Also deploy relevant ServiceMonitors
  "istioctl" manifest generate --set profile=empty --set addonComponents.prometheusOperator.enabled=true -d ./istio-1.6.1/manifests | kubectl apply -f -
}
#download_release
install_istioctl "${DIRNAME}/istio-1.6.1"
if [[ -z "${SKIP_EXTRAS:-}" ]]; then
  install_extras
fi
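The CRD wait loop in the script retries each `kubectl get` up to 60 times with a 5-second pause. One detail worth noting: because it tests the counter after the loop, a command that first succeeds on the final attempt would still be reported as a failure. The sketch below shows the same retry idea as a reusable helper that tracks the outcome explicitly. The `retry` function is hypothetical and is not part of the official script.

```shell
# Hypothetical helper: run a command until it succeeds, at most max_tries times.
# Usage: retry <max_tries> <delay_seconds> <command> [args...]
retry() {
  local max_tries="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_tries" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (requires cluster access, so it is left commented out):
# retry 60 5 kubectl get crds/prometheuses.monitoring.coreos.com
```

The helper returns 0 as soon as the command succeeds and 1 only when every attempt has failed, so the caller can decide whether to exit.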
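Work from the directory where istio-1.6.1 was extracted earlier (the command below refers to ./istio-1.6.1).
NAME must be istio.prometheus, because Prometheus requires a secret with this name.
NAMESPACE is istio-system, the namespace used here.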
make -f ./istio-1.6.1/tools/certs/Makefile NAME="istio.prometheus" NAMESPACE="istio-system" "prometheus"-certs-wl
Reference: https://istio.io/latest/docs/tasks/security/cert-management/plugin-ca-cert/
kubectl create secret generic istio.prometheus -n istio-system \
--from-file=prometheus/ca-cert.pem \
--from-file=prometheus/ca-key.pem \
--from-file=prometheus/root-cert.pem \
--from-file=prometheus/cert-chain.pem \
--from-file=prometheus/key.pem \
--from-file=prometheus/workload-cert-chain.pem
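Since the secret is built from six files, a quick check that all of them exist can catch an incomplete `make` run before the secret is created. This is a minimal sketch. The function name is hypothetical, and the file list is simply the one used in the `kubectl create secret` command above.

```shell
# Hypothetical pre-flight check: print every expected certificate file that is
# missing from the given directory, and return non-zero if any is missing.
missing_cert_files() {
  local dir="$1" missing=0 f
  for f in ca-cert.pem ca-key.pem root-cert.pem cert-chain.pem key.pem workload-cert-chain.pem; do
    if [ ! -f "${dir}/${f}" ]; then
      echo "$f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `missing_cert_files prometheus || exit 1` placed before the `kubectl create secret` call would stop early and list the files that were not generated.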
Prometheus logged this error:
level=error ts=2020-07-08T21:20:20.023Z caller=manager.go:188 component="scrape manager" msg="error creating new scrape pool" err="error creating HTTP client: unable to use specified client cert (/etc/prometheus/secrets/istio.prometheus/cert-chain.pem) & key (/etc/prometheus/secrets/istio.prometheus/key.pem): tls: private key does not match public key" scrape_pool=istio-system/kubernetes-services-secure-monitor/0
Cause:
workload-cert-chain.pem was missing.
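The "private key does not match public key" message can also be checked locally before anything is loaded into the cluster. The sketch below compares the public key embedded in a certificate with the one derived from a private key, using standard `openssl` subcommands. The function name is hypothetical, and it is assumed that `openssl` is installed and that the first certificate in the file is the one Prometheus presents.

```shell
# Hypothetical check: succeed only if the first certificate in cert_file
# carries the same public key as the private key in key_file.
# Returns 2 if either file cannot be parsed by openssl.
cert_matches_key() {
  local cert_file="$1" key_file="$2" cert_pub key_pub
  cert_pub="$(openssl x509 -in "$cert_file" -noout -pubkey)" || return 2
  key_pub="$(openssl pkey -in "$key_file" -pubout)" || return 2
  [ "$cert_pub" = "$key_pub" ]
}
```

It can be pointed at whichever certificate and key pair Prometheus is configured to load, for example `cert_matches_key prometheus/cert-chain.pem prometheus/key.pem` for the two files named in the error.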
After the integration was complete, no Istio metrics were being collected.
kubectl get ns --show-labels
kubectl label ns istio-system istio-injection=enabled --overwrite
Once injection is enabled, the Istio metrics show up in Prometheus.
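On a cluster with many namespaces, scanning the label column by eye is error-prone. The following sketch filters the `kubectl get ns --show-labels` output and prints only the namespaces that do not carry istio-injection=enabled. The function name is hypothetical, and it assumes the default table layout in which the labels are the last column.

```shell
# Hypothetical filter: read `kubectl get ns --show-labels` output on stdin and
# print the namespaces whose labels do not include istio-injection=enabled.
namespaces_without_injection() {
  awk 'NR > 1 {
    n = split($NF, labels, ",")
    enabled = 0
    for (i = 1; i <= n; i++) if (labels[i] == "istio-injection=enabled") enabled = 1
    if (!enabled) print $1
  }'
}

# Example (requires cluster access, so it is left commented out):
# kubectl get ns --show-labels | namespaces_without_injection
```

A namespace labeled istio-injection=disabled, or one with no labels at all, is listed by this filter.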
1. Locate https://github.com/istio/tools/blob/master/perf/istio-install/base/templates/prometheus-install.yaml
2. Edit prometheus-install.yaml to integrate Alertmanager:
```
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: istio-system
      port: web
...
```
3. Add a ruleSelector:
```
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
```
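With the ruleSelector in place, Prometheus only loads PrometheusRule objects that carry both labels. Here is a minimal sketch of a matching rule, printed by a small shell function so it can be piped to `kubectl apply`. The function name, rule name, alert name, and threshold are placeholders and not values from the original setup. It also assumes rules are picked up from the same namespace as the Prometheus resource.

```shell
# Hypothetical generator: print a PrometheusRule carrying the labels that the
# ruleSelector matches on (prometheus: k8s, role: alert-rules).
render_prometheus_rule() {
  local namespace="${1:-istio-system}"
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: istio-example-rules
  namespace: ${namespace}
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  groups:
  - name: istio.example
    rules:
    - alert: IstioHighRequestRate
      expr: sum(rate(istio_requests_total[5m])) by (destination_app) > 100
      for: 5m
      labels:
        severity: warning
EOF
}

# Example (requires cluster access, so it is left commented out):
# render_prometheus_rule istio-system | kubectl apply -f -
```

If a rule does not show up in Prometheus, comparing its metadata.labels against the ruleSelector is a reasonable first thing to check.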
https://blog.getambassador.io/understanding-envoy-proxy-and-ambassador-http-access-logs-fee7802a2ec5
https://banzaicloud.com/blog/istio-telemetry/
Replace the following to match your environment: namespace, service, response_code, response_flags, reporter, span
rule:
// Request rate
request: sum(rate(istio_requests_total{namespace = "{{.namespace}}", destination_app = "{{.svcName}}"}[{{.span}}m])) by (namespace, destination_app, prometheus_replica) {{.operation}} {{.threshold}}
// Average response latency
latency-avg: (avg(rate(istio_request_duration_milliseconds_sum{namespace = "{{.namespace}}", destination_app = "{{.svcName}}"}[{{.span}}m])) by (namespace, destination_app, prometheus_replica)) / (avg(rate(istio_request_duration_milliseconds_count{namespace = "{{.namespace}}", destination_app = "{{.svcName}}"}[{{.span}}m])) by (namespace, destination_app, prometheus_replica)) {{.operation}} {{.threshold}}
// 50th percentile response latency
latency-50: histogram_quantile(0.5, sum(rate(istio_request_duration_milliseconds_bucket{namespace = "{{.namespace}}", destination_app = "{{.svcName}}"}[{{.span}}m])) by (namespace, le, prometheus_replica)) {{.operation}} {{.threshold}}
// 90th percentile response latency
latency-90: histogram_quantile(0.9, sum(rate(istio_request_duration_milliseconds_bucket{namespace = "{{.namespace}}", destination_app = "{{.svcName}}"}[{{.span}}m])) by (namespace, le, prometheus_replica)) {{.operation}} {{.threshold}}
// 99th percentile response latency
latency-99: histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{namespace = "{{.namespace}}", destination_app = "{{.svcName}}"}[{{.span}}m])) by (namespace, le, prometheus_replica)) {{.operation}} {{.threshold}}
// Ratio of error responses (4xx, 5xx)
response-code: sum(rate(istio_request_duration_milliseconds_count{namespace = "{{.namespace}}", destination_app = "{{.svcName}}", response_code=~"[45].."}[{{.span}}m])) by (namespace, destination_app, prometheus_replica) / sum(rate(istio_request_duration_milliseconds_count{namespace = "{{.namespace}}", destination_app = "{{.svcName}}"}[{{.span}}m])) by (namespace, destination_app, prometheus_replica) {{.operation}} {{.threshold}}
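The expressions above are templates: each `{{.name}}` placeholder has to be filled in before the result is valid PromQL. The sketch below does the substitution with `sed`, to show what a rendered expression looks like. The `render_rule` function is hypothetical, the sample values are placeholders, and values containing `|`, `&`, or backslashes are not handled.

```shell
# Hypothetical renderer: read a rule template on stdin and replace each
# {{.key}} placeholder with the value given as a key=value argument.
render_rule() {
  local script="" pair key value
  for pair in "$@"; do
    key="${pair%%=*}"
    value="${pair#*=}"
    # Build one sed substitution per placeholder, e.g. s|{{\.span}}|5|g;
    script="${script}s|{{\\.${key}}}|${value}|g;"
  done
  sed -e "$script"
}
```

For example, piping the template `sum(rate(istio_requests_total{namespace = "{{.namespace}}"}[{{.span}}m])) {{.operation}} {{.threshold}}` through `render_rule namespace=default span=5 operation='>' threshold=100` yields `sum(rate(istio_requests_total{namespace = "default"}[5m])) > 100`.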