1. Route all requests to the v2 version of the reviews service, so that refreshing the page shows only black star ratings
kubectl apply -f - <
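A sketch of the manifest this step most likely applies, following the standard Bookinfo virtual-service-reviews-v2 example (the exact body is an assumption; it also assumes the DestinationRule defining the reviews subsets already exists):
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v2
EOF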
2. Inject a 2s delay into the ratings service, so that refreshing /productpage takes about two seconds before the content appears
kubectl apply -f - <
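A sketch of the delay fault this step likely injects, modeled on the Istio request-timeouts task (only the 2s delay is stated above; the 100% percentage is an assumption):
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        fixedDelay: 2s
        percentage:
          value: 100
    route:
    - destination:
        host: ratings
        subset: v1
EOF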
3. Add a 0.5s timeout to the reviews service, so that refreshing the page now shows a service-unavailable message
kubectl apply -f - <
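A sketch of the timeout configuration this step likely applies, again following the Istio request-timeouts task (routing to the v2 subset set up in step 1 is assumed):
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v2
    timeout: 0.5s
EOF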
1. Remove the timeout configuration from the reviews service
kubectl apply -f - <
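Removing the timeout presumably means re-applying the reviews VirtualService without the timeout field, e.g.:
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v2
EOF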
2. Add a 5s delay to the ratings service and, at the same time, configure retries with a 1s per-try timeout
kubectl apply -f - <
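Only the 5s delay and the 1s per-try timeout are stated above; the delay percentage and the number of retry attempts below are assumptions. One plausible shape of the manifest, loosely following the fault-injection and retry examples in the Istio reference documentation:
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        fixedDelay: 5s
        percentage:
          value: 100
    route:
    - destination:
        host: ratings
        subset: v1
    retries:
      attempts: 2
      perTryTimeout: 1s
EOF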
3. Tail the log of the ratings sidecar to inspect the requests: two requests (carrying the same request ID) show up for a single call, which proves the retry configuration is taking effect
kubectl logs -f ratings-v1-5d4f4b45bf-5sjw4 -c istio-proxy
[2020-09-07T13:20:14.753Z] "GET /ratings/0 HTTP/1.1" 200 - "-" "-" 0 48 4 2 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15" "43419be2-8404-9410-b30c-451860d84689" "ratings:9080" "127.0.0.1:9080" inbound|9080|http|ratings.default.svc.cluster.local 127.0.0.1:47484 10.244.1.42:9080 10.244.1.39:40632 outbound_.9080_.v1_.ratings.default.svc.cluster.local default
[2020-09-07T13:20:17.781Z] "GET /ratings/0 HTTP/1.1" 200 - "-" "-" 0 48 2 2 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15" "43419be2-8404-9410-b30c-451860d84689" "ratings:9080" "127.0.0.1:9080" inbound|9080|http|ratings.default.svc.cluster.local 127.0.0.1:47484 10.244.1.42:9080 10.244.1.39:40634 outbound_.9080_.v1_.ratings.default.svc.cluster.local default
Istio can configure circuit breaking based on connections, requests, and outlier detection. It is one of the key mechanisms for keeping microservices resilient: it stops a failure in one service from cascading into an avalanche that brings down the entire system.
To demonstrate circuit breaking, we need to:
1. Deploy the httpbin service
root@kube1:~/istio/istio-1.5.1# kubectl apply -f samples/httpbin/httpbin.yaml
serviceaccount/httpbin created
service/httpbin created
deployment.apps/httpbin created
2. Add a circuit-breaking configuration (a DestinationRule) for this service, as sketched below
kubectl apply -f - <
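The inline manifest here is the circuit-breaking DestinationRule; its content can be reconstructed from the kubectl describe output below and matches the standard Istio circuit-breaking task:
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutiveErrors: 1
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100
EOF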
3. Check the DestinationRule that was created
root@kube1:~/istio/istio-1.5.1# kubectl describe dr httpbin
Name:         httpbin
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"networking.istio.io/v1alpha3","kind":"DestinationRule","metadata":{"annotations":{},"name":"httpbin","namespace":"default"}...
API Version:  networking.istio.io/v1beta1
Kind:         DestinationRule
Metadata:
  Creation Timestamp:  2020-09-07T13:45:57Z
  Generation:          1
  Resource Version:    4347907
  Self Link:           /apis/networking.istio.io/v1beta1/namespaces/default/destinationrules/httpbin
  UID:                 1c270ce5-be8e-437a-81ba-77797db148ea
Spec:
  Host:  httpbin
  Traffic Policy:
    Connection Pool:
      Http:
        http1MaxPendingRequests:      1
        Max Requests Per Connection:  1
      Tcp:
        Max Connections:  1
    Outlier Detection:
      Base Ejection Time:    3m
      Consecutive Errors:    1
      Interval:              1s
      Max Ejection Percent:  100
Events:  <none>
4. Deploy fortio, a load-testing client
root@kube1:~/istio/istio-1.5.1# kubectl apply -f samples/httpbin/sample-client/fortio-deploy.yaml
service/fortio created
deployment.apps/fortio-deploy created
5. Exec into the fortio pod and use the fortio client to send a single request to httpbin; the response comes back as expected
root@kube1:~/istio/istio-1.5.1# export FORTIO_POD=$(kubectl get pods -lapp=fortio -o 'jsonpath={.items[0].metadata.name}')
root@kube1:~/istio/istio-1.5.1# kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio curl -quiet http://httpbin:8000/get
HTTP/1.1 200 OK
server: envoy
date: Mon, 07 Sep 2020 14:02:24 GMT
content-type: application/json
content-length: 586
access-control-allow-origin: *
access-control-allow-credentials: true
x-envoy-upstream-service-time: 17
{
"args": {},
"headers": {
"Content-Length": "0",
"Host": "httpbin:8000",
"User-Agent": "fortio.org/fortio-1.6.8",
"X-B3-Parentspanid": "d964f49dc1381ad1",
"X-B3-Sampled": "1",
"X-B3-Spanid": "f251aa8ed27b094a",
"X-B3-Traceid": "684683095322c746d964f49dc1381ad1",
"X-Forwarded-Client-Cert": "By=spiffe://cluster.local/ns/default/sa/httpbin;Hash=01cc58646a677f556d03ba08ae4bb4bf77784d1e6874e46d4965b4ef9a52a5da;Subject=\"\";URI=spiffe://cluster.local/ns/default/sa/default"
},
"origin": "127.0.0.1",
"url": "http://httpbin:8000/get"
}
6. Because the circuit-breaker configuration above sets maxConnections: 1 and http1MaxPendingRequests: 1, the breaker should trip as soon as more than one connection and pending request are in flight at the same time. The command below therefore uses 2 concurrent connections (-c 2) and sends 20 requests in total (-n 20)
kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 2 -qps 0 -n 20 -loglevel Warning http://httpbin:8000/get
14:05:39 I logger.go:115> Log level is now 3 Warning (was 2 Info)
Fortio 1.6.8 running at 0 queries per second, 8->8 procs, for 20 calls: http://httpbin:8000/get
Starting at max qps with 2 thread(s) [gomax 8] for exactly 20 calls (10 per thread + 0)
14:05:39 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:05:39 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:05:39 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:05:39 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:05:39 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:05:39 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:05:39 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:05:39 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
Ended after 811.898072ms : 20 calls. qps=24.634
Aggregated Function Time : count 20 avg 0.057826079 +/- 0.06969 min 0.000387521 max 0.221833788 sum 1.15652158
# range, mid point, percentile, count
>= 0.000387521 <= 0.001 , 0.000693761 , 25.00, 5
> 0.002 <= 0.003 , 0.0025 , 30.00, 1
> 0.003 <= 0.004 , 0.0035 , 35.00, 1
> 0.008 <= 0.009 , 0.0085 , 40.00, 1
> 0.016 <= 0.018 , 0.017 , 45.00, 1
> 0.025 <= 0.03 , 0.0275 , 55.00, 2
> 0.03 <= 0.035 , 0.0325 , 60.00, 1
> 0.045 <= 0.05 , 0.0475 , 70.00, 2
> 0.09 <= 0.1 , 0.095 , 75.00, 1
> 0.12 <= 0.14 , 0.13 , 80.00, 1
> 0.14 <= 0.16 , 0.15 , 85.00, 1
> 0.16 <= 0.18 , 0.17 , 95.00, 2
> 0.2 <= 0.221834 , 0.210917 , 100.00, 1
# target 50% 0.0275
# target 75% 0.1
# target 90% 0.17
# target 99% 0.217467
# target 99.9% 0.221397
Sockets used: 9 (for perfect keepalive, would be 2)
Jitter: false
Code 200 : 12 (60.0 %)
Code 503 : 8 (40.0 %)
Response Header Sizes : count 20 avg 138.85 +/- 113.4 min 0 max 232 sum 2777
Response Body/Total Sizes : count 20 avg 586.85 +/- 282.4 min 241 max 818 sum 11737
All done 20 calls (plus 0 warmup) 57.826 ms avg, 24.6 qps
The statistics above show that 60% of the requests got through, while 40% were rejected with a 503 by the circuit breaker:
Code 200 : 12 (60.0 %)
Code 503 : 8 (40.0 %)
7. Raise the number of concurrent connections to 3 and test again
kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 3 -qps 0 -n 30 -loglevel Warning http://httpbin:8000/get
14:07:32 I logger.go:115> Log level is now 3 Warning (was 2 Info)
Fortio 1.6.8 running at 0 queries per second, 8->8 procs, for 30 calls: http://httpbin:8000/get
Starting at max qps with 3 thread(s) [gomax 8] for exactly 30 calls (10 per thread + 0)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
14:07:32 W http_client.go:698> Parsed non ok code 503 (HTTP/1.1 503)
Ended after 28.717189ms : 30 calls. qps=1044.7
Aggregated Function Time : count 30 avg 0.0024530259 +/- 0.005584 min 0.00031694 max 0.022741939 sum 0.073590778
# range, mid point, percentile, count
>= 0.00031694 <= 0.001 , 0.00065847 , 86.67, 26
> 0.001 <= 0.002 , 0.0015 , 90.00, 1
> 0.012 <= 0.014 , 0.013 , 93.33, 1
> 0.02 <= 0.0227419 , 0.021371 , 100.00, 2
# target 50% 0.000699454
# target 75% 0.000904372
# target 90% 0.002
# target 99% 0.0223306
# target 99.9% 0.0227008
Sockets used: 28 (for perfect keepalive, would be 3)
Jitter: false
Code 200 : 2 (6.7 %)
Code 503 : 28 (93.3 %)
Response Header Sizes : count 30 avg 15.4 +/- 57.62 min 0 max 231 sum 462
Response Body/Total Sizes : count 30 avg 210.13333 +/- 165.5 min 153 max 817 sum 6304
All done 30 calls (plus 0 warmup) 2.453 ms avg, 1044.7 qps
The success rate drops further and the failure rate climbs:
Code 200 : 2 (6.7 %)
Code 503 : 28 (93.3 %)
9. The command below queries the istio-proxy statistics for aggregated circuit-breaker data; httpbin.default.svc.cluster.local.upstream_rq_pending_overflow shows how many calls have been flagged for circuit breaking so far, 11 in this case
kubectl exec "$FORTIO_POD" -c istio-proxy -- pilot-agent request GET stats | grep httpbin | grep pending
cluster.outbound|8000||httpbin.default.svc.cluster.local.circuit_breakers.default.rq_pending_open: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.circuit_breakers.high.rq_pending_open: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_active: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_failure_eject: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_overflow: 11
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_total: 16
1. First restore the default routing rules, then add a rule that routes the logged-in user jason to v2 of the reviews service and everyone else to v1
kubectl apply -f samples/bookinfo/networking/virtual-service-all-v1.yaml
kubectl apply -f samples/bookinfo/networking/virtual-service-reviews-test-v2.yaml
2. Inject an abort fault into the ratings service that returns an HTTP 500 error for 100% of the requests coming from the user jason (this is what virtual-service-ratings-test-abort.yaml does)
kubectl apply -f samples/bookinfo/networking/virtual-service-ratings-test-abort.yaml
3. Log in as the user jason and refresh the page; the message "Ratings service is currently unavailable" is displayed
To summarize, two kinds of faults can be injected, delays (delay) and aborts (abort). Their configuration looks like this:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
...
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        fixedDelay: 7s
        percentage:
          value: 50
  - fault:
      abort:
        httpStatus: 500
        percentage:
          value: 50
    match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1
1. First deploy two versions of the httpbin service, v1 and v2
cat <
cat <
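A sketch of the two Deployments these commands likely create, taken from the httpbin Deployments in the Istio traffic-mirroring task (the image, gunicorn command, and labels come from that task and are assumptions here; the gunicorn startup logs shown later are consistent with them):
cat <<EOF | kubectl create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin-v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
      version: v1
  template:
    metadata:
      labels:
        app: httpbin
        version: v1
    spec:
      containers:
      - image: docker.io/kennethreitz/httpbin
        imagePullPolicy: IfNotPresent
        name: httpbin
        command: ["gunicorn", "--access-logfile", "-", "-b", "0.0.0.0:80", "httpbin:app"]
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin-v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
      version: v2
  template:
    metadata:
      labels:
        app: httpbin
        version: v2
    spec:
      containers:
      - image: docker.io/kennethreitz/httpbin
        imagePullPolicy: IfNotPresent
        name: httpbin
        command: ["gunicorn", "--access-logfile", "-", "-b", "0.0.0.0:80", "httpbin:app"]
        ports:
        - containerPort: 80
EOF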
2. Create a Service so both versions are reachable
kubectl create -f - <
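A sketch of the Service this step likely creates (port 8000 forwarding to the container's port 80, as in the mirroring task and consistent with the httpbin:8000 requests below):
kubectl create -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  labels:
    app: httpbin
spec:
  ports:
  - name: http
    port: 8000
    targetPort: 80
  selector:
    app: httpbin
EOF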
3. Define a route for httpbin that sends all traffic to v1
kubectl apply -f - <
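A sketch of the routing this step likely applies, following the mirroring task: a VirtualService sending 100% of traffic to v1, plus a DestinationRule defining the v1/v2 subsets (note this DestinationRule replaces the circuit-breaking one created for httpbin earlier):
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin
  http:
  - route:
    - destination:
        host: httpbin
        subset: v1
      weight: 100
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
EOF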
4. Send a test request from the sleep pod; the headers are returned, which shows the request succeeded
root@kube1:~/istio/istio-1.5.1# export SLEEP_POD=$(kubectl get pod -l app=sleep -o jsonpath={.items..metadata.name})
root@kube1:~/istio/istio-1.5.1# kubectl exec "${SLEEP_POD}" -c sleep -- curl -s http://httpbin:8000/headers
{
"headers": {
"Accept": "*/*",
"Content-Length": "0",
"Host": "httpbin:8000",
"User-Agent": "curl/7.69.1",
"X-B3-Parentspanid": "917d1f30c9d691cd",
"X-B3-Sampled": "1",
"X-B3-Spanid": "1a7987c2578d6687",
"X-B3-Traceid": "107cc7e533b09efb917d1f30c9d691cd",
"X-Forwarded-Client-Cert": "By=spiffe://cluster.local/ns/default/sa/default;Hash=4c1074a241f191c1d6732cccb583a46dcfbf882813693a6211ec5b417c59d865;Subject=\"\";URI=spiffe://cluster.local/ns/default/sa/sleep"
}
}
5. To observe the mirrored traffic that v2 will receive later, open two terminals and tail the request logs of v1 and v2 respectively
root@kube1:~/istio/istio-1.5.1# kubectl logs -f httpbin-v1-d879b9568-qtffc -c httpbin
[2020-09-08 01:52:10 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2020-09-08 01:52:10 +0000] [1] [INFO] Listening at: http://0.0.0.0:80 (1)
[2020-09-08 01:52:10 +0000] [1] [INFO] Using worker: sync
[2020-09-08 01:52:10 +0000] [8] [INFO] Booting worker with pid: 8
127.0.0.1 - - [08/Sep/2020:02:05:08 +0000] "GET /headers HTTP/1.1" 200 516 "-" "curl/7.69.1"
root@kube1:~/istio/istio-1.5.1# kubectl logs -f httpbin-v2-69bcdd6f7c-c78ld -c httpbin
[2020-09-08 01:52:16 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2020-09-08 01:52:16 +0000] [1] [INFO] Listening at: http://0.0.0.0:80 (1)
[2020-09-08 01:52:16 +0000] [1] [INFO] Using worker: sync
[2020-09-08 01:52:16 +0000] [9] [INFO] Booting worker with pid: 9
6. Add a traffic-mirroring configuration that mirrors requests to httpbin-v2
kubectl apply -f - <
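A sketch of the mirroring rule this step likely applies, based on the Istio 1.5 traffic-mirroring task: traffic is still routed to v1 while a copy of each request is mirrored (fire-and-forget) to v2. The mirror_percent field and the 100% value are assumptions; newer Istio releases use mirrorPercentage instead:
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin
  http:
  - route:
    - destination:
        host: httpbin
        subset: v1
      weight: 100
    mirror:
      host: httpbin
      subset: v2
    mirror_percent: 100
EOF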
7. Send another request, which is routed to v1
root@kube1:~# kubectl exec sleep-6bdb595bcb-xktfl -c sleep -- curl -s http://httpbin:8000/headers
{
"headers": {
"Accept": "*/*",
"Content-Length": "0",
"Host": "httpbin:8000",
"User-Agent": "curl/7.69.1",
"X-B3-Parentspanid": "cbe3361fec2e1656",
"X-B3-Sampled": "1",
"X-B3-Spanid": "d8ae3fbe573ce08d",
"X-B3-Traceid": "c7b6073fc8111651cbe3361fec2e1656",
"X-Forwarded-Client-Cert": "By=spiffe://cluster.local/ns/default/sa/httpbin;Hash=4c1074a241f191c1d6732cccb583a46dcfbf882813693a6211ec5b417c59d865;Subject=\"\";URI=spiffe://cluster.local/ns/default/sa/sleep"
}
}
The corresponding log lines then appear in both the v1 and the v2 terminal
root@kube1:~# kubectl logs -f httpbin-v1-d879b9568-qtffc -c httpbin
[2020-09-08 01:52:10 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2020-09-08 01:52:10 +0000] [1] [INFO] Listening at: http://0.0.0.0:80 (1)
[2020-09-08 01:52:10 +0000] [1] [INFO] Using worker: sync
[2020-09-08 01:52:10 +0000] [8] [INFO] Booting worker with pid: 8
127.0.0.1 - - [08/Sep/2020:02:22:47 +0000] "GET /headers HTTP/1.1" 200 516 "-" "curl/7.69.1"
root@kube1:~/istio/istio-1.5.1# kubectl logs -f httpbin-v2-69bcdd6f7c-c78ld -c httpbin
[2020-09-08 01:52:16 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2020-09-08 01:52:16 +0000] [1] [INFO] Listening at: http://0.0.0.0:80 (1)
[2020-09-08 01:52:16 +0000] [1] [INFO] Using worker: sync
[2020-09-08 01:52:16 +0000] [9] [INFO] Booting worker with pid: 9
127.0.0.1 - - [08/Sep/2020:02:22:47 +0000] "GET /headers HTTP/1.1" 200 556 "-" "curl/7.69.1"