OpenShift 4 Istio-Tutorial (6): Service Resilience (Retry, Timeout, Circuit Breaker)

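This OpenShift Service Mesh tutorial series is based on "Introducing Istio Service Mesh for Microservices", a book published publicly by Red Hat. I verified every step in an OpenShift 4.2.x environment. Readers who prefer English, or who want more background on the scenarios, can download the book and read it at their own pace.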

As a service mesh, Istio can detect failing microservices and recover from them. This shows up mainly in four features: timeout, retry, circuit breaker, and pool ejection.

Preparing the environment
Before starting, delete the DestinationRule and VirtualService previously defined for recommendation, then scale the recommendation-v2 microservice down to a single Pod.

$ oc delete -f istiofiles/destination-rule-recommendation_lb_policy_app.yml -n tutorial
$ oc get istio-io -n tutorial
NAME                                             GATEWAYS        HOSTS   AGE
virtualservice.networking.istio.io/customer-vs   [customer-gw]   [*]     5h
NAME                                      AGE
gateway.networking.istio.io/customer-gw   5h
$ oc scale deployment recommendation-v2 --replicas=1 -n tutorial
$ oc get pod -n tutorial
customer-77dc47d7f8-szhd5            2/2     Running   4          6h
preference-v1-55476494cf-xm4dq       2/2     Running   0          3h
recommendation-v1-67976848-4l4s7     2/2     Running   0          3h
recommendation-v2-599867df6c-5ccdx   2/2     Running   0          19m

Contents

  • Retry
  • Timeout
  • Circuit Breaker

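Retry

When a request sent through a Service to one of a microservice's Pods fails (for example with a 503), Istio can automatically retry against another Pod running the same microservice. This is the default behavior and can be changed.

  1. In one window, run the following commands. At this point recommendation v1 and recommendation v2 are called alternately.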
$ export INGRESS_GATEWAY=$(oc get route istio-ingressgateway -n istio-system -o 'jsonpath={.spec.host}')
$ ./scripts/run.sh $INGRESS_GATEWAY/customer
customer => preference => recommendation v2 from '3cbba7a9cde5': 50
customer => preference => recommendation v1 from '67976848-4l4s7': 1451
customer => preference => recommendation v2 from '3cbba7a9cde5': 51
customer => preference => recommendation v1 from '67976848-4l4s7': 1452

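In Kiali, recommendation v1 and recommendation v2 each receive 50% of the calls.
[Figure 1: Kiali graph, calls split evenly between recommendation v1 and recommendation v2]
  2. In another window, run the following command to open a shell in the container running recommendation v2, so that we can simulate the microservice misbehaving.

$ oc exec -it -n tutorial $(oc get pods -n tutorial|grep recommendation-v2|awk '{ print $1 }'|head -1) -c recommendation -- /bin/bash
  3. Inside the container running recommendation v2, run the following command. It calls an endpoint of the microservice that makes it simulate a failure (returning 503). Then exit the container.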
bash-4.4$ curl localhost:8080/misbehave
Following requests to / will return a 503
bash-4.4$ exit
exit
  4. In the first window, confirm that only recommendation v1 is being called now.
$ export INGRESS_GATEWAY=$(oc get route istio-ingressgateway -n istio-system -o 'jsonpath={.spec.host}')
$ ./scripts/run.sh $INGRESS_GATEWAY/customer
customer => preference => recommendation v1 from '67976848-4l4s7': 1444
customer => preference => recommendation v1 from '67976848-4l4s7': 1445
customer => preference => recommendation v1 from '67976848-4l4s7': 1446

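In Kiali, recommendation v1 now receives 100% of the calls. Select the recommendation box, then hover over the red App:recommendation on the right. The tooltip shows that although both recommendation Pods are in a healthy state, some inbound requests are failing.
[Figure 2: Kiali tooltip for App:recommendation, some inbound requests failing]
Select the small recommendation v2 box, then hover over the red App:recommendation on the right. The tooltip shows that 100% of inbound requests are failing.
[Figure 3: Kiali tooltip for recommendation v2, 100% of inbound requests failing]
Select the preference box, then hover over the red App:preference on the right. The tooltip shows that some outbound requests are failing.
[Figure 4: Kiali tooltip for App:preference, some outbound requests failing]
  5. Restore recommendation v2 to normal operation. The first window then reaches recommendation v2 again.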

$ oc exec -it -n tutorial $(oc get pods -n tutorial|grep recommendation-v2|awk '{ print $1 }'|head -1) -c recommendation -- /bin/bash
bash-4.4$ curl localhost:8080/behave
Following requests to / will return a 200
bash-4.4$ exit
exit
  6. The file istiofiles/virtual-service-recommendation-try.yml overrides the default retry settings for requests to the recommendation Service.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: recommendation
spec:
  hosts:
  - recommendation
  http:
  - route:
    - destination:
        host: recommendation
    retries:
      attempts: 3
      perTryTimeout: 2s

Run the following command to apply the retry configuration.

$ oc apply -f istiofiles/virtual-service-recommendation-try.yml -n tutorial
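
The retry behavior observed in this section can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how the Envoy sidecar is implemented: the `Backend` class and `call_with_retries` function are hypothetical names, load balancing is modeled as plain round-robin, `attempts` is treated as the number of retries allowed after the first try, and `perTryTimeout` is not modeled at all.

```python
from itertools import cycle


class Backend:
    """Hypothetical stand-in for one recommendation Pod."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.hits = 0  # number of requests that reached this backend

    def handle(self):
        self.hits += 1
        if self.healthy:
            return 200, f"recommendation {self.name}"
        return 503, "misbehaving"


def call_with_retries(picker, attempts):
    """Send one request; on a 503, retry on the next backend, up to `attempts` retries."""
    status, body = next(picker).handle()
    retries_left = attempts
    while status == 503 and retries_left > 0:
        retries_left -= 1
        status, body = next(picker).handle()
    return status, body


v1 = Backend("v1")
v2 = Backend("v2", healthy=False)   # as if /misbehave had been called
picker = cycle([v1, v2])            # assumed round-robin load balancing

results = [call_with_retries(picker, attempts=3) for _ in range(6)]
for status, body in results:
    print(status, body)
print("v1 hits:", v1.hits, "v2 hits:", v2.hits)
```

In this model the caller only ever sees answers from v1, yet v2 keeps receiving (and failing) requests, which matches the combination of a clean client output and red edges in Kiali described above.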

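Timeout

Set a timeout for calls to a microservice. Once a call exceeds the timeout, it is ended automatically.

  1. First, prepare the environment again as described at the start of this article, and make sure both recommendation v1 and recommendation v2 respond normally.
  2. Run the following commands to delete the normal recommendation v2 Deployment and deploy a slow ("timeout") version of recommendation v2 instead.
$ oc delete -f recommendation/kubernetes/Deployment-v2.yml -n tutorial
$ oc apply -f recommendation/kubernetes/Deployment-v2-timeout.yml -n tutorial
  3. Run the script to call customer continuously. Responses from recommendation v2 are noticeably slow.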
$ export INGRESS_GATEWAY=$(oc get route istio-ingressgateway -n istio-system -o 'jsonpath={.spec.host}')
$ ./scripts/run.sh $INGRESS_GATEWAY/customer
customer => preference => recommendation v1 from '67976848-4l4s7': 3078
customer => preference => recommendation v2 from '379afb614fb1': 1
customer => preference => recommendation v1 from '67976848-4l4s7': 3079
customer => preference => recommendation v2 from '379afb614fb1': 2
Ctrl+C
  4. Create a VirtualService for the recommendation service. The file istiofiles/virtual-service-recommendation-timeout.yml, shown below, sets a timeout of 1s for calls to recommendation.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: recommendation
spec:
  hosts:
  - recommendation
  http:
  - route:
    - destination:
        host: recommendation
    timeout: 1.000s

Run the following command to create the VirtualService.

$ oc apply -f istiofiles/virtual-service-recommendation-timeout.yml -n tutorial
  5. Run the script to call customer continuously. Because recommendation v2 times out, only recommendation v1 responses come back.
$ export INGRESS_GATEWAY=$(oc get route istio-ingressgateway -n istio-system -o 'jsonpath={.spec.host}')
$ ./scripts/run.sh $INGRESS_GATEWAY/customer
customer => preference => recommendation v1 from '67976848-4l4s7': 3084
customer => preference => recommendation v1 from '67976848-4l4s7': 3085
customer => preference => recommendation v1 from '67976848-4l4s7': 3086
  6. In the Kiali console, select the red recommendation box and click App:recommendation at the upper right. 50% of inbound requests show errors, because no "retry" is configured in this step.
    [Figure 5: Kiali graph, 50% of inbound requests to recommendation failing]
  7. Finally, restore the environment: delete the timeout version of recommendation v2 and the recommendation VirtualService, then redeploy the normal recommendation v2.
$ oc delete -f istiofiles/virtual-service-recommendation-timeout.yml -n tutorial
$ oc delete -f recommendation/kubernetes/Deployment-v2-timeout.yml -n tutorial
$ oc apply -f recommendation/kubernetes/Deployment-v2.yml -n tutorial
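
The effect of the timeout setting can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the delays are scaled down (0.1s timeout instead of 1s), and the 504 status is an assumption about what the caller sees when the proxy gives up on a slow upstream.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def fast_backend():
    """Stands in for recommendation v1, which answers immediately."""
    return "recommendation v1"


def slow_backend():
    """Stands in for the slow recommendation v2 (delay scaled down)."""
    time.sleep(0.3)
    return "recommendation v2"


def call_with_timeout(pool, backend, timeout_s):
    """Wait at most `timeout_s` seconds for the backend, then give up."""
    future = pool.submit(backend)
    try:
        return 200, future.result(timeout=timeout_s)
    except FutureTimeout:
        # Assumed mapping: a timed-out call is reported as a gateway timeout.
        return 504, "upstream request timeout"


with ThreadPoolExecutor(max_workers=2) as pool:
    fast = call_with_timeout(pool, fast_backend, timeout_s=0.1)
    slow = call_with_timeout(pool, slow_backend, timeout_s=0.1)

print(fast)
print(slow)
```

The slow call is abandoned by the caller even though the backend eventually finishes its work, which is why a timeout protects the caller's latency but does not reduce load on the slow service.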

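Circuit Breaker

After a called service fails, Istio still sends the next request to the failing service. Recall the second screenshot in the "Retry" section of this article: calls to recommendation showed a 33% failure rate. This is because Istio distributes requests round-robin between recommendation v1 and recommendation v2, and only after recommendation v2 returns an error does it retry against recommendation v1. We can set a circuit breaker on the DestinationRule so that, after a call to a service instance fails, access to that instance is temporarily cut off. After the specified time Istio tries sending requests to recommendation v2 again. If the failed recommendation v2 has recovered by then, Istio stops applying the circuit breaker to it.

  1. Following the "Preparing the environment" instructions in this article, delete all VirtualService and DestinationRule objects related to the recommendation microservice. Requests are then distributed round-robin to recommendation v1 and recommendation v2.
  2. Open a shell in the container running recommendation v2, run the command that puts it into the misbehave state, then exit.
$ oc exec -it -n tutorial $(oc get pods -n tutorial|grep recommendation-v2|awk '{ print $1 }'|head -1) -c recommendation -- /bin/bash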
bash-4.4$ curl localhost:8080/misbehave
Following requests to / will return a 503
bash-4.4$ exit
exit
  3. Run the commands to call customer continuously. From the client's point of view, every request is answered by recommendation v1.
$ export INGRESS_GATEWAY=$(oc get route istio-ingressgateway -n istio-system -o 'jsonpath={.spec.host}')
$ ./scripts/run.sh $INGRESS_GATEWAY/customer
customer => preference => recommendation v1 from '67976848-4l4s7': 3492
customer => preference => recommendation v1 from '67976848-4l4s7': 3493
customer => preference => recommendation v1 from '67976848-4l4s7': 3494
customer => preference => recommendation v1 from '67976848-4l4s7': 3495
...
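  4. In another window, run the following command to follow the log of the container running recommendation v2. The recommendation v2 microservice keeps writing log lines, which means requests are still being sent to it, because we have not yet set a circuit breaker for recommendation v2.
$ oc logs -f -n tutorial $(oc get pods -n tutorial|grep recommendation-v2|awk '{ print $1 }'|head -1) -c recommendation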
recommendation request from 3cbba7a9cde5: 11
recommendation request from 3cbba7a9cde5: 12
recommendation request from 3cbba7a9cde5: 13
...
  5. In a third window, create a DestinationRule with a circuit breaker for the recommendation service. The file istiofiles/destination-rule-recommendation_cb_policy_version_v2.yml defines the DestinationRule:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: recommendation
spec:
  host: recommendation
  subsets:
  - labels:
      version: v1
    name: version-v1
  - labels:
      version: v2
    name: version-v2
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
      tcp:
        maxConnections: 1
    outlierDetection:
      baseEjectionTime: 3m
      consecutiveErrors: 1
      interval: 1s
      maxEjectionPercent: 100

Run the following command to create the DestinationRule for the recommendation service.

$ oc apply -f istiofiles/destination-rule-recommendation_cb_policy_version_v2.yml -n tutorial
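  6. Window 1 keeps sending requests to customer, but window 2 no longer shows any new log output. Requests are no longer reaching the recommendation v2 microservice, because the circuit breaker has temporarily stopped sending requests to the failing recommendation v2.
  7. Open a shell in the container running recommendation v2 again, run the command that puts it back into the behave state, then exit. The recommendation v2 microservice is now able to serve requests normally again.
$ oc exec -it -n tutorial $(oc get pods -n tutorial|grep recommendation-v2|awk '{ print $1 }'|head -1) -c recommendation -- /bin/bash
bash-4.4$ curl localhost:8080/behave
Following requests to / will return a 200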
bash-4.4$ exit
exit
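  8. Keep watching windows 1 and 2. You need to wait about 3 minutes. The window 1 output below shows requests being forwarded to recommendation v2 again, and window 2 shows new log lines, which confirms that recommendation v2 is receiving forwarded requests again.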
customer => preference => recommendation v1 from '67976848-4l4s7': 4522
customer => preference => recommendation v2 from '3cbba7a9cde5': 20
customer => preference => recommendation v1 from '67976848-4l4s7': 4523
customer => preference => recommendation v2 from '3cbba7a9cde5': 21
customer => preference => recommendation v1 from '67976848-4l4s7': 4524
customer => preference => recommendation v2 from '3cbba7a9cde5': 22
customer => preference => recommendation v1 from '67976848-4l4s7': 4525
customer => preference => recommendation v2 from '3cbba7a9cde5': 23
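
The ejection and re-admission sequence observed in steps 6 to 8 can be modeled with a simplified simulation. This is a sketch under stated assumptions, not Envoy's actual algorithm: `Host` and `OutlierPool` are hypothetical names, time is passed in explicitly as seconds, a host is ejected as soon as it reaches `consecutive_errors`, and it is re-admitted exactly `base_ejection_time` seconds later. The periodic `interval` check and `maxEjectionPercent` from the DestinationRule are not modeled, and no retry is layered on top.

```python
class Host:
    """Hypothetical stand-in for one recommendation Pod."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.errors = 0            # consecutive errors seen so far
        self.ejected_until = 0.0   # time until which the host is skipped
        self.hits = 0              # requests that reached this host


class OutlierPool:
    """Simplified outlier-detection model; not an Istio or Envoy API."""

    def __init__(self, hosts, consecutive_errors, base_ejection_time):
        self.hosts = hosts
        self.consecutive_errors = consecutive_errors
        self.base_ejection_time = base_ejection_time
        self.next_index = 0

    def send(self, now):
        # Round-robin over the hosts that are not currently ejected.
        for _ in range(len(self.hosts)):
            host = self.hosts[self.next_index % len(self.hosts)]
            self.next_index += 1
            if now >= host.ejected_until:
                break
        else:
            return 503, None  # every host is ejected

        host.hits += 1
        if host.healthy:
            host.errors = 0
            return 200, host.name

        host.errors += 1
        if host.errors >= self.consecutive_errors:
            host.ejected_until = now + self.base_ejection_time
            host.errors = 0
        return 503, host.name


v1 = Host("v1")
v2 = Host("v2", healthy=False)  # as if /misbehave had been called
pool = OutlierPool([v1, v2], consecutive_errors=1, base_ejection_time=180)

during = [pool.send(now=t) for t in range(10)]             # v2 misbehaving
v2.healthy = True                                          # as if /behave had been called
still_ejected = [pool.send(now=t) for t in range(10, 20)]  # ejection not yet expired
after = [pool.send(now=t) for t in range(185, 195)]        # ejection has expired

print("during:", during)
print("still ejected:", still_ejected)
print("after:", after)
```

With `consecutive_errors=1` and `base_ejection_time=180`, mirroring `consecutiveErrors: 1` and `baseEjectionTime: 3m` above, v2 is hit once, skipped for three minutes even after it has recovered, and then returns to the rotation.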
