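About Consul
Consul (https://www.consul.io) is a distributed, highly available service for service discovery and shared configuration, with multi-datacenter support. It is developed in Go by HashiCorp and open-sourced under the Mozilla Public License 2.0. According to its documentation, Consul provides Service Discovery, Health Checking, a Key/Value Store, and Multi Datacenter support. With Consul you can build complex applications and service discovery into a system. Consul itself is not the focus of this article; for more on learning Consul, see http://blog.csdn.net/column/details/consul.html
This post assumes you already know the Consul terms datacenter, node, service, and health check, a few basic consul commands, and how to use the Consul UI.
Ocelot has built-in support for Consul: configuring configuration.json in the OcelotGateway project is enough to enable Consul + Ocelot. What can this combination do?
Service registration, service discovery, API gateway (inherent to Ocelot, not covered here), load balancing, rate limiting, circuit breaking and alerting, and resilience and transient fault handling.
I. Service governance: service registration and service discovery
Service registration and service discovery are both native Consul features and can be done through Consul's HTTP API. I wrote a NuGet library, ConsulSharp (https://github.com/axzxs2001/ConsulSharp), that handles registration and discovery on .NET Core. I recommend doing service registration in a unified management platform. For ease of testing, you can instead declare the services in Consul's configuration file so that they are registered automatically every time Consul starts. Service discovery is handled automatically by Ocelot; it only needs to be configured in configuration.json in the OcelotGateway project. Let's look at how to configure it.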
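As a rough illustration of registering a service through Consul's HTTP API, the sketch below builds a request body for the agent's PUT /v1/agent/service/register endpoint, mirroring the API001 entry used later in this post. It is written in Python for brevity and is not how ConsulSharp does it; the helper names build_registration and register, and the default agent address localhost:8500, are assumptions made for this example.

```python
import json
import urllib.request

# Assumption: a local Consul agent listening on the default HTTP port.
CONSUL_URL = "http://localhost:8500"


def build_registration(service_id, name, address, port):
    """Build the JSON body for PUT /v1/agent/service/register."""
    return {
        "ID": service_id,
        "Name": name,
        "Tags": [name],
        "Address": address,
        "Port": port,
        # One HTTP health check, polled every 10s with a 1s timeout.
        "Check": {
            "Name": f"{service_id}_Check",
            "HTTP": f"http://{address}:{port}/health",
            "Method": "GET",
            "Interval": "10s",
            "Timeout": "1s",
        },
    }


def register(payload, consul_url=CONSUL_URL):
    """Send the registration to the agent and return the HTTP status code."""
    request = urllib.request.Request(
        f"{consul_url}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

With an agent running, register(build_registration("API001", "API001", "192.168.1.99", 5001)) would register the same service that the configuration file below declares statically.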
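First download Consul from https://www.consul.io/downloads.html. This project was tested on Windows.
Then download the Consul configuration files (they fit this demo; adjust them to your own situation): https://github.com/axzxs2001/Asp.NetCoreExperiment/tree/master/Asp.NetCoreExperiment/ConsulOcelot/consul
conf folder: where Consul's configuration files are stored
dist folder: the Consul UI, a small portal that displays Consul information
data folder: where Consul stores the data and files it generates after startup; this folder can be emptied before running Consul
The Consul configuration file is as follows: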
{ "encrypt": "7TnJPB4lKtjEcCWWjN6jSA==", "services": [ { "id": "API001", "name": "API001", "tags": [ "API001" ], "address": "192.168.1.99", "port": 5001, "checks": [ { "id": "API001_Check", "name": "API001_Check", "http": "http://192.168.1.99:5001/health", "interval": "10s", "tls_skip_verify": false, "method": "GET", "timeout": "1s" } ] }, { "id": "API002", "name": "API002", "tags": [ "API002" ], "address": "192.168.1.99", "port": 5002, "checks": [ { "id": "API002_Check", "name": "API002_Check", "http": "http://192.168.1.99:5002/health", "interval": "10s", "tls_skip_verify": false, "method": "GET", "timeout": "1s" } ] } ] }
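It defines two services, API001 and API002, each with its own health check, API001_Check and API002_Check.
Based on these Consul service definitions, now create three ASP.NET Core Web API projects:
OcelotGateway, the gateway project, on port 5000; API001, a business API project, on port 5001; and API002, a business API project, on port 5002. For the code, see https://github.com/axzxs2001/Asp.NetCoreExperiment/tree/master/Asp.NetCoreExperiment/ConsulOcelot
OcelotGateway brings in the Ocelot gateway; API001 and API002 each implement a GET endpoint that answers the health check.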
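To make the health-check contract concrete: Consul's HTTP check treats any 2xx response as passing, 429 as a warning, and anything else as critical. The sketch below is a language-neutral illustration in Python, not the ASP.NET Core code in the linked repository. The names health_response, HealthHandler, and serve are hypothetical, and the response body "ok" is an arbitrary choice.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_response(path):
    """Return (status, body) for a GET request; only /health reports healthy."""
    if path == "/health":
        return 200, b"ok"
    return 404, b"not found"


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET requests using health_response."""

    def do_GET(self):
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=5001):
    """Listen on the given port and block until interrupted."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling serve(5001) would expose an endpoint matching the http://192.168.1.99:5001/health URL that the API001_Check definition polls every 10 seconds.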
Testing service registration and discovery:
1. Start Consul
consul agent -server -datacenter=dc1 -bootstrap -data-dir ./data -config-file ./conf -ui-dir ./dist -node=n1 -bind <local IP> -client=0.0.0.0
Then, in a second console window, run the consul command to check the status: consul operator raft list-peers
Result:
Node  ID                                    Address            State   Voter  RaftProtocol
n1    dad74de2-173d-1c1e-add0-975a243b59eb  192.168.1.99:8300  leader  true   3
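Check the result in the Consul UI, in both the services view and the nodes view.
The API001 and API002 services are listed, and their health checks are all passing.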
2. Configure the Ocelot gateway
The configuration.json file is as follows (for details on the Ocelot configuration file, see http://ocelot.readthedocs.io/en/latest/features/configuration.html):
{ "ReRoutes": [ { "DownstreamPathTemplate": "/api/values", "DownstreamScheme": "http", "DownstreamHostAndPorts": [ { "Host": "localhost", "Port": 5001 } ], "UpstreamPathTemplate": "/api001/values", "UpstreamHttpMethod": [ "Get" ], "ServiceName": "API001", "LoadBalancer": "RoundRobin", "UseServiceDiscovery": true, "ReRouteIsCaseSensitive": false, "QoSOptions": { "ExceptionsAllowedBeforeBreaking": 3, "DurationOfBreak": 10, "TimeoutValue": 5000 }, "HttpHandlerOptions": { "AllowAutoRedirect": false, "UseCookieContainer": false, "UseTracing": false }, "AuthenticationOptions": { "AuthenticationProviderKey": "", "AllowedScopes": [] }, "RateLimitOptions": { "ClientWhitelist": [ "admin" ], "EnableRateLimiting": true, "Period": "1m", "PeriodTimespan": 15, "Limit": 5 } }, { "DownstreamPathTemplate": "/notice", "DownstreamScheme": "http", "DownstreamHostAndPorts": [ { "Host": "localhost", "Port": 5001 } ], "UpstreamPathTemplate": "/notice", "UpstreamHttpMethod": [ "Post" ], "ReRouteIsCaseSensitive": false, "QoSOptions": { "ExceptionsAllowedBeforeBreaking": 3, "DurationOfBreak": 10, "TimeoutValue": 5000 }, "HttpHandlerOptions": { "AllowAutoRedirect": false, "UseCookieContainer": false, "UseTracing": false }, "AuthenticationOptions": { "AuthenticationProviderKey": "", "AllowedScopes": [] } }, { "DownstreamPathTemplate": "/api/values", "DownstreamScheme": "http", "DownstreamHostAndPorts": [ { "Host": "localhost", "Port": 5002 } ], "UpstreamPathTemplate": "/API002/values", "UpstreamHttpMethod": [ "Get" ], "QoSOptions": { "ExceptionsAllowedBeforeBreaking": 3, "DurationOfBreak": 10, "TimeoutValue": 5000 }, "ServiceName": "API002", "LoadBalancer": "RoundRobin", "UseServiceDiscovery": true, "HttpHandlerOptions": { "AllowAutoRedirect": false, "UseCookieContainer": false }, "AuthenticationOptions": { "AuthenticationProviderKey": "", "AllowedScopes": [] }, "RateLimitOptions": { "ClientWhitelist": [ "user" ], "EnableRateLimiting": true, "Period": "1m", "PeriodTimespan": 15, "Limit": 5 } } 
], "GlobalConfiguration": { "ServiceDiscoveryProvider": { "Host": "localhost", "Port": 8500 }, "RateLimitOptions": { "ClientIdHeader": "client_id", "QuotaExceededMessage": "Too Many Requests!!!", "DisableRateLimitHeaders": false } } }
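Start the OcelotGateway, API001, and API002 projects, then browse to http://localhost:5000/api001/values and http://localhost:5000/api002/values. Because Ocelot is configured to use Consul for service governance, it looks up each service's address from the configured service name through the Consul HTTP API endpoint given in GlobalConfiguration, and then forwards the request. Ocelot does all of this for us. This is easy to verify: change a service's address in the Consul configuration file to a wrong IP, and requests through port 5000 will fail.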
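The lookup described here can be approximated by querying Consul's GET /v1/health/service/<name> endpoint and keeping the instances whose checks are all passing. The sketch below is only an illustration in Python and is not Ocelot's actual code; the function names healthy_endpoints and discover are assumptions, and the default agent address matches the ServiceDiscoveryProvider above.

```python
import json
import urllib.request


def healthy_endpoints(entries):
    """Extract (host, port) pairs from a /v1/health/service/<name> response."""
    endpoints = []
    for entry in entries:
        checks = entry.get("Checks", [])
        # Skip any instance that has a check in a non-passing state.
        if any(check.get("Status") != "passing" for check in checks):
            continue
        service = entry["Service"]
        # An empty service address means the node's address should be used.
        host = service.get("Address") or entry["Node"]["Address"]
        endpoints.append((host, service["Port"]))
    return endpoints


def discover(name, consul_url="http://localhost:8500"):
    """Ask the Consul agent for the healthy instances of a named service."""
    url = f"{consul_url}/v1/health/service/{name}"
    with urllib.request.urlopen(url, timeout=5) as response:
        return healthy_endpoints(json.load(response))
```

With the agent from step 1 running, discover("API001") would be expected to return the address and port registered in the Consul configuration file, which is why a wrong address there breaks requests through the gateway.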
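II. Load balancing
Testing load balancing requires several running instances of API001 and API002. Publish the API001 and API002 projects and copy them to another computer on the same LAN as 192.168.1.99, together with Consul, its configuration, and the UI files. The gateway project OcelotGateway does not need to be copied. Assume the other computer is 192.168.1.126.
First, modify the Consul configuration file on that machine as follows: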
{ "encrypt": "7TnJPB4lKtjEcCWWjN6jSA==", "services": [ { "id": "API001", "name": "API001", "tags": [ "API001" ], "address": "192.168.1.126", "port": 5001, "checks": [ { "id": "API001_Check", "name": "API001_Check", "http": "http://192.168.1.126:5001/health", "interval": "10s", "tls_skip_verify": false, "method": "GET", "timeout": "1s" } ] }, { "id": "API002", "name": "API002", "tags": [ "API002" ], "address": "192.168.1.126", "port": 5002, "checks": [ { "id": "API002_Check", "name": "API002_Check", "http": "http://192.168.1.126:5002/health", "interval": "10s", "tls_skip_verify": false, "method": "GET", "timeout": "1s" } ] } ] }
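Start the API001 and API002 projects on 192.168.1.126.
Start Consul:
consul agent -server -datacenter=dc1 -data-dir ./data -config-file ./conf -ui-dir ./dist -node=n2 -bind 192.168.1.126
As before, use the Consul UI on 192.168.1.126 to check that the services are healthy.
On 192.168.1.99, add 192.168.1.126 to the cluster with the following command:
consul join 192.168.1.126
Note: every agent in a Consul cluster must have the same encrypt value in its configuration file; otherwise it cannot join the cluster.
Check the status with consul operator raft list-peers; n1 and n2 are now in the same cluster: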
Node  ID                                    Address             State     Voter  RaftProtocol
n1    dad74de2-173d-1c1e-add0-975a243b59eb  192.168.1.99:8300   leader    true   3
n2    efe954ce-9840-5c66-fa80-b9022167d782  192.168.1.126:8300  follower  true   3
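Now open view-source:http://localhost:5000/api001/values or view-source:http://localhost:5000/api002/values in a browser several times. The returned content alternates between the two machines, because the cluster contains exactly two instances of each API and the routes use the RoundRobin load balancer. That is load balancing at work.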
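The alternating responses follow directly from round-robin selection. As a minimal sketch of the idea (not Ocelot's implementation; the class name RoundRobin is an assumption for this example), a balancer simply cycles through the discovered endpoints:

```python
import itertools


class RoundRobin:
    """Hands out endpoints in a fixed rotating order."""

    def __init__(self, endpoints):
        endpoints = list(endpoints)
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._cycle = itertools.cycle(endpoints)

    def next(self):
        """Return the endpoint that should receive the next request."""
        return next(self._cycle)
```

With the two API001 instances from this section, RoundRobin([("192.168.1.99", 5001), ("192.168.1.126", 5001)]) would alternate between the two hosts on successive calls to next().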
III. Rate limiting
Rate limiting is set up entirely in configuration.json; for the meaning of each value, see http://ocelot.readthedocs.io/en/latest/features/ratelimiting.html
In each route to be rate limited:
"RateLimitOptions": { "ClientWhitelist": [ "admin" ], "EnableRateLimiting": true, "Period": "1m", "PeriodTimespan": 15, "Limit": 5 }
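In GlobalConfiguration: "RateLimitOptions": { "ClientIdHeader": "client_id", "QuotaExceededMessage": "too more request", "DisableRateLimitHeaders": false }
Note that if a ClientWhitelist is configured, the client calling the API must add a header whose key is client_id and whose value is admin; that client is then exempt from rate limiting.
To test rate limiting, I created a TestClient console program. The results: within one minute, a client identified as admin can call API001 more than five times, because admin is on that route's whitelist, but it can call API002 only five times.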
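The behaviour observed in the test can be modelled with a simple fixed-window counter per client. The sketch below only illustrates the Limit, Period, and ClientWhitelist settings and is not how Ocelot implements rate limiting; the class name FixedWindowLimiter and the explicit now argument (seconds, passed in so the logic can be exercised without waiting) are assumptions of this example.

```python
class FixedWindowLimiter:
    """Allows at most `limit` requests per client in each period."""

    def __init__(self, limit=5, period_seconds=60, whitelist=()):
        self.limit = limit
        self.period_seconds = period_seconds
        self.whitelist = set(whitelist)
        self._windows = {}  # client_id -> (window_start, request_count)

    def allow(self, client_id, now):
        """Return True if the request at time `now` (seconds) may proceed."""
        if client_id in self.whitelist:
            return True  # whitelisted clients are never limited
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self.period_seconds:
            start, count = now, 0  # the previous window expired
        if count >= self.limit:
            self._windows[client_id] = (start, count)
            return False
        self._windows[client_id] = (start, count + 1)
        return True
```

A limiter built with whitelist=("admin",) corresponds to the API001 route, where the admin client is never blocked; one built with whitelist=("user",) corresponds to the API002 route, where the same admin client is cut off after five requests in a minute.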
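IV. Circuit breaking and alerting
Circuit-breaking protection is implemented in both Consul and Ocelot. The goal is twofold when a service becomes unhealthy: first, normal use should not be affected (the service is clustered, so requests can be routed to another server); second, the problem should raise an alert. Ocelot's load balancing automatically detects a failing service (Consul runs the health checks) and stops sending requests to it. Alerting is implemented through the Consul configuration file; for watches, see https://www.consul.io/docs/agent/watches.html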
{ "watches": [ { "type": "checks", "handler_type": "http", "state": "critical", "http_handler_config": { "path": "http://192.168.1.99:5000/notice", "method": "POST", "timeout": "10s", "header": { "Authorization": [ "token" ] } } } ] }
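When Consul starts, it loads every JSON file under conf; because this file contains a watches node, Consul treats it as a watch definition.
The gateway maps http://192.168.1.99:5000/notice to http://localhost:5001/notice, which sends an email with the details of the failing service to the designated mailbox. Note that when testing, you must not shut down API001, because the email feature lives in that project; shut down API002 instead. In a real environment this would certainly be handled by a separate, clustered project. The result is as follows:
After API002 on 192.168.1.99 is shut down, the alert email reports the exact failing check and the service name.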
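For a checks watch, Consul posts a JSON array of check objects to the HTTP handler, with fields such as Node, CheckID, Name, Status, Output, ServiceID, and ServiceName. The sketch below shows, in Python rather than the ASP.NET Core code of the /notice endpoint, how such a body could be turned into alert text. The function name format_alerts and the message layout are assumptions; actually sending the email is left out.

```python
import json


def format_alerts(body):
    """Turn the JSON array posted by a checks watch into alert lines."""
    alerts = []
    for check in json.loads(body):
        if check.get("Status") != "critical":
            continue  # the watch filters on critical, but stay defensive
        alerts.append(
            "service {service} on node {node}: check {check_id} is critical ({output})".format(
                service=check.get("ServiceName", "unknown"),
                node=check.get("Node", "unknown"),
                check_id=check.get("CheckID", "unknown"),
                output=check.get("Output", "").strip() or "no output",
            )
        )
    return alerts
```

Each returned line names the failing check and its service, which is the information the alert email in this section reports.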
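V. Resilience and transient fault handling
Resilience and transient fault handling is built into Ocelot: whenever the gateway forwards a request, it applies Polly (https://github.com/App-vNext/Polly) according to the route's QoSOptions. For the settings, see http://ocelot.readthedocs.io/en/latest/features/qualityofservice.html. No additional development work is required.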
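To give a feel for what ExceptionsAllowedBeforeBreaking and DurationOfBreak control, here is a minimal circuit-breaker sketch in Python. It is not Polly's or Ocelot's implementation: the names CircuitBreaker and CircuitOpenError, the break duration expressed in seconds, and the explicit now argument are all assumptions of this example, and the request timeout (TimeoutValue) is not modelled.

```python
class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Opens after N consecutive failures and rejects calls for a while."""

    def __init__(self, failures_allowed=3, break_seconds=10.0):
        self.failures_allowed = failures_allowed
        self.break_seconds = break_seconds
        self._failures = 0
        self._opened_at = None

    def is_open(self, now):
        """True while calls are still being rejected at time `now`."""
        return self._opened_at is not None and now - self._opened_at < self.break_seconds

    def call(self, func, now):
        """Run func() unless the circuit is open; track its outcome."""
        if self.is_open(now):
            raise CircuitOpenError("circuit is open, call rejected")
        if self._opened_at is not None:
            # The break has elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self.failures_allowed - 1
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failures_allowed:
                self._opened_at = now  # trip the breaker
            raise
        self._failures = 0  # any success resets the failure count
        return result
```

After three consecutive failures the breaker rejects calls immediately until the break period has passed, then lets a single trial call through; a failure on that trial reopens the circuit, while a success closes it.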