大家好,我是君哥。
今天来聊一聊使用 Nacos 做注册中心怎么做优雅发布。
跟其他的注册中心一样,Nacos 作为注册中心的使用如下图:
Service Provider 启动后注册到 Nacos Server,Service Consumer 则从 Nacos Server 拉取服务列表,根据一定算法选择一个 Service Provider 来发送请求。
对于优雅发布,要求是 Service Provider 上线(注册到 Nacos)后,服务能够正常地接收和处理请求,而 Service Provider 停服后,则不会再收到请求。这就有两个要求:
优雅上线:Service Provider 发布完成之前,Service Consumer 不应该从服务列表中拉取到这个服务地址;
优雅下线:Service Provider 下线后,Service Consumer 不会从服务列表中拉取到这个服务地址。
解决了这两个问题,优雅发布就可以做到了。
搭建环境是为了看 Nacos 日志,通过日志找到对应的源代码。本文搭建的环境如下图:
启动 springboot-provider 的应用,注册到 Nacos,启动日志如下:
2023-06-11 18:58:10,120 [main] [INFO] com.alibaba.nacos.client.naming - [BEAT] adding beat: BeatInfo{port=8083, ip='192.168.31.94', weight=1.0, serviceName='DEFAULT_GROUP@@springboot-provider', cluster='DEFAULT', metadata={management.endpoints.web.base-path=/actuator, management.port=18082, preserved.register.source=SPRING_CLOUD, management.address=127.0.0.1}, scheduled=false, period=5000, stopped=false} to beat map.
2023-06-11 18:58:10,121 [main] [INFO] com.alibaba.nacos.client.naming - [REGISTER-SERVICE] public registering service DEFAULT_GROUP@@springboot-provider with instance: Instance{instanceId='null', ip='192.168.31.94', port=8083, weight=1.0, healthy=true, enabled=true, ephemeral=true, clusterName='DEFAULT', serviceName='null', metadata={management.endpoints.web.base-path=/actuator, management.port=18082, preserved.register.source=SPRING_CLOUD, management.address=127.0.0.1}}
2023-06-11 18:58:10,133 [main] [INFO] com.alibaba.cloud.nacos.registry.NacosServiceRegistry - nacos registry, DEFAULT_GROUP springboot-provider 192.168.31.94:8083 register finished
2023-06-11 18:58:10,221 [main] [INFO] org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat initialized with port(s): 18082 (http)
2023-06-11 18:58:10,222 [main] [INFO] org.apache.coyote.http11.Http11NioProtocol - Initializing ProtocolHandler ["http-nio-127.0.0.1-18082"]
2023-06-11 18:58:10,223 [main] [INFO] org.apache.catalina.core.StandardService - Starting service [Tomcat]
2023-06-11 18:58:10,223 [main] [INFO] org.apache.catalina.core.StandardEngine - Starting Servlet engine: [Apache Tomcat/9.0.21]
2023-06-11 18:58:10,239 [main] [INFO] org.apache.catalina.core.ContainerBase.[Tomcat-1].[localhost].[/] - Initializing Spring embedded WebApplicationContext
2023-06-11 18:58:10,239 [main] [INFO] org.springframework.web.context.ContextLoader - Root WebApplicationContext: initialization completed in 99 ms
2023-06-11 18:58:10,268 [main] [INFO] org.springframework.boot.actuate.endpoint.web.EndpointLinksResolver - Exposing 22 endpoint(s) beneath base path '/actuator'
2023-06-11 18:58:10,336 [main] [INFO] org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-127.0.0.1-18082"]
2023-06-11 18:58:10,340 [main] [INFO] org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 18082 (http) with context path ''
2023-06-11 18:58:10,342 [main] [INFO] boot.Application - Started Application in 7.051 seconds (JVM running for 7.874)
2023-06-11 18:58:10,358 [main] [INFO] com.alibaba.nacos.client.config.impl.ClientWorker - [fixed-39.105.183.91_8848] [subscribe] springboot-provider.properties+DEFAULT_GROUP
2023-06-11 18:58:10,359 [main] [INFO] com.alibaba.nacos.client.config.impl.CacheData - [fixed-39.105.183.91_8848] [add-listener] ok, tenant=, dataId=springboot-provider.properties, group=DEFAULT_GROUP, cnt=1
2023-06-11 18:58:10,359 [main] [INFO] com.alibaba.nacos.client.config.impl.ClientWorker - [fixed-39.105.183.91_8848] [subscribe] springboot-provider-dev.properties+DEFAULT_GROUP
2023-06-11 18:58:10,359 [main] [INFO] com.alibaba.nacos.client.config.impl.CacheData - [fixed-39.105.183.91_8848] [add-listener] ok, tenant=, dataId=springboot-provider-dev.properties, group=DEFAULT_GROUP, cnt=1
2023-06-11 18:58:10,360 [main] [INFO] com.alibaba.nacos.client.config.impl.ClientWorker - [fixed-39.105.183.91_8848] [subscribe] springboot-provider+DEFAULT_GROUP
2023-06-11 18:58:10,360 [main] [INFO] com.alibaba.nacos.client.config.impl.CacheData - [fixed-39.105.183.91_8848] [add-listener] ok, tenant=, dataId=springboot-provider, group=DEFAULT_GROUP, cnt=1
2023-06-11 18:58:10,639 [RMI TCP Connection(1)-192.168.31.94] [INFO] org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring DispatcherServlet 'dispatcherServlet'
2023-06-11 18:58:10,839 [com.alibaba.nacos.client.naming.updater] [INFO] com.alibaba.nacos.client.naming - [BEAT] adding beat: BeatInfo{port=8083, ip='192.168.31.94', weight=1.0, serviceName='DEFAULT_GROUP@@springboot-provider', cluster='DEFAULT', metadata={management.endpoints.web.base-path=/actuator, management.port=18082, preserved.register.source=SPRING_CLOUD, management.address=127.0.0.1}, scheduled=false, period=5000, stopped=false} to beat map.
2023-06-11 18:58:10,840 [com.alibaba.nacos.client.naming.updater] [INFO] com.alibaba.nacos.client.naming - modified ips(1) service: DEFAULT_GROUP@@springboot-provider@@DEFAULT -> [{"instanceId":"192.168.31.94#8083#DEFAULT#DEFAULT_GROUP@@springboot-provider","ip":"192.168.31.94","port":8083,"weight":1.0,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"DEFAULT","serviceName":"DEFAULT_GROUP@@springboot-provider","metadata":{"management.endpoints.web.base-path":"/actuator","management.port":"18082","preserved.register.source":"SPRING_CLOUD","management.address":"127.0.0.1"},"ipDeleteTimeout":30000,"instanceHeartBeatInterval":5000,"instanceHeartBeatTimeOut":15000}]
2023-06-11 18:58:10,841 [com.alibaba.nacos.client.naming.updater] [INFO] com.alibaba.nacos.client.naming - current ips:(1) service: DEFAULT_GROUP@@springboot-provider@@DEFAULT -> [{"instanceId":"192.168.31.94#8083#DEFAULT#DEFAULT_GROUP@@springboot-provider","ip":"192.168.31.94","port":8083,"weight":1.0,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"DEFAULT","serviceName":"DEFAULT_GROUP@@springboot-provider","metadata":{"management.endpoints.web.base-path":"/actuator","management.port":"18082","preserved.register.source":"SPRING_CLOUD","management.address":"127.0.0.1"},"ipDeleteTimeout":30000,"instanceHeartBeatInterval":5000,"instanceHeartBeatTimeOut":15000}]
我们再看下 Nacos 的日志,这里看的文件 naming-server.log,日志如下:
2023-06-11 18:58:09,723 INFO Client connection 192.168.31.94:51885#true connect
2023-06-11 18:58:10,105 INFO Client change for service Service{namespace='public', group='DEFAULT_GROUP', name='springboot-provider', ephemeral=true, revision=1}, 192.168.31.94:8083#true
2023-06-11 18:58:18,204 INFO Client connection 192.168.31.94:60850#true disconnect, remove instances and subscribers
springboot-provider 启动成功后,从Nacos 管理后台可以看到下图:
服务下线后,Nacos 日志如下:
2023-06-11 19:01:03,375 INFO Client connection 192.168.31.94:51885#true disconnect, remove instances and subscribers
2023-06-11 19:01:05,048 INFO [AUTO-DELETE-IP] service: Service{namespace='public', group='DEFAULT_GROUP', name='springboot-provider', ephemeral=true, revision=2}, ip: {"ip":"192.168.31.94","port":8083,"healthy":false,"cluster":"DEFAULT","extendDatum":{"management.endpoints.web.base-path":"/actuator","management.port":"18082","preserved.register.source":"SPRING_CLOUD","management.address":"127.0.0.1","customInstanceId":"192.168.31.94#8083#DEFAULT#DEFAULT_GROUP@@springboot-provider"},"lastHeartBeatTime":1686481231604,"metadataId":"192.168.31.94:8083:DEFAULT"}
2023-06-11 19:01:05,048 INFO Client remove for service Service{namespace='public', group='DEFAULT_GROUP', name='springboot-provider', ephemeral=true, revision=2}, 192.168.31.94:8083#true
2023-06-11 19:01:08,379 INFO Client connection 192.168.31.94:8083#true disconnect, remove instances and subscribers
在 springboot-consumer 上跑一个单元测试的用例,用 FeignClient 调用下面的方法:
@FeignClient(value = "springboot-provider", configuration = FeignMultipartSupportConfig.class)
public interface FeignAsEurekaClient {
@PostMapping("/employee/save")
String saveEmployeebyName(@RequestBody Employee employee);
}
日志如下:
2023-06-11 19:15:47,694 [main] [INFO] org.springframework.test.context.transaction.TransactionContext - Began transaction (1) for test context [DefaultTestContext@5bf0d49 testClass = TestFeignAsEurekaClient, testInstance = boot.service.TestFeignAsEurekaClient@10683d9d, testMethod = testPostEmployByFeign@TestFeignAsEurekaClient, testException = [null], mergedContextConfiguration = [WebMergedContextConfiguration@5b7a5baa testClass = TestFeignAsEurekaClient, locations = '{}', classes = '{class boot.Application, class boot.Application}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{org.springframework.boot.test.context.SpringBootTestContextBootstrapper=true, server.port=0}', contextCustomizers = set[org.springframework.boot.test.context.filter.ExcludeFilterContextCustomizer@166fa74d, org.springframework.boot.test.json.DuplicateJsonObjectContextCustomizerFactory$DuplicateJsonObjectContextCustomizer@588df31b, org.springframework.boot.test.mock.mockito.MockitoContextCustomizer@0, org.springframework.boot.test.web.client.TestRestTemplateContextCustomizer@7fad8c79, org.springframework.boot.test.autoconfigure.properties.PropertyMappingContextCustomizer@0, org.springframework.boot.test.autoconfigure.web.servlet.WebDriverContextCustomizerFactory$Customizer@10b48321], resourceBasePath = 'src/main/webapp', contextLoader = 'org.springframework.boot.test.context.SpringBootContextLoader', parent = [null]], attributes = map['org.springframework.test.context.web.ServletTestExecutionListener.activateListener' -> false]]; transaction manager [org.springframework.jdbc.datasource.DataSourceTransactionManager@693676d]; rollback [true]
2023-06-11 19:15:47,941 [main] [INFO] com.netflix.config.ChainedDynamicProperty - Flipping property: springboot-provider.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2023-06-11 19:15:47,962 [main] [INFO] com.netflix.loadbalancer.BaseLoadBalancer - Client: springboot-provider instantiated a LoadBalancer: DynamicServerListLoadBalancer:{NFLoadBalancer:name=springboot-provider,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:null
2023-06-11 19:15:47,969 [main] [INFO] com.netflix.loadbalancer.DynamicServerListLoadBalancer - Using serverListUpdater PollingServerListUpdater
2023-06-11 19:15:48,064 [main] [INFO] com.netflix.config.ChainedDynamicProperty - Flipping property: springboot-provider.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2023-06-11 19:15:48,064 [main] [INFO] com.netflix.loadbalancer.DynamicServerListLoadBalancer - DynamicServerListLoadBalancer for client springboot-provider initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=springboot-provider,current list of Servers=[192.168.31.94:8083],Load balancer stats=Zone stats: {unknown=[Zone:unknown; Instance count:1; Active connections count: 0; Circuit breaker tripped count: 0; Active connections per server: 0.0;]
},Server stats: [[Server:192.168.31.94:8083; Zone:UNKNOWN; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Thu Jan 01 08:00:00 CST 1970; First connection made: Thu Jan 01 08:00:00 CST 1970; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0]
]}ServerList:com.alibaba.cloud.nacos.ribbon.NacosServerList@24d998ba
注意,这里使用了 OpenFeign,其中用到了 Ribbon 做负载均衡,那就需要考虑到 Ribbon 的刷新本地服务列表的时间,从源代码中看,刷新周期是 30s。如下图:
Ribbon 刷新缓存的逻辑参考下面代码:
public synchronized void start(final UpdateAction updateAction) {
if (isActive.compareAndSet(false, true)) {
final Runnable wrapperRunnable = new Runnable() {
@Override
public void run() {
//...
}
};
scheduledFuture = getRefreshExecutor().scheduleWithFixedDelay(
wrapperRunnable,
initialDelayMs,
refreshIntervalMs,//这里定义的是30s
TimeUnit.MILLISECONDS
);
}//...
}
前面第一节提到过,优雅发布有两个要求:优雅上线和优雅下线。
Nacos 客户端和服务端的交互采用长轮询的方式,服务端收到客户端的请求后,首先会判断服务端本地的服务列表是否跟客户端的相比是否发生变化(比较 MD5),如果发生变化则立即通知客户端,否则放入长轮询队列挂起,如果这段时间内服务列表发生变化,则立刻通知客户端,否则等到超时后再通知客户端。代码如下:
//LongPollingService.java
public void addLongPollingClient(HttpServletRequest req, HttpServletResponse rsp, Map clientMd5Map,
int probeRequestSize) {
String str = req.getHeader(LongPollingService.LONG_POLLING_HEADER);
int delayTime = SwitchService.getSwitchInteger(SwitchService.FIXED_DELAY_TIME, 500);
// Add delay time for LoadBalance, and one response is returned 500 ms in advance to avoid client timeout.
long timeout = -1L;
if (isFixedPolling()) {
//...
} else {
timeout = Math.max(10000, Long.parseLong(str) - delayTime);//29.5s
long start = System.currentTimeMillis();
List changedGroups = MD5Util.compareMd5(req, rsp, clientMd5Map);
if (changedGroups.size() > 0) {
//服务列表发生变化,直接返回给客户端
generateResponse(req, rsp, changedGroups);
return;
} //...
}
String ip = RequestUtil.getRemoteIp(req);
//..
// Must be called by http thread, or send response.
final AsyncContext asyncContext = req.startAsync();
// AsyncContext.setTimeout() is incorrect, Control by oneself
asyncContext.setTimeout(0L);
String appName = req.getHeader(RequestUtil.CLIENT_APPNAME_HEADER);
String tag = req.getHeader("Vipserver-Tag");
//服务列表没有发生变化,放入长轮询队列等待调度
ConfigExecutor.executeLongPolling(
new ClientLongPolling(asyncContext, clientMd5Map, ip, probeRequestSize, timeout, appName, tag));
}
从上面服务端源代码可以看到,这里超时时间是 30s,其中 29.5s 用于挂起等待,0.5s 检查服务列表是否发生变化。这里使用了长轮询,如果服务端列表发生变化,会立刻通知客户端,所以对优雅发布影响非常小。
服务列表发生变化后,客户端用单独的线程通知监听的 listener,代码如下:
public void startInternal() {
executor.schedule(() -> {
while (!executor.isShutdown() && !executor.isTerminated()) {
try {
listenExecutebell.poll(5L, TimeUnit.SECONDS);
//...
executeConfigListen();
} catch (Throwable e) {
//...
}
}
}, 0L, TimeUnit.MILLISECONDS);
}
优雅上线存在的问题主要在于 Service Provider 注册到 Nacos 后,服务还没有完成初始化,请求已经到来。这种情况主要原因是 Service Provider 启动后立刻注册 Naocs,而本身提供的接口可能还没有初始化完成。
这种情况的解决方法是关闭自动注册:
spring.cloud.nacos.discovery.registerEnabled=false
在服务初始化后使用代码手动注册,代码如下:
Properties setting8 = new Properties();
String serverIp8 = "127.0.0.1:8848";
setting8.put(PropertyKeyConst.SERVER_ADDR, serverIp8);
setting8.put(PropertyKeyConst.USERNAME, "nacos");
setting8.put(PropertyKeyConst.PASSWORD, "nacos");
NamingService inaming8 = NacosFactory.createNamingService(setting7);
inaming8.registerInstance("springboot-provider", "192.168.31.94", 8083);
服务下线分两种情况,一个是正常停服,一个是服务故障。
3.2.1 正常停服
对于正常停服,Nacos 采用心跳检测来实现服务在线。心跳周期是 5s,Nacos Server 如果 15s 没收到心跳就会将实例设置为不健康,在 30s 没收到心跳才会讲这个服务删除。当然这个时间可以设置:
spring.cloud.nacos.discovery.metadata.preserved.heart.beat.interval=1000 #心跳间隔5s->1s
spring.cloud.nacos.discovery.metadata.preserved.heart.beat.timeout=3000 #超时时间15s->3s
spring.cloud.nacos.discovery.metadata.preserved.ip.delete.timeout=5000 #删除时间30s->5s
但这样并不能保证服务停止后能够立刻从 Nacos Server 下线,很有可能服务停止后还能再收到请求,最好的方式是手动下线,比如增加一个 API 接口,服务下线之前增加 preStopHook 函数调用这个 API 接口来实现下线。API 接口示例代码如下:
@GetMapping(value = "/nacos/deregisterInstance")
public String deregisterInstance() {
Properties prop = new Properties();
prop.setProperty("serverAddr", "localhost");
prop.put(PropertyKeyConst.NAMESPACE, "test");
NacosNamingService client = new NacosNamingService(prop);
client.deregisterInstance("springboot-provider", "192.168.31.94", 8083);
return "success";
}
在使用 Ribbon 的场景,也需要考虑 Ribbon 更新本地缓存服务列表的机制,手动下线后,可以再等待 30s 再关闭服务。
3.2.1 服务故障
第二种情况是服务故障,但是并没有停服,这种情况是很难避免外部请求再发送过来的。处理方式是对这个服务本身的健康检查结果进行处理,比如连续三次健康检查失败,可以调用上面的 API 接口让服务下线。
无论是哪一款注册中心,优雅发布要解决的问题都是优雅上线和优雅下线。本文结合 Nacos 的原理讲解了 Nacos 的优雅发布,希望对你有所帮助。