Contents
I. Service registration flow (based on Nacos 0.9.0)
Step 1: NacosServiceRegistry.register()
NacosNamingService.registerInstance()
1. Create the heartbeat information
2. Call the registration method
NamingProxy.registerService(String serviceName, Instance instance)
Step 2: InstanceController.register()
ApiCommands.regService(HttpServletRequest request)
ApiCommands.addIP4Dom(HttpServletRequest request)
Step 3: ApiCommands.addOrReplaceDom(HttpServletRequest request)
II. Client-side service discovery (based on Nacos 0.9.0)
Service discovery: NacosServerList
1. getServers()
2. NacosNamingService.selectInstances(serviceName, new ArrayList<String>(), healthyOnly)
3. HostReactor.getServiceInfo(serviceName, StringUtils.join(clusters, ","), StringUtils.EMPTY, false)
4. HostReactor.updateServiceNow()
5. HostReactor.scheduleUpdateIfAbsent()
Server-side handling of service discovery requests
1. ApiCommands.srvIPXT() handles the HTTP request
2. VirtualClusterDomain.srvIPs(String clientIP, List<String> clusters)
3. VirtualClusterDomain.allIPs(List<String> clusters)
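The service registration process mainly does the following.
Tracing the registration of a service provider shows that:
Heartbeat information is created and handed to the heartbeat executor, BeatReactor. The BeatReactor is already instantiated when the NacosNamingService is created. Its source shows that, on construction, it starts a scheduled task on a ScheduledExecutorService whose main job is to keep sending heartbeat data to the Nacos server over HTTP:
1) Send an HTTP GET request to the /v1/ns/api/clientBeat path of a randomly chosen node in the Nacos cluster;
2) Read the clientBeatInterval field from the JSON response and use it to update BeatReactor's clientBeatInterval property.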
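The two steps above can be sketched as a self-rescheduling task. This is only a minimal illustration, not the Nacos BeatReactor: the class BeatLoopSketch and the beatSender callback (which stands in for the HTTP call and returns the interval suggested by the server) are hypothetical names introduced here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a client-side heartbeat loop; NOT the Nacos BeatReactor.
class BeatLoopSketch {
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "beat-loop-sketch");
            t.setDaemon(true); // do not keep the JVM alive just for heartbeats
            return t;
        });
    // Assumption: sends one beat and returns the interval (ms) suggested
    // by the server, or a value <= 0 if the response carried none.
    private final LongSupplier beatSender;
    private volatile long intervalMillis;

    BeatLoopSketch(LongSupplier beatSender, long initialIntervalMillis) {
        this.beatSender = beatSender;
        this.intervalMillis = initialIntervalMillis;
    }

    // Step 1: send the beat. Step 2: adopt the interval from the response.
    void beatOnce() {
        long suggested = beatSender.getAsLong();
        if (suggested > 0) {
            intervalMillis = suggested;
        }
    }

    void start() {
        executor.schedule(this::tick, 0, TimeUnit.MILLISECONDS);
    }

    private void tick() {
        try {
            beatOnce();
        } catch (RuntimeException e) {
            // a failed beat must not end the loop; a real client would log this
        } finally {
            if (!executor.isShutdown()) {
                executor.schedule(this::tick, intervalMillis, TimeUnit.MILLISECONDS);
            }
        }
    }

    long getIntervalMillis() {
        return intervalMillis;
    }

    void stop() {
        executor.shutdownNow();
    }
}
```

Rescheduling after each beat, rather than using a fixed-rate schedule, is what lets the server-provided interval take effect on the very next beat.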
On the server, ApiCommands.clientBeat() handles the /v1/ns/api/clientBeat request:
VirtualClusterDomain.processClientBeat(clientBeat)
public void processClientBeat(final RsInfo rsInfo) {
    clientBeatProcessor.setDomain(this);
    clientBeatProcessor.setRsInfo(rsInfo);
    HealthCheckReactor.scheduleNow(clientBeatProcessor); // run the ClientBeatProcessor task immediately
}
Processing logic of the ClientBeatProcessor task:
public void process() {
    VirtualClusterDomain virtualClusterDomain = (VirtualClusterDomain) domain;
    if (!virtualClusterDomain.getEnableClientBeat()) {
        return;
    }
    String ip = rsInfo.getIp();
    String clusterName = rsInfo.getCluster();
    int port = rsInfo.getPort();
    // look up the Cluster object for clusterName in the VirtualClusterDomain's clusterMap
    Cluster cluster = virtualClusterDomain.getClusterMap().get(clusterName);
    // get the full IpAddress list from the Cluster object
    List<IpAddress> ipAddresses = cluster.allIPs();
    for (IpAddress ipAddress: ipAddresses) {
        if (ipAddress.getIp().equals(ip) && ipAddress.getPort() == port) {
            Loggers.EVT_LOG.debug("[CLIENT-BEAT] refresh beat: {}", rsInfo.toString());
            // refresh the last heartbeat time of this ip
            ipAddress.setLastBeat(System.currentTimeMillis());
            if (!ipAddress.isMarked()) {
                if (!ipAddress.isValid()) {
                    ipAddress.setValid(true);
                    Loggers.EVT_LOG.info("dom: {} {POS} {IP-ENABLED} valid: {}:{}@{}, region: {}, msg: client beat ok",
                        cluster.getDom().getName(), ip, port, cluster.getName(), DistroMapper.LOCALHOST_SITE);
                    PushService.domChanged(virtualClusterDomain.getNamespaceId(), domain.getName());
                }
            }
        }
    }
}
A scheduled task, ClientBeatCheckTask, checks every 5s whether an instance has gone too long without a heartbeat; if so, the instance is taken offline.
VirtualClusterDomain.init() starts the ClientBeatCheckTask client heartbeat check:
public void init() {
    RaftCore.listen(this);
    HealthCheckReactor.scheduleCheck(clientBeatCheckTask);
    for (Map.Entry<String, Cluster> entry : clusterMap.entrySet()) {
        entry.getValue().init();
    }
}
public static void scheduleCheck(ClientBeatCheckTask task) {
    futureMap.putIfAbsent(task.taskKey(), EXECUTOR.scheduleWithFixedDelay(task, 5000, 5000, TimeUnit.MILLISECONDS));
}
Processing logic of the ClientBeatCheckTask task:
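As a rough illustration of such a check (a minimal sketch, not the Nacos source), the task can be thought of as a pass over all instances that invalidates those whose last heartbeat is too old. The class BeatCheckSketch, its Inst type and the timeout parameter are hypothetical; skipping "marked" instances is an assumption that mirrors the isMarked() guard in ClientBeatProcessor.process().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a heartbeat-timeout check; names are illustrative only.
class BeatCheckSketch {
    static final class Inst {
        final String ip;
        final int port;
        long lastBeat;          // timestamp (ms) of the last heartbeat received
        boolean valid = true;   // whether the instance is currently considered healthy
        boolean marked = false; // assumption: marked instances are managed manually

        Inst(String ip, int port, long lastBeat) {
            this.ip = ip;
            this.port = port;
            this.lastBeat = lastBeat;
        }
    }

    // Invalidates every unmarked, currently valid instance whose last beat is
    // older than timeoutMillis, and returns the instances that were just invalidated.
    static List<Inst> expire(List<Inst> instances, long now, long timeoutMillis) {
        List<Inst> expired = new ArrayList<>();
        for (Inst inst : instances) {
            if (inst.marked) {
                continue;
            }
            if (inst.valid && now - inst.lastBeat > timeoutMillis) {
                inst.valid = false;
                expired.add(inst); // a real server would also notify subscribers here
            }
        }
        return expired;
    }
}
```

Because a later heartbeat sets the instance back to valid in process(), invalidation here is reversible: an instance that resumes beating comes back online.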
Next, the registration method is called: serverProxy.registerService(serviceName, instance);
Registration starts in NacosServiceRegistry.register(). Tracing further, it ends up calling NamingProxy.registerService(String serviceName, Instance instance), in which the service provider sends an HTTP PUT request to the server address: http://127.0.0.1:8848/nacos/v1/ns/instance
On the server, InstanceController.register() receives the /v1/ns/instance request and calls ApiCommands.regService(HttpServletRequest request):
1) Create an IpAddress object (most of its property values can be read from the HTTP request);
2) If no VirtualClusterDomain object exists yet, execute ApiCommands.regDom(HttpServletRequest request), after which the serviceMap of DomainsManager holds the corresponding VirtualClusterDomain object;
3) Append the following request parameters to the HttpServletRequest and execute ApiCommands.addIP4Dom(HttpServletRequest request):
    stringMap.put("dom", Arrays.asList(dom).toArray(new String[1]));
    stringMap.put("ipList", Arrays.asList(JSON.toJSONString(Arrays.asList(ipAddress))).toArray(new String[1]));
    stringMap.put("json", Arrays.asList("true").toArray(new String[1]));
    stringMap.put("token", Arrays.asList(virtualClusterDomain.getToken()).toArray(new String[1]));
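The overall idea is:
Register the domain first, then the service. A com.alibaba.nacos.naming.core.DomainsManager manages a Map whose key is the domainName (at a breakpoint, the dom value is the service name) and whose value is a com.alibaba.nacos.naming.core.Domain. During registration the implementation class used is com.alibaba.nacos.naming.core.VirtualClusterDomain, and the purpose of registration is to put the Domain information of the service being registered into this Map.
Important classes/interfaces:
The Notifier class maintains a BlockingQueue.
Logic of the Notifier.run method:
Take a Pair from the BlockingQueue;
Iterate over all RaftListeners in listeners;
If the key of the Datum in the Pair is one the RaftListener is interested in:
If the ApplyAction in the Pair is CHANGE, call RaftListener#onChange
If the ApplyAction in the Pair is DELETE, call RaftListener#onDelete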
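The dispatch loop described above can be sketched as follows. This is a simplified model under stated assumptions, not the Nacos Notifier: NotifierSketch, its Listener interface and the Task type are hypothetical, and the task carries only a key and an action rather than a full Datum.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical queue-and-listeners dispatcher; names are illustrative only.
class NotifierSketch {
    enum Action { CHANGE, DELETE }

    interface Listener {
        boolean interests(String key);
        void onChange(String key);
        void onDelete(String key);
    }

    private static final class Task {
        final String key;
        final Action action;

        Task(String key, Action action) {
            this.key = key;
            this.action = action;
        }
    }

    private final BlockingQueue<Task> tasks = new LinkedBlockingQueue<>(1024);
    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void listen(Listener listener) {
        listeners.add(listener);
    }

    boolean addTask(String key, Action action) {
        return tasks.offer(new Task(key, action)); // false if the queue is full
    }

    // Blocking variant, meant for a dedicated notifier thread.
    void runLoop() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            dispatch(tasks.take());
        }
    }

    // Non-blocking variant: processes whatever is queued and returns the count.
    int drain() {
        int processed = 0;
        Task task;
        while ((task = tasks.poll()) != null) {
            dispatch(task);
            processed++;
        }
        return processed;
    }

    private void dispatch(Task task) {
        for (Listener listener : listeners) {
            if (!listener.interests(task.key)) {
                continue;
            }
            try {
                if (task.action == Action.CHANGE) {
                    listener.onChange(task.key);
                } else {
                    listener.onDelete(task.key);
                }
            } catch (RuntimeException e) {
                // one failing listener should not prevent the others from being notified
            }
        }
    }
}
```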
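Main process:
If the current node is not the Raft leader, it sends an HTTP POST request to the leader's /v1/ns/api/addIP4Dom path, carrying the parameters of the original HTTP request, so the request is ultimately still handled by this same method on the leader.
It then appends the following request parameters to the HttpServletRequest and executes ApiCommands.onAddIP4Dom():
    proxyParams.put("clientIP", NetUtils.localServer());
    proxyParams.put("notify", "true");
    proxyParams.put("term", String.valueOf(RaftCore.getPeerSet().local().term)); // term of the leader node
    proxyParams.put("timestamp", String.valueOf(timestamp)); // 0L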
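The forward-or-handle decision can be sketched like this. It is a minimal model under assumptions: LeaderForwardSketch, the LeaderClient callback (standing in for the HTTP POST to the leader) and the string results are hypothetical; only the path and the four parameter names come from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of "forward to the leader, otherwise apply locally".
class LeaderForwardSketch {
    // Assumption: abstracts the HTTP POST sent to the current Raft leader.
    interface LeaderClient {
        String post(String path, Map<String, String> params);
    }

    static String addIP4Dom(boolean isLeader, String localServer, long term,
                            Map<String, String> requestParams, LeaderClient leaderClient) {
        if (!isLeader) {
            // Followers pass the original parameters on unchanged.
            return leaderClient.post("/v1/ns/api/addIP4Dom", requestParams);
        }
        // The leader enriches a copy of the parameters before applying them.
        Map<String, String> proxyParams = new LinkedHashMap<>(requestParams);
        proxyParams.put("clientIP", localServer);
        proxyParams.put("notify", "true");
        proxyParams.put("term", String.valueOf(term));
        proxyParams.put("timestamp", String.valueOf(0L));
        return onAddIP4Dom(proxyParams);
    }

    // Placeholder for the real handler; here it only reports which parameters it saw.
    static String onAddIP4Dom(Map<String, String> params) {
        return "applied locally with " + params.keySet();
    }
}
```

Copying the parameters before enriching them keeps the caller's map untouched, which matters if the same request object is reused for a retry.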
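newIPs actually comes from the ipList in the original request parameters; ipList already appeared during service registration, as shown in the figure:
ipAddressMap.values() is in effect the old IPs plus newIPs.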
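A minimal sketch of that merge, assuming instances are keyed by "ip:port" (the key format, the IpMergeSketch class and its Ip type are assumptions made for illustration): old entries go into the map first, then new entries overwrite any old entry with the same key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of merging old and new instance lists by key.
class IpMergeSketch {
    static final class Ip {
        final String ip;
        final int port;
        final double weight;

        Ip(String ip, int port, double weight) {
            this.ip = ip;
            this.port = port;
            this.weight = weight;
        }

        // Assumption: ip and port together identify an instance.
        String key() {
            return ip + ":" + port;
        }
    }

    static List<Ip> merge(List<Ip> oldIps, List<Ip> newIps) {
        Map<String, Ip> ipAddressMap = new LinkedHashMap<>();
        for (Ip old : oldIps) {
            ipAddressMap.put(old.key(), old);
        }
        for (Ip added : newIps) {
            ipAddressMap.put(added.key(), added); // replaces an old entry with the same key
        }
        return new ArrayList<>(ipAddressMap.values());
    }
}
```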
public void onPublish(Datum datum, RaftPeer source) throws Exception {
    RaftPeer local = peers.local();
    if (datum.value == null) {
        Loggers.RAFT.warn("received empty datum");
        throw new IllegalStateException("received empty datum");
    }
    if (!peers.isLeader(source.ip)) {
        Loggers.RAFT.warn("peer {} tried to publish data but wasn't leader, leader: {}",
            JSON.toJSONString(source), JSON.toJSONString(getLeader()));
        throw new IllegalStateException("peer(" + source.ip + ") tried to publish " +
            "data but wasn't leader");
    }
    if (source.term.get() < local.term.get()) {
        Loggers.RAFT.warn("out of date publish, pub-term: {}, cur-term: {}",
            JSON.toJSONString(source), JSON.toJSONString(local));
        throw new IllegalStateException("out of date publish, pub-term:"
            + source.term.get() + ", cur-term: " + local.term.get());
    }
    local.resetLeaderDue();
    // if data should be persistent, usually this is always true:
    if (KeyBuilder.matchPersistentKey(datum.key)) {
        raftStore.write(datum);
    }
    datums.put(datum.key, datum);
    if (isLeader()) {
        local.term.addAndGet(PUBLISH_TERM_INCREASE_COUNT);
    } else {
        if (local.term.get() + PUBLISH_TERM_INCREASE_COUNT > source.term.get()) {
            //set leader term:
            getLeader().term.set(source.term.get());
            local.term.set(getLeader().term.get());
        } else {
            local.term.addAndGet(PUBLISH_TERM_INCREASE_COUNT);
        }
    }
    raftStore.updateTerm(local.term.get());
    notifier.addTask(datum.key, ApplyAction.CHANGE);
    Loggers.RAFT.info("data added/updated, key={}, term={}", datum.key, local.term);
}
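The term bookkeeping at the end of onPublish is easy to misread, so here it is isolated as a pure function. This is a sketch: TermUpdateSketch and nextLocalTerm are hypothetical names, and the increment is passed in as a parameter because the value of PUBLISH_TERM_INCREASE_COUNT is not shown in the listing.

```java
// Hypothetical extraction of the term-update branch of onPublish into a pure function.
class TermUpdateSketch {
    // Returns the local term after a publish has been accepted.
    static long nextLocalTerm(boolean isLeader, long localTerm, long sourceTerm, long increase) {
        if (isLeader) {
            // The leader always advances its own term by the fixed increment.
            return localTerm + increase;
        }
        if (localTerm + increase > sourceTerm) {
            // Advancing would overtake the publisher, so adopt the publisher's term instead.
            return sourceTerm;
        }
        // Still behind the publisher: catch up one increment at a time.
        return localTerm + increase;
    }
}
```

For example, with an increment of 100, a follower at term 50 receiving a publish from a leader at term 105 ends at 105, while a follower at term 5 receiving a publish at term 1000 only moves to 105.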
private BlockingQueue<Pair> tasks = new LinkedBlockingQueue<Pair>(1024 * 1024);
public void addTask(String datumKey, ApplyAction action) {
    if (services.containsKey(datumKey) && action == ApplyAction.CHANGE) {
        return;
    }
    if (action == ApplyAction.CHANGE) {
        services.put(datumKey, StringUtils.EMPTY);
    }
    Loggers.RAFT.info("add task {}", datumKey);
    tasks.add(Pair.with(datumKey, action));
}
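The point of the services map in addTask is de-duplication: while a CHANGE for a key is still queued, further CHANGE tasks for the same key are dropped. A minimal sketch of that pattern follows; DedupTaskQueueSketch is a hypothetical class, and the assumption that the consumer clears the marker when it takes the task is not shown in the listing above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical de-duplicating task queue; names are illustrative only.
class DedupTaskQueueSketch {
    enum Action { CHANGE, DELETE }

    static final class Task {
        final String key;
        final Action action;

        Task(String key, Action action) {
            this.key = key;
            this.action = action;
        }
    }

    private final ConcurrentMap<String, Boolean> pendingChanges = new ConcurrentHashMap<>();
    private final BlockingQueue<Task> tasks = new LinkedBlockingQueue<>(1024);

    boolean addTask(String key, Action action) {
        // putIfAbsent makes "check, then mark" a single atomic step.
        if (action == Action.CHANGE && pendingChanges.putIfAbsent(key, Boolean.TRUE) != null) {
            return false; // a CHANGE for this key is already queued
        }
        boolean queued = tasks.offer(new Task(key, action));
        if (!queued && action == Action.CHANGE) {
            pendingChanges.remove(key); // queue full: drop the marker so a retry can enqueue
        }
        return queued;
    }

    // Assumption: the consumer clears the marker when it takes a CHANGE task.
    Task poll() {
        Task task = tasks.poll();
        if (task != null && task.action == Action.CHANGE) {
            pendingChanges.remove(task.key);
        }
        return task;
    }

    int size() {
        return tasks.size();
    }
}
```

The sketch uses offer() instead of add(), so a full queue is reported to the caller as false rather than as an IllegalStateException.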
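1) Put the VirtualClusterDomain object created above into the serviceMap (a Map) of the DomainsManager class;
2) At the same time, register the VirtualClusterDomain object in RaftCore's listeners list;
3) Update or put the dom information: if the dom already exists it is updated, otherwise it is put.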
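The three steps above amount to a put-or-update on a registry map plus a listener registration for new entries. A minimal sketch under assumptions: DomainRegistrySketch and its Dom type are hypothetical, the map is keyed by name only, and "update" is reduced to copying a single field.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical put-or-update registry; names are illustrative only.
class DomainRegistrySketch {
    static final class Dom {
        final String name;
        volatile String token;

        Dom(String name, String token) {
            this.name = name;
            this.token = token;
        }
    }

    private final Map<String, Dom> serviceMap = new ConcurrentHashMap<>();
    private final List<Dom> listeners = new CopyOnWriteArrayList<>();

    // Returns true if the dom was newly put, false if an existing one was updated.
    boolean putOrUpdate(Dom dom) {
        Dom existing = serviceMap.putIfAbsent(dom.name, dom);
        if (existing == null) {
            listeners.add(dom); // a new dom also starts listening for data changes
            return true;
        }
        existing.token = dom.token; // update in place, keeping the registered object
        return false;
    }

    Dom get(String name) {
        return serviceMap.get(name);
    }

    int listenerCount() {
        return listeners.size();
    }
}
```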
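Nacos load balancing is built on Ribbon.
Ribbon's load balancing is implemented mainly through LoadBalancerClient, which delegates the actual work to ILoadBalancer. ILoadBalancer is configured with IRule, IPing and similar components. In a Eureka-based setup it obtains the registry list from EurekaClient and, by default, sends a "ping" to EurekaClient every 10 seconds to check whether the service list needs updating. Once it has the registry list, ILoadBalancer balances load according to the IRule strategy.
At the lowest level, Ribbon also implements an interface from the Spring Cloud Commons package, namely org.springframework.cloud.client.loadbalancer.LoadBalancerClient, which extends ServiceInstanceChooser and is the parent interface for Ribbon's load balancing. The chain of implementations that follows eventually comes down to one question: how is the serverList obtained? The answer lies in the interface com.netflix.loadbalancer.ServerList, and the Nacos implementation of it is org.springframework.cloud.alibaba.nacos.ribbon.NacosServerList.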
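To make the split between "where the server list comes from" and "how one server is picked" concrete, here is a minimal round-robin sketch. It deliberately does not use Ribbon's types: RoundRobinSketch and its Supplier-based server list are hypothetical stand-ins for the roles played by ServerList and IRule.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical round-robin chooser; not Ribbon's IRule or ServerList.
class RoundRobinSketch {
    // Stand-in for a server list source such as a registry lookup.
    private final Supplier<List<String>> serverList;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSketch(Supplier<List<String>> serverList) {
        this.serverList = serverList;
    }

    // Returns the next server in rotation, or null if none is available.
    String choose() {
        List<String> servers = serverList.get();
        if (servers == null || servers.isEmpty()) {
            return null;
        }
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(index);
    }
}
```

Because the list is fetched on every call, a change in the registry is picked up by the next choose() without any extra wiring.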
Data conversion flow:
public class NacosServerList extends AbstractServerList<NacosServer> {
    @Autowired
    private NacosDiscoveryProperties discoveryProperties;
    private String serviceId;

    public NacosServerList() {
    }

    public NacosServerList(String serviceId) {
        this.serviceId = serviceId;
    }

    @Override
    public List<NacosServer> getInitialListOfServers() {
        return getServers();
    }

    @Override
    public List<NacosServer> getUpdatedListOfServers() {
        return getServers();
    }

    private List<NacosServer> getServers() {
        try {
            List<Instance> instances = discoveryProperties.namingServiceInstance()
                .getAllInstances(serviceId);
            return instancesToServerList(instances);
        }
        catch (Exception e) {
            throw new IllegalStateException(
                "Can not get service instances from nacos, serviceId=" + serviceId,
                e);
        }
    }

    private List<NacosServer> instancesToServerList(List<Instance> instances) {
        List<NacosServer> result = new ArrayList<>(instances.size());
        for (Instance instance : instances) {
            if (instance.isHealthy()) {
                result.add(new NacosServer(instance));
            }
        }
        return result;
    }

    public String getServiceId() {
        return serviceId;
    }

    @Override
    public void initWithNiwsConfig(IClientConfig iClientConfig) {
        this.serviceId = iClientConfig.getClientName();
    }
}
The HostReactor class maintains:
The main logic of the getServiceInfo() method is:
public List<Instance> getAllInstances(String serviceName, List<String> clusters) throws NacosException {
    ServiceInfo serviceInfo = hostReactor.getServiceInfo(serviceName, StringUtils.join(clusters, ","), StringUtils.EMPTY, false);
    List<Instance> list;
    if (serviceInfo == null || CollectionUtils.isEmpty(list = serviceInfo.getHosts())) {
        return new ArrayList<Instance>();
    }
    return list;
}
An UpdateTask is created that runs HostReactor.updateServiceNow every 10s to refresh HostReactor's serviceInfoMap asynchronously. If serviceInfoMap does not yet contain the required serviceName, it is fetched from the server through updateService4AllIPNow or updateServiceNow. In other words, serviceInfoMap is not refreshed only at the moment a service is called: a scheduled task keeps refreshing it asynchronously, heartbeat-style, once every 10 seconds.
public void scheduleUpdateIfAbsent(String serviceName, String clusters, String env, boolean allIPs) {
    if (futureMap.get(ServiceInfo.getKey(serviceName, clusters, env, allIPs)) != null) {
        return;
    }
    synchronized (futureMap) {
        if (futureMap.get(ServiceInfo.getKey(serviceName, clusters, env, allIPs)) != null) {
            return;
        }
        ScheduledFuture<?> future = addTask(new UpdateTask(serviceName, clusters, env, allIPs));
        futureMap.put(ServiceInfo.getKey(serviceName, clusters, env, allIPs), future);
    }
}
This adds an UpdateTask to the scheduled executor and starts it. The code of UpdateTask.run():
@Override
public void run() {
    try {
        ServiceInfo serviceObj = serviceInfoMap.get(ServiceInfo.getKey(serviceName, clusters, env, allIPs));
        if (serviceObj == null) {
            if (allIPs) {
                updateService4AllIPNow(serviceName, clusters, env);
            } else {
                updateServiceNow(serviceName, clusters, env);
                executor.schedule(this, DEFAULT_DELAY, TimeUnit.MILLISECONDS);
            }
            return;
        }
        if (serviceObj.getLastRefTime() <= lastRefTime) {
            if (allIPs) {
                updateService4AllIPNow(serviceName, clusters, env);
                serviceObj = serviceInfoMap.get(ServiceInfo.getKey(serviceName, clusters, env, true));
            } else {
                updateServiceNow(serviceName, clusters, env);
                serviceObj = serviceInfoMap.get(ServiceInfo.getKey(serviceName, clusters, env));
            }
        } else {
            // if serviceName already updated by push, we should not override it
            // since the push data may be different from pull through force push
            refreshOnly(serviceName, clusters, env, allIPs);
        }
        executor.schedule(this, serviceObj.getCacheMillis(), TimeUnit.MILLISECONDS);
        lastRefTime = serviceObj.getLastRefTime();
    } catch (Throwable e) {
        LogUtils.LOG.warn("NA", "failed to update serviceName: " + serviceName, e);
    }
}
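The pattern used by scheduleUpdateIfAbsent and UpdateTask (schedule at most one refresh task per key, and let the task reschedule itself) can be sketched in a self-contained form. UpdateSchedulerSketch is a hypothetical class, it uses ConcurrentHashMap.computeIfAbsent in place of the double-checked synchronized block, and the fixed delay replaces the server-provided cacheMillis.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical "one self-rescheduling refresh task per key" scheduler.
class UpdateSchedulerSketch {
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "update-scheduler-sketch");
            t.setDaemon(true);
            return t;
        });
    private final ConcurrentMap<String, ScheduledFuture<?>> futureMap = new ConcurrentHashMap<>();
    final AtomicInteger scheduledCount = new AtomicInteger(); // exposed for inspection only

    // Schedules the refresh task for a key unless one was already scheduled.
    void scheduleUpdateIfAbsent(String key, Runnable refresh, long delayMillis) {
        futureMap.computeIfAbsent(key, k -> {
            scheduledCount.incrementAndGet();
            return executor.schedule(new SelfRescheduling(refresh, delayMillis),
                delayMillis, TimeUnit.MILLISECONDS);
        });
    }

    private final class SelfRescheduling implements Runnable {
        private final Runnable refresh;
        private final long delayMillis;

        SelfRescheduling(Runnable refresh, long delayMillis) {
            this.refresh = refresh;
            this.delayMillis = delayMillis;
        }

        @Override
        public void run() {
            try {
                refresh.run();
            } catch (RuntimeException e) {
                // a real client would log the failure
            } finally {
                // Rescheduling in finally keeps the loop alive after a failed refresh.
                if (!executor.isShutdown()) {
                    executor.schedule(this, delayMillis, TimeUnit.MILLISECONDS);
                }
            }
        }
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

One design difference is worth noting: in the run() listing above, the reschedule call sits inside the try block, so an exception thrown before it is only logged and the task is not scheduled again, whereas this sketch reschedules in finally.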
Handling of the HTTP request /nacos/v1/ns/api/srvIPXT:
@RequestMapping("/srvIPXT")
@ResponseBody
public JSONObject srvIPXT(HttpServletRequest request) throws Exception {
    JSONObject result = new JSONObject();
    if (DistroMapper.getLocalhostIP().equals(UtilsAndCommons.LOCAL_HOST_IP)) {
        throw new Exception("invalid localhost ip: " + DistroMapper.getLocalhostIP());
    }
    String dom = BaseServlet.required(request, "dom");
    VirtualClusterDomain domObj = (VirtualClusterDomain) domainsManager.getDomain(dom);
    String agent = request.getHeader("Client-Version");
    String clusters = BaseServlet.optional(request, "clusters", StringUtils.EMPTY);
    String clientIP = BaseServlet.optional(request, "clientIP", StringUtils.EMPTY);
    Integer udpPort = Integer.parseInt(BaseServlet.optional(request, "udpPort", "0"));
    String env = BaseServlet.optional(request, "env", StringUtils.EMPTY);
    String error = BaseServlet.optional(request, "unconsistentDom", StringUtils.EMPTY);
    boolean isCheck = Boolean.parseBoolean(BaseServlet.optional(request, "isCheck", "false"));
    String app = BaseServlet.optional(request, "app", StringUtils.EMPTY);
    String tenant = BaseServlet.optional(request, "tid", StringUtils.EMPTY);
    boolean healthyOnly = Boolean.parseBoolean(BaseServlet.optional(request, "healthOnly", "false"));
    if (!StringUtils.isEmpty(error)) {
        Loggers.ROLE_LOG.info("ENV-NOT-CONSISTENT", error);
    }
    if (domObj == null) {
        throw new NacosException(NacosException.NOT_FOUND, "dom not found: " + dom);
    }
    checkIfDisabled(domObj);
    long cacheMillis = Switch.getCacheMillis(dom);
    // now try to enable the push
    try {
        if (udpPort > 0 && PushService.canEnablePush(agent)) {
            PushService.addClient(dom,
                clusters,
                agent,
                new InetSocketAddress(clientIP, udpPort),
                pushDataSource,
                tenant,
                app);
            cacheMillis = Switch.getPushCacheMillis(dom);
        }
    } catch (Exception e) {
        Loggers.SRV_LOG.error("VIPSRV-API", "failed to added push client", e);
        cacheMillis = Switch.getCacheMillis(dom);
    }
    List<IpAddress> srvedIPs;
    srvedIPs = domObj.srvIPs(clientIP, Arrays.asList(StringUtils.split(clusters, ",")));
    if (CollectionUtils.isEmpty(srvedIPs)) {
        String msg = "no ip to serve for dom: " + dom;
        Loggers.SRV_LOG.debug(msg);
    }
    Map<Boolean, List<IpAddress>> ipMap = new HashMap<>(2);
    ipMap.put(Boolean.TRUE, new ArrayList<IpAddress>());
    ipMap.put(Boolean.FALSE, new ArrayList<IpAddress>());
    for (IpAddress ip : srvedIPs) {
        ipMap.get(ip.isValid()).add(ip);
    }
    if (isCheck) {
        result.put("reachProtectThreshold", false);
    }
    double threshold = domObj.getProtectThreshold();
    if ((float) ipMap.get(Boolean.TRUE).size() / srvedIPs.size() <= threshold) {
        Loggers.SRV_LOG.warn("protect threshold reached, return all ips, " +
            "dom: " + dom);
        if (isCheck) {
            result.put("reachProtectThreshold", true);
        }
        ipMap.get(Boolean.TRUE).addAll(ipMap.get(Boolean.FALSE));
        ipMap.get(Boolean.FALSE).clear();
    }
    if (isCheck) {
        result.put("protectThreshold", domObj.getProtectThreshold());
        result.put("reachLocalSiteCallThreshold", false);
        return new JSONObject();
    }
    JSONArray hosts = new JSONArray();
    for (Map.Entry<Boolean, List<IpAddress>> entry : ipMap.entrySet()) {
        List<IpAddress> ips = entry.getValue();
        if (healthyOnly && !entry.getKey()) {
            continue;
        }
        for (IpAddress ip : ips) {
            JSONObject ipObj = new JSONObject();
            ipObj.put("ip", ip.getIp());
            ipObj.put("port", ip.getPort());
            ipObj.put("valid", entry.getKey());
            ipObj.put("marked", ip.isMarked());
            ipObj.put("instanceId", ip.getInstanceId());
            ipObj.put("metadata", ip.getMetadata());
            ipObj.put("enabled", ip.isEnabled());
            ipObj.put("weight", ip.getWeight());
            ipObj.put("clusterName", ip.getClusterName());
            ipObj.put("serviceName", ip.getServiceName());
            hosts.add(ipObj);
        }
    }
    result.put("hosts", hosts);
    result.put("dom", dom);
    result.put("cacheMillis", cacheMillis);
    result.put("lastRefTime", System.currentTimeMillis());
    result.put("checksum", domObj.getChecksum() + System.currentTimeMillis());
    result.put("useSpecifiedURL", false);
    result.put("clusters", clusters);
    result.put("env", env);
    result.put("metadata", domObj.getMetadata());
    return result;
}
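The protect-threshold branch in srvIPXT is the least obvious part of the handler, so here it is isolated as a small function: when the share of healthy instances drops to the threshold or below, all instances are reported as valid. This is a sketch, not the Nacos code: ProtectThresholdSketch, its Ip type and the explicit empty-list guard are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical extraction of the protect-threshold decision.
class ProtectThresholdSketch {
    static final class Ip {
        final String addr;
        final boolean valid;

        Ip(String addr, boolean valid) {
            this.addr = addr;
            this.valid = valid;
        }
    }

    // Returns the instances that would be reported to the client as valid.
    static List<Ip> reportedAsValid(List<Ip> all, double protectThreshold) {
        List<Ip> healthy = new ArrayList<>();
        for (Ip ip : all) {
            if (ip.valid) {
                healthy.add(ip);
            }
        }
        if (all.isEmpty()) {
            return healthy; // guard added in this sketch to avoid dividing 0 by 0
        }
        if ((float) healthy.size() / all.size() <= protectThreshold) {
            // Too few healthy instances: report everything as valid so the
            // remaining healthy ones are not flooded with all the traffic.
            return new ArrayList<>(all);
        }
        return healthy;
    }
}
```

With four instances and a threshold of 0.5, one healthy instance (ratio 0.25) triggers protection and all four are returned, while three healthy instances (ratio 0.75) do not.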
References:
https://www.jianshu.com/p/e1e3ecedc8b3
https://blog.csdn.net/Mr_Errol/article/details/84938993
https://blog.csdn.net/Mr_Errol/article/details/85089129