Preface
In the previous article, we saw that the Eureka server receives REST requests from clients through Jersey. Jersey is a REST framework that provides the JAX-RS reference implementation, among other things. It offers its own APIs that extend the JAX-RS toolkit with additional features and utilities to further simplify RESTful service and client development, and it exposes numerous extension SPIs so that developers can extend Jersey to best fit their needs. Jersey itself is not the focus of this article; for more details, please refer to the official documentation.
In EurekaServerAutoConfiguration, a Jersey filter is registered to handle client requests:
@Bean
public FilterRegistrationBean<?> jerseyFilterRegistration(javax.ws.rs.core.Application eurekaJerseyApp) {
    FilterRegistrationBean<Filter> bean = new FilterRegistrationBean<>();
    bean.setFilter(new ServletContainer(eurekaJerseyApp));
    bean.setOrder(Ordered.LOWEST_PRECEDENCE);
    bean.setUrlPatterns(Collections.singletonList(EurekaConstants.DEFAULT_PREFIX + "/*"));
    return bean;
}
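For readers unfamiliar with Jersey/JAX-RS, the resources dispatched behind this filter look roughly like the following. This is a hypothetical resource, not Eureka's actual code; it only illustrates how the annotations map URLs and media types to handler methods under the filter's /eureka/* prefix.

import javax.ws.rs.Consumes;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Response;

// hypothetical example resource, served under the filter's /eureka/* prefix
@Path("/{version}/demo")
@Produces({"application/xml", "application/json"})
public class DemoResource {

    @GET
    @Path("/{id}")
    public Response getOne(@PathParam("id") String id) {
        // look up and return a representation of the entity
        return Response.ok("{\"id\":\"" + id + "\"}").build();
    }

    @POST
    @Consumes({"application/json", "application/xml"})
    public Response create(String body) {
        // 204 mirrors what Eureka's addInstance returns on success
        return Response.status(204).build();
    }
}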
Source Code Analysis
Service Registration
The REST endpoints that Eureka Server exposes to Eureka clients are defined in the com.netflix.eureka.resources package. Take the ApplicationResource class, which handles application-level requests; it plays a role similar to a @Controller in a Spring MVC application. @Produces({"application/xml", "application/json"}) declares that responses are returned as XML or JSON, and @Consumes({"application/json", "application/xml"}) declares that request bodies are accepted as JSON or XML.
The ApplicationResource class contains the following method (the add-instance endpoint):
@POST
@Consumes({"application/json", "application/xml"})
public Response addInstance(InstanceInfo info,
                            @HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication) {
    logger.debug("Registering instance {} (replication={})", info.getId(), isReplication);
    // validate that the instanceinfo contains all the necessary required fields
    if (isBlank(info.getId())) {
        return Response.status(400).entity("Missing instanceId").build();
    } else if (isBlank(info.getHostName())) {
        return Response.status(400).entity("Missing hostname").build();
    } else if (isBlank(info.getIPAddr())) {
        return Response.status(400).entity("Missing ip address").build();
    } else if (isBlank(info.getAppName())) {
        return Response.status(400).entity("Missing appName").build();
    } else if (!appName.equals(info.getAppName())) {
        return Response.status(400).entity("Mismatched appName, expecting " + appName + " but was " + info.getAppName()).build();
    } else if (info.getDataCenterInfo() == null) {
        return Response.status(400).entity("Missing dataCenterInfo").build();
    } else if (info.getDataCenterInfo().getName() == null) {
        return Response.status(400).entity("Missing dataCenterInfo Name").build();
    }

    // handle cases where clients may be registering with bad DataCenterInfo with missing data
    DataCenterInfo dataCenterInfo = info.getDataCenterInfo();
    if (dataCenterInfo instanceof UniqueIdentifier) {
        String dataCenterInfoId = ((UniqueIdentifier) dataCenterInfo).getId();
        if (isBlank(dataCenterInfoId)) {
            boolean experimental = "true".equalsIgnoreCase(serverConfig.getExperimental("registration.validation.dataCenterInfoId"));
            if (experimental) {
                String entity = "DataCenterInfo of type " + dataCenterInfo.getClass() + " must contain a valid id";
                return Response.status(400).entity(entity).build();
            } else if (dataCenterInfo instanceof AmazonInfo) {
                AmazonInfo amazonInfo = (AmazonInfo) dataCenterInfo;
                String effectiveId = amazonInfo.get(AmazonInfo.MetaDataKey.instanceId);
                if (effectiveId == null) {
                    amazonInfo.getMetadata().put(AmazonInfo.MetaDataKey.instanceId.getName(), info.getId());
                }
            } else {
                logger.warn("Registering DataCenterInfo of type {} without an appropriate id", dataCenterInfo.getClass());
            }
        }
    }

    // delegate to PeerAwareInstanceRegistryImpl#register
    registry.register(info, "true".equals(isReplication));
    return Response.status(204).build();  // 204 to be backwards compatible
}
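In other words, what reaches this method is a POST from the client against the application resource. The following is a rough, hand-rolled sketch of such a registration request, assuming a server at localhost:8761 and the usual /eureka/apps/{APP_NAME} path; the JSON body contains only a minimal, illustrative subset of InstanceInfo fields.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterSketch {
    public static void main(String[] args) throws Exception {
        // minimal InstanceInfo payload; real clients send many more fields
        String body = """
                {"instance": {
                    "instanceId": "my-host:my-service:8080",
                    "hostName": "my-host",
                    "app": "MY-SERVICE",
                    "ipAddr": "192.168.1.10",
                    "port": {"$": 8080, "@enabled": "true"},
                    "status": "UP",
                    "dataCenterInfo": {
                        "@class": "com.netflix.appinfo.InstanceInfo$DefaultDataCenterInfo",
                        "name": "MyOwn"
                    }
                }}""";

        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8761/eureka/apps/MY-SERVICE"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Register status: " + response.statusCode()); // 204 on success
    }
}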
The first half of the method performs parameter validation. The final call to registry.register can be traced to the following method in PeerAwareInstanceRegistryImpl: public void register(final InstanceInfo info, final boolean isReplication):
PeerAwareInstanceRegistryImpl
@Override
public void register(final InstanceInfo info, final boolean isReplication) {
    // default lease duration: 90 seconds
    int leaseDuration = Lease.DEFAULT_DURATION_IN_SECS;
    if (info.getLeaseInfo() != null && info.getLeaseInfo().getDurationInSecs() > 0) {
        // if the client supplied its own lease duration, use it
        leaseDuration = info.getLeaseInfo().getDurationInSecs();
    }
    // call the register method of the parent class AbstractInstanceRegistry
    super.register(info, leaseDuration, isReplication);
    // replicate the registration to the other nodes in the cluster
    replicateToPeers(Action.Register, info.getAppName(), info.getId(), info, null, isReplication);
}
For now we focus only on the registration logic itself; peer replication is analyzed later in the cluster synchronization section.
AbstractInstanceRegistry
private final ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>> registry
        = new ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>>();
Before diving into the analysis, let's look at the data structure the server uses to store registrations: instance registrations are kept in a two-level map. The key of the outer map is the application's appName and its value is the set of instances belonging to that application; the key of the inner map is the instance id and its value is the lease holding that instance's information.
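The following is a minimal, self-contained sketch (not Eureka's actual code) of how such a two-level ConcurrentHashMap is typically populated and queried. The InstanceRecord type is hypothetical and simply stands in for Lease<InstanceInfo>.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TwoLevelRegistrySketch {
    // hypothetical stand-in for Lease<InstanceInfo>
    record InstanceRecord(String instanceId, String status) {}

    // outer key: appName, inner key: instanceId
    private final ConcurrentHashMap<String, Map<String, InstanceRecord>> registry = new ConcurrentHashMap<>();

    void register(String appName, InstanceRecord record) {
        // computeIfAbsent plays the same role as the putIfAbsent dance in AbstractInstanceRegistry#register
        registry.computeIfAbsent(appName, k -> new ConcurrentHashMap<>())
                .put(record.instanceId(), record);
    }

    InstanceRecord find(String appName, String instanceId) {
        Map<String, InstanceRecord> instances = registry.get(appName);
        return instances == null ? null : instances.get(instanceId);
    }
}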
/**
 * Registers a new instance with a given duration.
 *
 * @see com.netflix.eureka.lease.LeaseManager#register(java.lang.Object, int, boolean)
 */
public void register(InstanceInfo registrant, int leaseDuration, boolean isReplication) {
    // acquire the read lock
    read.lock();
    try {
        // look up all existing leases for this appName
        Map<String, Lease<InstanceInfo>> gMap = registry.get(registrant.getAppName());
        // increment the register counter; for a request coming directly from a client
        // (rather than from peer replication) isReplication is false
        REGISTER.increment(isReplication);
        if (gMap == null) {
            // no instances registered for this appName yet: create a new map and add it to the registry
            final ConcurrentHashMap<String, Lease<InstanceInfo>> gNewMap = new ConcurrentHashMap<String, Lease<InstanceInfo>>();
            gMap = registry.putIfAbsent(registrant.getAppName(), gNewMap);
            if (gMap == null) {
                gMap = gNewMap;
            }
        }
        // look up the existing lease by instance id
        Lease<InstanceInfo> existingLease = gMap.get(registrant.getId());
        // Retain the last dirty timestamp without overwriting it, if there is already a lease
        // a lease already exists for this instance
        if (existingLease != null && (existingLease.getHolder() != null)) {
            // lastDirtyTimestamp of the instance info already held by the server
            Long existingLastDirtyTimestamp = existingLease.getHolder().getLastDirtyTimestamp();
            // lastDirtyTimestamp of the instance info sent by the client in this request
            Long registrationLastDirtyTimestamp = registrant.getLastDirtyTimestamp();
            logger.debug("Existing lease found (existing={}, provided={}", existingLastDirtyTimestamp, registrationLastDirtyTimestamp);

            // this is a > instead of a >= because if the timestamps are equal, we still take the remote transmitted
            // InstanceInfo instead of the server local copy.
            // if the locally held copy has a greater lastDirtyTimestamp than the one the client sent,
            // keep the local copy instead of the transmitted one
            if (existingLastDirtyTimestamp > registrationLastDirtyTimestamp) {
                logger.warn("There is an existing lease and the existing lease's dirty timestamp {} is greater" +
                        " than the one that is being registered {}", existingLastDirtyTimestamp, registrationLastDirtyTimestamp);
                logger.warn("Using the existing instanceInfo instead of the new instanceInfo as the registrant");
                registrant = existingLease.getHolder();
            }
        } else {
            // no lease exists in the server registry, so this is a brand-new registration
            // The lease does not exist and hence it is a new registration
            synchronized (lock) {
                // this was covered in the previous article; it is related to the self-preservation mechanism
                // if the expected number of clients sending renewals is greater than 0
                if (this.expectedNumberOfClientsSendingRenews > 0) {
                    // Since the client wants to register it, increase the number of clients sending renews
                    this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews + 1;
                    // recompute the renews-per-minute threshold
                    updateRenewsPerMinThreshold();
                }
            }
            logger.debug("No previous lease information found; it is new registration");
        }
        // build the lease for this instance
        Lease<InstanceInfo> lease = new Lease<>(registrant, leaseDuration);
        if (existingLease != null) {
            // if a lease already existed, carry over its service-up timestamp
            lease.setServiceUpTimestamp(existingLease.getServiceUpTimestamp());
        }
        // store the lease in the inner map
        gMap.put(registrant.getId(), lease);
        // record in the recent-registrations queue, used for debugging and statistics
        recentRegisteredQueue.add(new Pair<Long, String>(
                System.currentTimeMillis(),
                registrant.getAppName() + "(" + registrant.getId() + ")"));
        // This is where the initial state transfer of overridden status happens
        if (!InstanceStatus.UNKNOWN.equals(registrant.getOverriddenStatus())) {
            logger.debug("Found overridden status {} for instance {}. Checking to see if needs to be add to the "
                    + "overrides", registrant.getOverriddenStatus(), registrant.getId());
            if (!overriddenInstanceStatusMap.containsKey(registrant.getId())) {
                logger.info("Not found overridden id {} and hence adding it", registrant.getId());
                overriddenInstanceStatusMap.put(registrant.getId(), registrant.getOverriddenStatus());
            }
        }
        InstanceStatus overriddenStatusFromMap = overriddenInstanceStatusMap.get(registrant.getId());
        if (overriddenStatusFromMap != null) {
            logger.info("Storing overridden status {} from map", overriddenStatusFromMap);
            registrant.setOverriddenStatus(overriddenStatusFromMap);
        }

        // Set the status based on the overridden status rules
        InstanceStatus overriddenInstanceStatus = getOverriddenInstanceStatus(registrant, existingLease, isReplication);
        registrant.setStatusWithoutDirty(overriddenInstanceStatus);

        // If the lease is registered with UP status, set lease service up timestamp
        // if the instance status is UP
        if (InstanceStatus.UP.equals(registrant.getStatus())) {
            // set the lease's serviceUpTimestamp, but only if it is still 0
            lease.serviceUp();
        }
        registrant.setActionType(ActionType.ADDED);
        // add to the recently-changed queue, which backs the delta fetch done by Eureka clients
        recentlyChangedQueue.add(new RecentlyChangedItem(lease));
        registrant.setLastUpdatedTimestamp();
        // invalidate the (read-write) response cache
        invalidateCache(registrant.getAppName(), registrant.getVIPAddress(), registrant.getSecureVipAddress());
        logger.info("Registered instance {}/{} with status {} (replication={})",
                registrant.getAppName(), registrant.getId(), registrant.getStatus(), isReplication);
    } finally {
        // release the read lock
        read.unlock();
    }
}
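One detail worth noting: register acquires the read lock even though it writes to the registry. The nested ConcurrentHashMap is already thread-safe, so many registrations (and renewals and cancellations) can proceed concurrently under the shared read lock, while the write lock appears to be reserved for paths that need a consistent view across the registry and the recently-changed queue, such as the delta computation served to clients. A minimal, hypothetical sketch of this "inverted" lock usage follows; it only illustrates the pattern, not Eureka's actual classes.

import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class InvertedLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final ConcurrentHashMap<String, String> registry = new ConcurrentHashMap<>();
    private final Queue<String> recentlyChanged = new ConcurrentLinkedQueue<>();

    // many writers may run concurrently: each touches a single key and appends to the change queue
    void registerOne(String id, String value) {
        lock.readLock().lock();
        try {
            registry.put(id, value);
            recentlyChanged.add(id);
        } finally {
            lock.readLock().unlock();
        }
    }

    // the snapshot reader excludes all single-key writers so the queue and the map stay mutually consistent
    Object[] snapshotDeltas() {
        lock.writeLock().lock();
        try {
            return recentlyChanged.toArray();
        } finally {
            lock.writeLock().unlock();
        }
    }
}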
Service Renewal
After an Eureka client has registered, it must periodically send heartbeat requests to the Eureka server (every 30 seconds by default) to keep its lease valid.
This is implemented in com.netflix.eureka.resources.InstanceResource; see the following method:
InstanceResource
@PUT
public Response renewLease(
        @HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication,
        @QueryParam("overriddenstatus") String overriddenStatus,
        @QueryParam("status") String status,
        @QueryParam("lastDirtyTimestamp") String lastDirtyTimestamp) {
    // isReplication is "true" when the request comes from another cluster node
    boolean isFromReplicaNode = "true".equals(isReplication);
    // call PeerAwareInstanceRegistryImpl#renew
    boolean isSuccess = registry.renew(app.getName(), id, isFromReplicaNode);

    // Not found in the registry, immediately ask for a register
    // the instance is not in the registry, so the client is asked to register again
    if (!isSuccess) {
        logger.warn("Not Found (Renew): {} - {}", app.getName(), id);
        return Response.status(Status.NOT_FOUND).build();
    }
    // Check if we need to sync based on dirty time stamp, the client
    // instance might have changed some value
    Response response;
    if (lastDirtyTimestamp != null && serverConfig.shouldSyncWhenTimestampDiffers()) {
        response = this.validateDirtyTimestamp(Long.valueOf(lastDirtyTimestamp), isFromReplicaNode);
        // Store the overridden status since the validation found out the node that replicates wins
        if (response.getStatus() == Response.Status.NOT_FOUND.getStatusCode()
                && (overriddenStatus != null)
                && !(InstanceStatus.UNKNOWN.name().equals(overriddenStatus))
                && isFromReplicaNode) {
            registry.storeOverriddenStatusIfRequired(app.getAppName(), id, InstanceStatus.valueOf(overriddenStatus));
        }
    } else {
        response = Response.ok().build();
    }
    logger.debug("Found (Renew): {} - {}; reply status={}", app.getName(), id, response.getStatus());
    return response;
}
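For reference, the heartbeat a client sends is essentially a PUT against this instance resource. Below is a minimal sketch using java.net.http; the /eureka/apps/{appName}/{instanceId} path and the query parameters follow the resource shown above, while the host, port, application name, and instance id are placeholders.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class HeartbeatSketch {
    public static void main(String[] args) throws Exception {
        // placeholder server address, app name, instance id and timestamp
        String url = "http://localhost:8761/eureka/apps/MY-SERVICE/my-host:my-service:8080"
                + "?status=UP&lastDirtyTimestamp=" + System.currentTimeMillis();

        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .header("Accept", "application/json")
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // 200 = lease renewed, 404 = unknown instance, in which case the client should re-register
        System.out.println("Renew status: " + response.statusCode());
    }
}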
Next, step into the renew method of PeerAwareInstanceRegistryImpl.
PeerAwareInstanceRegistryImpl
public boolean renew(final String appName, final String id, final boolean isReplication) {
    // call the renew method of the parent class AbstractInstanceRegistry
    if (super.renew(appName, id, isReplication)) {
        // replicate the heartbeat to the other nodes in the cluster
        replicateToPeers(Action.Heartbeat, appName, id, null, null, isReplication);
        return true;
    }
    return false;
}
Next, step into the renew method of AbstractInstanceRegistry.
AbstractInstanceRegistry
public boolean renew(String appName, String id, boolean isReplication) {
    // increment the renew counter
    RENEW.increment(isReplication);
    // look up all leases for this appName
    Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
    Lease<InstanceInfo> leaseToRenew = null;
    if (gMap != null) {
        // look up the lease by instance id
        leaseToRenew = gMap.get(id);
    }
    if (leaseToRenew == null) {
        // no lease found: increment the renew-not-found counter
        RENEW_NOT_FOUND.increment(isReplication);
        logger.warn("DS: Registry: lease doesn't exist, registering resource: {} - {}", appName, id);
        return false;
    } else {
        // the instance info held by the lease
        InstanceInfo instanceInfo = leaseToRenew.getHolder();
        if (instanceInfo != null) {
            // touchASGCache(instanceInfo.getASGName());
            // determine the instance status according to the status-override rules
            InstanceStatus overriddenInstanceStatus = this.getOverriddenInstanceStatus(
                    instanceInfo, leaseToRenew, isReplication);
            if (overriddenInstanceStatus == InstanceStatus.UNKNOWN) {
                // the effective status is UNKNOWN
                logger.info("Instance status UNKNOWN possibly due to deleted override for instance {}"
                        + "; re-register required", instanceInfo.getId());
                RENEW_NOT_FOUND.increment(isReplication);
                return false;
            }
            if (!instanceInfo.getStatus().equals(overriddenInstanceStatus)) {
                logger.info(
                        "The instance status {} is different from overridden instance status {} for instance {}. "
                                + "Hence setting the status to overridden status", instanceInfo.getStatus().name(),
                        overriddenInstanceStatus.name(),
                        instanceInfo.getId());
                instanceInfo.setStatusWithoutDirty(overriddenInstanceStatus);
            }
        }
        // increment the renewals-in-the-last-minute counter
        // (an internal timer resets this counter every minute)
        renewsLastMin.increment();
        // update the lease's last-update timestamp, which is what renewing actually means
        leaseToRenew.renew();
        return true;
    }
}
Lease
public void renew() {
    // current system time + lease duration (90 seconds by default)
    lastUpdateTimestamp = System.currentTimeMillis() + duration;
}
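Note that renew stores the current time plus the lease duration, and the expiry check adds the duration again; as the comments in the Lease class point out, this effectively doubles the eviction delay. Below is a paraphrased sketch of the expiry check, assuming the fields shown in renew() above (evictionTimestamp is set by cancel, and additionalLeaseMs accounts for compensation time).

// a paraphrase of Lease#isExpired, assuming the fields shown in renew() above
public boolean isExpired(long additionalLeaseMs) {
    // expired if the lease was explicitly cancelled, or if the last renewal is too old;
    // because renew() already added `duration` once, an instance is in practice only
    // evicted roughly 2 * duration after its last heartbeat
    return (evictionTimestamp > 0
            || System.currentTimeMillis() > (lastUpdateTimestamp + duration + additionalLeaseMs));
}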
Service Eviction
Constraints on eviction
- No eviction happens while self-preservation mode is active.
- Eviction is performed in batches.
- Expired leases are evicted one at a time, chosen at random, so evictions are spread evenly across all applications. This prevents every instance of one application from being evicted at the same moment, which could otherwise take a whole service down during mass expiry while self-preservation is kicking in.
The eviction timer task
Eviction runs as a scheduled task, EvictionTask, which by default executes every 60 seconds after an initial delay of 60 seconds and removes expired leases on each run.
An eviction pass walks the registry and collects all expired leases, then computes the maximum number of leases it may evict from the renewal-percent threshold in the configuration and the current total number of leases (the total number of leases minus the current registry threshold), and evicts the expired leases in batches within that limit, as sketched below. Each expired lease is removed from the registry by calling AbstractInstanceRegistry#internalCancel, the same cancellation method used for service deregistration.
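A simplified sketch of that computation follows, assuming a registry size, a set of expired leases, and a renewal-percent threshold as inputs; the real AbstractInstanceRegistry#evict additionally randomizes which expired leases are picked before removing them.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EvictionLimitSketch {
    public static void main(String[] args) {
        int registrySize = 100;                 // total number of leases currently registered
        double renewalPercentThreshold = 0.85;  // renewal-percent-threshold default
        List<String> expiredLeases = new ArrayList<>(List.of("a-1", "a-2", "b-1", "b-2", "b-3"));

        // the registry must keep at least this many leases to stay above the self-preservation threshold
        int registrySizeThreshold = (int) (registrySize * renewalPercentThreshold);
        int evictionLimit = registrySize - registrySizeThreshold;

        // evict at most evictionLimit of the expired leases, chosen at random
        int toEvict = Math.min(expiredLeases.size(), evictionLimit);
        Collections.shuffle(expiredLeases);
        for (int i = 0; i < toEvict; i++) {
            System.out.println("evicting " + expiredLeases.get(i)); // the real code calls internalCancel here
        }
    }
}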
Eviction itself was covered in the previous article, so please refer back to it for details.
Service Cancellation
When an application shuts down, the Eureka client sends a cancellation request to the Eureka server to remove its lease from the registry and prevent calls to a dead instance. The eviction process described above also goes through this same cancellation logic to remove an individual expired lease.
This is implemented in com.netflix.eureka.resources.InstanceResource; see the following method:
InstanceResource
@DELETE
public Response cancelLease(
        @HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication) {
    try {
        // call PeerAwareInstanceRegistryImpl#cancel
        boolean isSuccess = registry.cancel(app.getName(), id,
                "true".equals(isReplication));

        if (isSuccess) {
            logger.debug("Found (Cancel): {} - {}", app.getName(), id);
            return Response.ok().build();
        } else {
            logger.info("Not Found (Cancel): {} - {}", app.getName(), id);
            return Response.status(Status.NOT_FOUND).build();
        }
    } catch (Throwable e) {
        logger.error("Error (cancel): {} - {}", app.getName(), id, e);
        return Response.serverError().build();
    }
}
Next, step into the cancel method of PeerAwareInstanceRegistryImpl.
PeerAwareInstanceRegistryImpl
@Override
public boolean cancel(final String appName, final String id,
                      final boolean isReplication) {
    // call the cancel method of the parent class AbstractInstanceRegistry
    if (super.cancel(appName, id, isReplication)) {
        // replicate the cancellation to the other nodes in the cluster
        replicateToPeers(Action.Cancel, appName, id, null, null, isReplication);
        return true;
    }
    return false;
}
Next, step into the cancel method of AbstractInstanceRegistry.
AbstractInstanceRegistry
@Override
public boolean cancel(String appName, String id, boolean isReplication) {
    // delegate to internalCancel
    return internalCancel(appName, id, isReplication);
}
Next, step into internalCancel.
protected boolean internalCancel(String appName, String id, boolean isReplication) {
    // acquire the read lock
    read.lock();
    try {
        // increment the cancel counter
        CANCEL.increment(isReplication);
        // look up all leases for this appName
        Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
        Lease<InstanceInfo> leaseToCancel = null;
        if (gMap != null) {
            // remove the lease for this instance id
            leaseToCancel = gMap.remove(id);
        }
        // record in the recently-cancelled queue
        recentCanceledQueue.add(new Pair<Long, String>(System.currentTimeMillis(), appName + "(" + id + ")"));
        InstanceStatus instanceStatus = overriddenInstanceStatusMap.remove(id);
        if (instanceStatus != null) {
            logger.debug("Removed instance id {} from the overridden map which has value {}", id, instanceStatus.name());
        }
        if (leaseToCancel == null) {
            CANCEL_NOT_FOUND.increment(isReplication);
            logger.warn("DS: Registry: cancel failed because Lease is not registered for: {}/{}", appName, id);
            return false;
        } else {
            // set the lease's evictionTimestamp to the current system time
            leaseToCancel.cancel();
            InstanceInfo instanceInfo = leaseToCancel.getHolder();
            String vip = null;
            String svip = null;
            if (instanceInfo != null) {
                instanceInfo.setActionType(ActionType.DELETED);
                // add to the recently-changed queue
                recentlyChangedQueue.add(new RecentlyChangedItem(leaseToCancel));
                instanceInfo.setLastUpdatedTimestamp();
                vip = instanceInfo.getVIPAddress();
                svip = instanceInfo.getSecureVipAddress();
            }
            // invalidate the (read-write) response cache
            invalidateCache(appName, vip, svip);
            logger.info("Cancelled instance {}/{} (replication={})", appName, id, isReplication);
        }
    } finally {
        // release the read lock
        read.unlock();
    }

    synchronized (lock) {
        if (this.expectedNumberOfClientsSendingRenews > 0) {
            // Since the client wants to cancel it, reduce the number of clients to send renews.
            this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews - 1;
            // recompute the renews-per-minute threshold
            updateRenewsPerMinThreshold();
        }
    }

    return true;
}
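Both register and internalCancel adjust expectedNumberOfClientsSendingRenews and then recompute the renews-per-minute threshold that self-preservation compares against. Below is a sketch of that computation, assuming the defaults of a 30-second client renewal interval and a renewal-percent threshold of 0.85; the exact expression lives in updateRenewsPerMinThreshold and may differ slightly across versions.

public class RenewThresholdSketch {
    public static void main(String[] args) {
        int expectedNumberOfClientsSendingRenews = 20;   // registered clients expected to heartbeat
        int expectedClientRenewalIntervalSeconds = 30;   // client lease-renewal interval
        double renewalPercentThreshold = 0.85;           // renewal-percent threshold

        // each client is expected to renew 60 / interval times per minute;
        // self-preservation triggers if actual renewals per minute fall below this threshold
        int numberOfRenewsPerMinThreshold = (int) (expectedNumberOfClientsSendingRenews
                * (60.0 / expectedClientRenewalIntervalSeconds)
                * renewalPercentThreshold);

        System.out.println("renews/min threshold = " + numberOfRenewsPerMinThreshold); // 34 for these inputs
    }
}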
Cluster Synchronization
Registration, renewal, and cancellation all end with a call to the replicateToPeers method.
PeerAwareInstanceRegistryImpl
private void replicateToPeers(Action action, String appName, String id,
                              InstanceInfo info /* optional */,
                              InstanceStatus newStatus /* optional */, boolean isReplication) {
    Stopwatch tracer = action.getTimer().start();
    try {
        if (isReplication) {
            numberOfReplicationsLastMin.increment();
        }
        // If it is a replication already, do not replicate again as this will create a poison replication
        if (peerEurekaNodes == Collections.EMPTY_LIST || isReplication) {
            return;
        }

        // peerEurekaNodes was set up in the previous article when DefaultEurekaServerContext was created;
        // an internal timer refreshes the peer list every 10 minutes by default
        for (final PeerEurekaNode node : peerEurekaNodes.getPeerEurekaNodes()) {
            // If the url represents this host, do not replicate to yourself.
            // skip the node if it is this server itself
            if (peerEurekaNodes.isThisMyUrl(node.getServiceUrl())) {
                continue;
            }
            // replicate the action to this peer
            replicateInstanceActionsToPeers(action, appName, id, info, newStatus, node);
        }
    } finally {
        tracer.stop();
    }
}
Next, step into replicateInstanceActionsToPeers.
private void replicateInstanceActionsToPeers(Action action, String appName,
                                             String id, InstanceInfo info, InstanceStatus newStatus,
                                             PeerEurekaNode node) {
    try {
        InstanceInfo infoFromRegistry;
        CurrentRequestVersion.set(Version.V2);
        switch (action) {
            case Cancel:
                // service cancellation
                node.cancel(appName, id);
                break;
            case Heartbeat:
                // service renewal
                InstanceStatus overriddenStatus = overriddenInstanceStatusMap.get(id);
                infoFromRegistry = getInstanceByAppAndId(appName, id, false);
                node.heartbeat(appName, id, infoFromRegistry, overriddenStatus, false);
                break;
            case Register:
                // service registration
                node.register(info);
                break;
            case StatusUpdate:
                infoFromRegistry = getInstanceByAppAndId(appName, id, false);
                node.statusUpdate(appName, id, newStatus, infoFromRegistry);
                break;
            case DeleteStatusOverride:
                infoFromRegistry = getInstanceByAppAndId(appName, id, false);
                node.deleteStatusOverride(appName, id, infoFromRegistry);
                break;
        }
    } catch (Throwable t) {
        logger.error("Cannot replicate information to {} for action {}", node.getServiceUrl(), action.name(), t);
    } finally {
        CurrentRequestVersion.remove();
    }
}
Inside replicateInstanceActionsToPeers, the methods invoked on the PeerEurekaNode instance (cancel, register, and so on) all go through batchingDispatcher.process. Within a given time window, identical operations on the same service instance are assigned the same task id, so when replication actually happens, operations can be merged by task id. This reduces the number of replication calls and the network overhead, but it also makes replication asynchronous and delayed, so strong consistency (the C in CAP) is not guaranteed.
This is why Eureka only satisfies AP.
Take cancellation as an example:
public void cancel(final String appName, final String id) throws Exception {
    long expiryTime = System.currentTimeMillis() + maxProcessingDelayMs;
    batchingDispatcher.process(
            taskId("cancel", appName, id),
            new InstanceReplicationTask(targetHost, Action.Cancel, appName, id) {
                @Override
                public EurekaHttpResponse<Void> execute() {
                    return replicationClient.cancel(appName, id);
                }

                @Override
                public void handleFailure(int statusCode, Object responseEntity) throws Throwable {
                    super.handleFailure(statusCode, responseEntity);
                    if (statusCode == 404) {
                        logger.warn("{}: missing entry.", getTaskName());
                    }
                }
            },
            expiryTime
    );
}
Internally, the InstanceReplicationTask is submitted to a blocking queue, and a background worker thread processes the replication tasks in batches.
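To illustrate the idea, here is a generic sketch, not Eureka's actual TaskDispatcher implementation: tasks are keyed by an id, a later submission with the same id replaces an earlier one while it waits in the queue, and a worker drains the pending tasks in batches.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchingDispatcherSketch {
    // pending tasks keyed by task id; a newer task for the same id replaces the older one
    private final Map<String, Runnable> pending = new LinkedHashMap<>();

    public synchronized void process(String taskId, Runnable task) {
        pending.put(taskId, task);
    }

    // called periodically by a worker thread: drain up to maxBatchSize tasks and run them as one batch
    public void drainAndExecute(int maxBatchSize) {
        List<Runnable> batch = new ArrayList<>();
        synchronized (this) {
            Iterator<Runnable> it = pending.values().iterator();
            while (it.hasNext() && batch.size() < maxBatchSize) {
                batch.add(it.next());
                it.remove();
            }
        }
        batch.forEach(Runnable::run); // the real dispatcher sends the batch in a single replication request
    }

    public static void main(String[] args) {
        BatchingDispatcherSketch dispatcher = new BatchingDispatcherSketch();
        // two cancels for the same instance collapse into one task
        dispatcher.process("cancel#APP/instance-1", () -> System.out.println("cancel instance-1"));
        dispatcher.process("cancel#APP/instance-1", () -> System.out.println("cancel instance-1 (merged)"));
        dispatcher.drainAndExecute(250);
    }
}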
Summary
The Eureka server receives REST requests from clients and maintains its internal registry through service registration, renewal, cancellation, and eviction. Internally, the server actually keeps registration data in three structures: registry, readWriteCacheMap, and readOnlyCacheMap. By default, a scheduled task copies readWriteCacheMap into readOnlyCacheMap every 30 seconds, another task runs every 60 seconds to evict instances that have not renewed within 90 seconds, Eureka clients fetch registry information from readOnlyCacheMap every 30 seconds, and client registrations are written directly into registry.
Why design a multi-level cache here? The reason is simple: under large-scale registration and update traffic, if every request hit a single ConcurrentHashMap, the resulting contention would hurt performance. Eureka is an AP system and only needs eventual consistency, so it uses a multi-level cache to separate reads from writes. The registration path writes straight to the in-memory registry and then actively invalidates the read-write cache. The fetch path reads from the read-only cache first, falls back to the read-write cache if the entry is missing, and finally falls back to the in-memory registry (this last step is more involved than a plain lookup). The read-write cache is also periodically copied back into the read-only cache. I encourage you to read and analyze this part of the code yourself; a rough sketch of the read path follows.
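Below is a minimal, hypothetical sketch of that read path, assuming a read-only map backed by a loading read-write cache that falls through to the registry; the method and field names are illustrative and do not match the actual ResponseCacheImpl API.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class MultiLevelCacheSketch {
    private final ConcurrentHashMap<String, String> readOnlyCacheMap = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, String> readWriteCacheMap = new ConcurrentHashMap<>();
    private final Function<String, String> loadFromRegistry;   // falls through to the in-memory registry

    public MultiLevelCacheSketch(Function<String, String> loadFromRegistry) {
        this.loadFromRegistry = loadFromRegistry;
    }

    // fetch path: read-only cache -> read-write cache -> registry
    public String get(String key) {
        String payload = readOnlyCacheMap.get(key);
        if (payload == null) {
            payload = readWriteCacheMap.computeIfAbsent(key, loadFromRegistry);
            readOnlyCacheMap.put(key, payload);
        }
        return payload;
    }

    // write path: a registration invalidates the read-write cache entry;
    // the read-only copy is refreshed by a periodic task (every 30s by default)
    public void invalidate(String key) {
        readWriteCacheMap.remove(key);
    }

    // what the periodic task does: copy read-write entries over the read-only ones
    public void syncReadOnlyCache() {
        for (Map.Entry<String, String> e : readWriteCacheMap.entrySet()) {
            readOnlyCacheMap.put(e.getKey(), e.getValue());
        }
    }
}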
At this point we have covered most of Eureka's main flows. Of course, many topics remain; for example, peer data synchronization is split into full and delta (incremental) fetches. I will leave that code for you to trace and analyze on your own.
Feel free to follow my WeChat official account: 程序员L札记