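In the previous article, "Eureka Client: Initialization", we saw that the DiscoveryClient constructor triggers a fetch of the server's registry information, that is, service discovery, in two places. This article picks up from there and looks more closely at how the Eureka client performs service discovery.
Of those two places, one is a direct call to fetchRegistry() inside the constructor, and the other is a periodic refresh task started by initScheduledTasks(). The task started in initScheduledTasks() also ends up calling fetchRegistry(), so both paths share the same fetch logic, which we analyze below.
In the constructor, if the fetchRegistry configuration item is enabled (it defaults to true), DiscoveryClient calls fetchRegistry() to pull the registry from the Eureka server. If that fails, it calls fetchRegistryFromBackup() to obtain the registry from a backup registry instead. If the backup also fails and the client is configured to require a successful fetch during initialization (shouldEnforceFetchRegistryAtInit, false by default), an exception is thrown.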
// First, check whether fetching registry data from the Eureka server is enabled
if (clientConfig.shouldFetchRegistry()) {
try {
// Fetch the registry data from the Eureka server
boolean primaryFetchRegistryResult = fetchRegistry(false);
if (!primaryFetchRegistryResult) {
logger.info("Initial registry fetch from primary servers failed");
}
boolean backupFetchRegistryResult = true;
// Fall back to the backup registry only when the primary fetch failed
if (!primaryFetchRegistryResult && !fetchRegistryFromBackup()) {
backupFetchRegistryResult = false;
logger.info("Initial registry fetch from backup servers failed");
}
// Neither source returned data and a fetch is required at startup, so throw
if (!primaryFetchRegistryResult && !backupFetchRegistryResult && clientConfig.shouldEnforceFetchRegistryAtInit()) {
throw new IllegalStateException("Fetch registry error at startup. Initial fetch failed.");
}
} catch (Throwable th) {
logger.error("Fetch registry error at startup: {}", th.getMessage());
throw new IllegalStateException(th);
}
}
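The fallback order in the constructor excerpt above can be condensed into a small stand-alone sketch. Everything here is illustrative: the class name InitialFetchSketch and the method initialFetch are hypothetical, and the two suppliers merely stand in for fetchRegistry(false) and fetchRegistryFromBackup().

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the startup fallback order; not Eureka code.
class InitialFetchSketch {

    /**
     * Tries the primary source, then the backup. Throws only when both
     * fail and enforceAtInit is true; otherwise reports the outcome.
     */
    static boolean initialFetch(BooleanSupplier primary,
                                BooleanSupplier backup,
                                boolean enforceAtInit) {
        if (primary.getAsBoolean()) {
            return true; // primary succeeded, the backup is never consulted
        }
        if (backup.getAsBoolean()) {
            return true; // backup supplied the registry
        }
        if (enforceAtInit) {
            throw new IllegalStateException("Initial fetch failed");
        }
        return false; // start anyway and rely on the periodic refresh
    }
}
```

With enforceAtInit left at false, a failed initial fetch does not stop the client from starting; the scheduled refresh simply tries again later.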
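fetchRegistry() is the core method for obtaining service information, and the periodic refresh logic calls it as well. We analyze it later; first, let's look at fetchRegistryFromBackup().
When fetchRegistry() returns false, which typically means the Eureka server is unavailable or unreachable, the client tries to load the registry from a backup by calling fetchRegistryFromBackup(). At present the only BackupRegistry implementation shipped with Netflix Eureka is NotImplementedRegistryImpl, an empty implementation, so nothing useful happens on failure unless users supply their own. The logic is as follows: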
private boolean fetchRegistryFromBackup() {
try {
// newBackupRegistryInstance() returns null by default and is marked as deprecated
@SuppressWarnings("deprecation")
BackupRegistry backupRegistryInstance = newBackupRegistryInstance();
// This is where the BackupRegistry is actually obtained. backupRegistryProvider is initialized in one of the DiscoveryClient constructors; Netflix currently ships only the empty NotImplementedRegistryImpl.
if (null == backupRegistryInstance) { // backward compatibility with the old protected method, in case it is being used.
backupRegistryInstance = backupRegistryProvider.get();
}
// A backup registry instance is available
if (null != backupRegistryInstance) {
Applications apps = null;
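// Remote regions are configured (remoteRegionsToFetch is not null)
if (isFetchingRemoteRegionRegistries()) {
// Read the comma-separated remote region string
String remoteRegionsStr = remoteRegionsToFetch.get();
if (null != remoteRegionsStr) {
// Fetch instances for the listed remote regions as well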
apps = backupRegistryInstance.fetchRegistry(remoteRegionsStr.split(","));
}
} else {
// Fetch instances for the local region only
apps = backupRegistryInstance.fetchRegistry();
}
if (apps != null) {
// Filter and shuffle the instances
final Applications applications = this.filterAndShuffle(apps);
// Set the hash code used for consistency checks
applications.setAppsHashCode(applications.getReconcileHashCode());
// Store the result in localRegionApps
localRegionApps.set(applications);
// Log the instance count
logTotalInstances();
logger.info("Fetched registry successfully from the backup");
return true;
}
} else {
logger.warn("No backup registry instance defined & unable to find any discovery servers.");
}
} catch (Throwable e) {
logger.warn("Cannot fetch applications from apps although backup registry was specified", e);
}
return false;
}
Besides the discovery performed during initialization, initScheduledTasks(), which the constructor calls, also starts a task that refreshes service information periodically:
if (clientConfig.shouldFetchRegistry()) {
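// Refresh interval, 30 seconds by default
int registryFetchIntervalSeconds = clientConfig.getRegistryFetchIntervalSeconds();
// Upper bound on the exponential back-off multiplier applied to the retry delay, 10 by default (not a retry count)
int expBackOffBound = clientConfig.getCacheRefreshExecutorExponentialBackOffBound();
// Create the supervised task that performs service discovery and periodically refreshes instance information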
cacheRefreshTask = new TimedSupervisorTask(
"cacheRefresh",
scheduler,
cacheRefreshExecutor,
registryFetchIntervalSeconds,
TimeUnit.SECONDS,
expBackOffBound,
new CacheRefreshThread()
);
// Schedule the first run after one refresh interval
scheduler.schedule(
cacheRefreshTask,
registryFetchIntervalSeconds, TimeUnit.SECONDS);
}
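To make the role of expBackOffBound more concrete, here is a simplified sketch of a bounded exponential back-off. It is an assumption-laden illustration rather than the TimedSupervisorTask source: the class BackOffSketch and its methods are hypothetical, and the doubling rule is only my reading of how the supervisor stretches the delay after a timeout and resets it after a success.

```java
// Hypothetical sketch of a bounded exponential back-off; not Eureka code.
class BackOffSketch {
    private final long baseDelayMs;   // the normal refresh interval
    private final long maxDelayMs;    // interval * expBackOffBound
    private long currentDelayMs;

    BackOffSketch(long intervalMs, int expBackOffBound) {
        this.baseDelayMs = intervalMs;
        this.maxDelayMs = intervalMs * expBackOffBound;
        this.currentDelayMs = intervalMs;
    }

    /** A run timed out: double the delay, but never exceed the bound. */
    long onTimeout() {
        currentDelayMs = Math.min(maxDelayMs, currentDelayMs * 2);
        return currentDelayMs;
    }

    /** A run succeeded: go back to the normal interval. */
    long onSuccess() {
        currentDelayMs = baseDelayMs;
        return currentDelayMs;
    }
}
```

Under these assumptions, a 30 second interval with a bound of 10 grows to 60, 120 and 240 seconds on consecutive timeouts and is then capped at 300 seconds.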
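From the code above, the actual discovery logic lives in CacheRefreshThread; the rest is only there to run it on a schedule. Let's look at CacheRefreshThread next.
CacheRefreshThread is an inner class of DiscoveryClient that implements Runnable, so each scheduled execution invokes its run() method, which in turn calls refreshRegistry(), defined in DiscoveryClient: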
class CacheRefreshThread implements Runnable {
public void run() {
refreshRegistry();
}
}
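refreshRegistry() then calls fetchRegistry() to obtain the services registered on the server. fetchRegistry() runs once by default during initialization; here it is used to refresh the information periodically.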
@VisibleForTesting
void refreshRegistry() {
try {
// Whether fetching from other regions is enabled, based on whether remoteRegionsToFetch has a value
boolean isFetchingRemoteRegionRegistries = isFetchingRemoteRegionRegistries();
boolean remoteRegionsModified = false;
// Region list, configured through fetchRemoteRegionsRegistry
String latestRemoteRegions = clientConfig.fetchRegistryForRemoteRegions();
// Handle region changes (update the cached region list, and so on)
if (null != latestRemoteRegions) {
String currentRemoteRegions = remoteRegionsToFetch.get();
if (!latestRemoteRegions.equals(currentRemoteRegions)) {
// Both remoteRegionsToFetch and AzToRegionMapper.regionsToFetch need to be in sync
synchronized (instanceRegionChecker.getAzToRegionMapper()) {
if (remoteRegionsToFetch.compareAndSet(currentRemoteRegions, latestRemoteRegions)) {
String[] remoteRegions = latestRemoteRegions.split(",");
remoteRegionsRef.set(remoteRegions);
instanceRegionChecker.getAzToRegionMapper().setRegionsToFetch(remoteRegions);
remoteRegionsModified = true;
} else {
logger.info("Remote regions to fetch modified concurrently," +
" ignoring change from {} to {}", currentRemoteRegions, latestRemoteRegions);
}
}
} else {
// Just refresh mapping to reflect any DNS/Property change
instanceRegionChecker.getAzToRegionMapper().refreshMapping();
}
}
// Call fetchRegistry() to get the registry from the Eureka server; the method is analyzed in the next section.
boolean success = fetchRegistry(remoteRegionsModified);
if (success) {
registrySize = localRegionApps.get().size();
lastSuccessfulRegistryFetchTimestamp = System.currentTimeMillis();
}
// Debug logging and the exception handler that closes the try block are omitted ...
}
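fetchRegistry() is the core method for obtaining service information, so it gets a section of its own.
fetchRegistry() first decides, based on several conditions, whether to do a full fetch or a delta (incremental) fetch, and after the update it sets appsHashCode, which is used to check consistency. Once the registry is in sync, it fires a cache-refreshed event and then, based on the refreshed cache, updates the client's view of its own instance's status as recorded on the Eureka server; this last step must run after the cache has been updated.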
private boolean fetchRegistry(boolean forceFullRegistryFetch) {
Stopwatch tracer = FETCH_REGISTRY_TIMER.start();
try {
// Get the Applications object held in localRegionApps; that variable is assigned by the full-fetch method getAndStoreFullRegistry(). A full fetch is used when delta fetching is disabled or on the first fetch.
Applications applications = getApplications();
// Decide between a full fetch and a delta fetch
if (clientConfig.shouldDisableDelta()
|| (!Strings.isNullOrEmpty(clientConfig.getRegistryRefreshSingleVipAddress()))
|| forceFullRegistryFetch
|| (applications == null)
|| (applications.getRegisteredApplications().size() == 0)
|| (applications.getVersion() == -1)) //Client application does not have latest library supporting delta
{
logger.info("Disable delta property : {}", clientConfig.shouldDisableDelta());
logger.info("Single vip registry refresh property : {}", clientConfig.getRegistryRefreshSingleVipAddress());
logger.info("Force full registry fetch : {}", forceFullRegistryFetch);
logger.info("Application is null : {}", (applications == null));
logger.info("Registered Applications size is zero : {}",
(applications.getRegisteredApplications().size() == 0));
logger.info("Application version is -1: {}", (applications.getVersion() == -1));
// Full fetch
getAndStoreFullRegistry();
} else {
// Delta fetch
getAndUpdateDelta(applications);
}
applications.setAppsHashCode(applications.getReconcileHashCode());
// Log how many instances the client currently holds
logTotalInstances();
} catch (Throwable e) {
logger.error(PREFIX + "{} - was unable to refresh its cache! status = {}", appPathIdentifier, e.getMessage(), e);
return false;
} finally {
if (tracer != null) {
tracer.stop();
}
}
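// Notify listeners that the cache was refreshed, before updating the instance remote status; done through fireEvent(new CacheRefreshedEvent())
onCacheRefreshed();
// Update the remote status based on the cached data; must run after the cache refresh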
updateInstanceRemoteStatus();
// registry was fetched successfully, so return true
return true;
}
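The condition that selects a full fetch is easier to see in isolation. The following sketch restates it as a pure function; the class FetchModeSketch and the method needsFullFetch are hypothetical, and a missing local cache (applications == null) is represented here simply as zero cached applications.

```java
// Hypothetical restatement of the full-vs-delta decision; not Eureka code.
class FetchModeSketch {

    /**
     * Returns true when a full registry fetch is required.
     * cachedAppCount == 0 also covers "no local cache yet".
     */
    static boolean needsFullFetch(boolean deltaDisabled,
                                  String singleVipAddress,
                                  boolean forceFull,
                                  int cachedAppCount,
                                  long cachedVersion) {
        boolean singleVipConfigured =
                singleVipAddress != null && !singleVipAddress.isEmpty();
        return deltaDisabled          // delta fetching switched off
                || singleVipConfigured // refresh limited to one VIP address
                || forceFull           // e.g. the remote region list changed
                || cachedAppCount == 0 // first fetch or empty cache
                || cachedVersion == -1; // cache carries no usable version
    }
}
```

Only when every one of these conditions is false does the client take the cheaper delta path.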
getAndStoreFullRegistry() fetches the full registry (an Applications object) from the Eureka server and stores it in the local localRegionApps variable.
private void getAndStoreFullRegistry() throws Throwable {
// Generation counter recorded before the fetch
long currentUpdateGeneration = fetchRegistryGeneration.get();
logger.info("Getting all instance registry info from the eureka server");
// registryRefreshSingleVipAddress is empty by default, so this normally goes through getApplications(), which ends up in AbstractJerseyEurekaHttpClient.getApplicationsInternal()
Applications apps = null;
EurekaHttpResponse<Applications> httpResponse = clientConfig.getRegistryRefreshSingleVipAddress() == null
? eurekaTransport.queryClient.getApplications(remoteRegionsRef.get())
: eurekaTransport.queryClient.getVip(clientConfig.getRegistryRefreshSingleVipAddress(), remoteRegionsRef.get());
if (httpResponse.getStatusCode() == Status.OK.getStatusCode()) {
apps = httpResponse.getEntity();
}
logger.info("The response status is {}", httpResponse.getStatusCode());
if (apps == null) {
logger.error("The application is null for some reason. Not storing this information");
} else if (fetchRegistryGeneration.compareAndSet(currentUpdateGeneration, currentUpdateGeneration + 1)) {
// Store apps in localRegionApps (the local cache)
localRegionApps.set(this.filterAndShuffle(apps));
logger.debug("Got full registry with apps hashcode {}", apps.getAppsHashCode());
} else {
logger.warn("Not updating applications as another thread is updating it already");
}
}
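The fetchRegistryGeneration check deserves a closer look: a result is stored only if no other fetch has completed since this one started. Below is a minimal sketch of that compare-and-set guard. The class GenerationGuardSketch and its members are hypothetical, and a plain String stands in for the Applications object.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a generation guard; not Eureka code.
class GenerationGuardSketch {
    private final AtomicLong generation = new AtomicLong(0);
    private final AtomicReference<String> cache = new AtomicReference<>("empty");

    /** Remember the generation before starting a (slow) remote fetch. */
    long beginFetch() {
        return generation.get();
    }

    /**
     * Store the fetched value only if the generation is unchanged,
     * advancing it by one so that older in-flight fetches are rejected.
     */
    boolean store(long observedGeneration, String fetched) {
        if (generation.compareAndSet(observedGeneration, observedGeneration + 1)) {
            cache.set(fetched);
            return true;
        }
        return false; // another fetch finished first; drop this stale result
    }

    String cached() {
        return cache.get();
    }
}
```

If two fetches start from the same generation, whichever finishes first wins, and the slower one is discarded instead of overwriting newer data.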
getApplications() here is implemented in AbstractJerseyEurekaHttpClient:
@Override
public EurekaHttpResponse<Applications> getApplications(String... regions) {
return getApplicationsInternal("apps/", regions);
}
getApplications() in turn calls getApplicationsInternal() in the same class, which builds and sends a RESTful request with Jersey:
private EurekaHttpResponse<Applications> getApplicationsInternal(String urlPath, String[] regions) {
ClientResponse response = null;
String regionsParamValue = null;
try {
WebResource webResource = jerseyClient.resource(serviceUrl).path(urlPath);
if (regions != null && regions.length > 0) {
regionsParamValue = StringUtil.join(regions);
webResource = webResource.queryParam("regions", regionsParamValue);
}
Builder requestBuilder = webResource.getRequestBuilder();
addExtraHeaders(requestBuilder);
response = requestBuilder.accept(MediaType.APPLICATION_JSON_TYPE).get(ClientResponse.class);
Applications applications = null;
if (response.getStatus() == Status.OK.getStatusCode() && response.hasEntity()) {
applications = response.getEntity(Applications.class);
}
return anEurekaHttpResponse(response.getStatus(), Applications.class)
.headers(headersOf(response))
.entity(applications)
.build();
} finally {
if (logger.isDebugEnabled()) {
logger.debug("Jersey HTTP GET {}/{}?{}; statusCode={}",
serviceUrl, urlPath,
regionsParamValue == null ? "" : "regions=" + regionsParamValue,
response == null ? "N/A" : response.getStatus()
);
}
if (response != null) {
response.close();
}
}
}
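getAndUpdateDelta() fetches the incremental (delta) data from the Eureka server.
It first obtains the delta through getDelta() in AbstractJerseyEurekaHttpClient.
It then checks the result: if the delta is null, it calls getAndStoreFullRegistry() to fetch the full registry; otherwise it applies the delta.
When applying the delta, it first calls updateDelta() to process the data and then computes the hash code of the updated local data.
Finally, it compares the hash codes and, if they differ, calls reconcileAndLogDifference().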
private void getAndUpdateDelta(Applications applications) throws Throwable {
long currentUpdateGeneration = fetchRegistryGeneration.get();
Applications delta = null;
EurekaHttpResponse<Applications> httpResponse = eurekaTransport.queryClient.getDelta(remoteRegionsRef.get());
if (httpResponse.getStatusCode() == Status.OK.getStatusCode()) {
delta = httpResponse.getEntity();
}
if (delta == null) {// No delta available: fall back to a full fetch
logger.warn("The server does not allow the delta revision to be applied because it is not safe. "
+ "Hence got the full registry.");
getAndStoreFullRegistry();
} else if (fetchRegistryGeneration.compareAndSet(currentUpdateGeneration, currentUpdateGeneration + 1)) {// CAS so that only one thread applies a result for this generation
logger.debug("Got delta update with apps hashcode {}", delta.getAppsHashCode());
String reconcileHashCode = "";
if (fetchRegistryUpdateLock.tryLock()) {// Try to acquire the lock
try {
// Apply the delta
updateDelta(delta);
// Compute the hash code of the updated local data for the consistency check
reconcileHashCode = getReconcileHashCode(applications);
} finally {
// Release the lock
fetchRegistryUpdateLock.unlock();
}
} else {
logger.warn("Cannot acquire update lock, aborting getAndUpdateDelta");
}
// If the hash codes differ (or delta diff logging is enabled), call reconcileAndLogDifference()
if (!reconcileHashCode.equals(delta.getAppsHashCode()) || clientConfig.shouldLogDeltaDiff()) {
reconcileAndLogDifference(delta, reconcileHashCode); // this makes a remoteCall
}
} else {
logger.warn("Not updating application delta as another thread is updating it already");
logger.debug("Ignoring delta update with apps hashcode {}, as another thread is updating it already", delta.getAppsHashCode());
}
}
getDelta() is also implemented in AbstractJerseyEurekaHttpClient. It calls the same getApplicationsInternal() method used for the full fetch, just with a different urlPath argument.
@Override
public EurekaHttpResponse<Applications> getDelta(String... regions) {
return getApplicationsInternal("apps/delta", regions);
}
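updateDelta() takes the instances returned by the delta fetch and, depending on their action type, adds them to or removes them from the applications object. It then reshuffles the instances in the applications object so that they are not always returned in the same order.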
private void updateDelta(Applications delta) {
int deltaCount = 0;
// Iterate over every InstanceInfo in the delta, across all regions
for (Application app : delta.getRegisteredApplications()) {
for (InstanceInfo instance : app.getInstances()) {
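// Get the data currently held in the local cache
Applications applications = getApplications();
// Determine the instance's region; if it is not the local region, switch to that region's Applications object in remoteRegionVsApps, creating it if absent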
String instanceRegion = instanceRegionChecker.getInstanceRegion(instance);
if (!instanceRegionChecker.isLocalRegion(instanceRegion)) {
Applications remoteApps = remoteRegionVsApps.get(instanceRegion);
if (null == remoteApps) {
remoteApps = new Applications();
remoteRegionVsApps.put(instanceRegion, remoteApps);
}
applications = remoteApps;
}
// Count the processed delta entries
++deltaCount;
// ADDED
if (ActionType.ADDED.equals(instance.getActionType())) {
// Look up the Application by appName; if it does not exist yet, add the delta's Application to applications
Application existingApp = applications.getRegisteredApplications(instance.getAppName());
if (existingApp == null) {
applications.addApplication(app);
}
logger.debug("Added instance {} to the existing apps in region {}", instance.getId(), instanceRegion);
// Get the Application and add the instance to it
applications.getRegisteredApplications(instance.getAppName()).addInstance(instance);
} else if (ActionType.MODIFIED.equals(instance.getActionType())) {// Same handling as ADDED
Application existingApp = applications.getRegisteredApplications(instance.getAppName());
if (existingApp == null) {
applications.addApplication(app);
}
logger.debug("Modified instance {} to the existing apps ", instance.getId());
applications.getRegisteredApplications(instance.getAppName()).addInstance(instance);
} else if (ActionType.DELETED.equals(instance.getActionType())) {// DELETED
// Get the corresponding Application
Application existingApp = applications.getRegisteredApplications(instance.getAppName());
if (existingApp != null) {// Only delete when the Application exists
logger.debug("Deleted instance {} to the existing apps ", instance.getId());
// Remove the instance from the Application
existingApp.removeInstance(instance);
// If no instances remain, remove the Application from applications as well
if (existingApp.getInstancesAsIsFromEureka().isEmpty()) {
applications.removeApplication(existingApp);
}
}
}
}
}
logger.debug("The total number of instances fetched by the delta processor : {}", deltaCount);
// Record the version of the delta
getApplications().setVersion(delta.getVersion());
// Reshuffle so the order varies, which helps later random instance selection
getApplications().shuffleInstances(clientConfig.shouldFilterOnlyUpInstances());
// Do the same for every remote region held in remoteRegionVsApps
for (Applications applications : remoteRegionVsApps.values()) {
applications.setVersion(delta.getVersion());
applications.shuffleInstances(clientConfig.shouldFilterOnlyUpInstances());
}
}
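Stripped of regions, versions and shuffling, the per-instance handling in updateDelta() comes down to an upsert for ADDED and MODIFIED and a remove for DELETED. The sketch below shows just that idea on plain maps. The class DeltaApplySketch, its Action enum and the map layout (application name to instance id to status) are all hypothetical stand-ins for Applications, ActionType and InstanceInfo.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of applying one delta entry; not Eureka code.
class DeltaApplySketch {
    enum Action { ADDED, MODIFIED, DELETED }

    /** registry maps appName -> (instanceId -> status). */
    static void apply(Map<String, Map<String, String>> registry,
                      Action action, String app, String id, String status) {
        switch (action) {
            case ADDED:
            case MODIFIED: {
                // Create the application entry on demand, then upsert the instance.
                registry.computeIfAbsent(app, k -> new HashMap<>()).put(id, status);
                break;
            }
            case DELETED: {
                Map<String, String> instances = registry.get(app);
                if (instances != null) {
                    instances.remove(id);
                    // Drop the application once its last instance is gone.
                    if (instances.isEmpty()) {
                        registry.remove(app);
                    }
                }
                break;
            }
        }
    }
}
```

Treating ADDED and MODIFIED identically mirrors the two nearly identical branches in the method above.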
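getReconcileHashCode() produces the apps hash code used for the consistency check. In essence it is a string built from the statuses of the fetched instances and the number of instances in each status.
The instance method first calls populateInstanceCountMap() on the Applications object of each remote region and on the local applications object, and then calls the static Applications.getReconcileHashCode() to build the hash code.
populateInstanceCountMap() iterates over the Application objects in an Applications object and over their instances, and records in instanceCountMap how many instances are in each status.
The static getReconcileHashCode() then concatenates each status and its count into a string and returns it.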
private String getReconcileHashCode(Applications applications) {
TreeMap<String, AtomicInteger> instanceCountMap = new TreeMap<String, AtomicInteger>();
if (isFetchingRemoteRegionRegistries()) {
for (Applications remoteApp : remoteRegionVsApps.values()) {
remoteApp.populateInstanceCountMap(instanceCountMap);
}
}
applications.populateInstanceCountMap(instanceCountMap);
return Applications.getReconcileHashCode(instanceCountMap);
}
// In the Applications class
public void populateInstanceCountMap(Map<String, AtomicInteger> instanceCountMap) {
for (Application app : this.getRegisteredApplications()) {
for (InstanceInfo info : app.getInstancesAsIsFromEureka()) {
AtomicInteger instanceCount = instanceCountMap.computeIfAbsent(info.getStatus().name(),
k -> new AtomicInteger(0));
instanceCount.incrementAndGet();
}
}
}
public static String getReconcileHashCode(Map<String, AtomicInteger> instanceCountMap) {
StringBuilder reconcileHashCode = new StringBuilder(75);
for (Map.Entry<String, AtomicInteger> mapEntry : instanceCountMap.entrySet()) {
reconcileHashCode.append(mapEntry.getKey()).append(STATUS_DELIMITER).append(mapEntry.getValue().get())
.append(STATUS_DELIMITER);
}
return reconcileHashCode.toString();
}
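A small stand-alone example may help show what such a hash code looks like. The sketch below counts instances per status in a sorted map and joins the entries, assuming the delimiter is an underscore; the class ReconcileHashSketch and the method hashOf are hypothetical, and plain strings replace InstanceStatus values.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a status-count hash string; not Eureka code.
class ReconcileHashSketch {

    /** Builds STATUS_count_ segments, ordered by status name. */
    static String hashOf(List<String> statuses) {
        // TreeMap keeps keys sorted, so the result does not depend on input order.
        Map<String, Integer> counts = new TreeMap<>();
        for (String status : statuses) {
            counts.merge(status, 1, Integer::sum);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append('_').append(e.getValue()).append('_');
        }
        return sb.toString();
    }
}
```

For two UP instances and one DOWN instance this yields DOWN_1_UP_2_. Because only per-status counts are compared, the check is cheap, but it cannot tell apart two registries that happen to have the same counts.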
reconcileAndLogDifference() fetches the full registry again, stores it in the local cache variable localRegionApps, and logs the differing hash codes:
private void reconcileAndLogDifference(Applications delta, String reconcileHashCode) throws Throwable {
logger.debug("The Reconcile hashcodes do not match, client : {}, server : {}. Getting the full registry",
reconcileHashCode, delta.getAppsHashCode());
RECONCILE_HASH_CODES_MISMATCH.increment();
long currentUpdateGeneration = fetchRegistryGeneration.get();
EurekaHttpResponse<Applications> httpResponse = clientConfig.getRegistryRefreshSingleVipAddress() == null
? eurekaTransport.queryClient.getApplications(remoteRegionsRef.get())
: eurekaTransport.queryClient.getVip(clientConfig.getRegistryRefreshSingleVipAddress(), remoteRegionsRef.get());
Applications serverApps = httpResponse.getEntity();
if (serverApps == null) {
logger.warn("Cannot fetch full registry from the server; reconciliation failure");
return;
}
if (fetchRegistryGeneration.compareAndSet(currentUpdateGeneration, currentUpdateGeneration + 1)) {
localRegionApps.set(this.filterAndShuffle(serverApps));
getApplications().setVersion(delta.getVersion());
logger.debug(
"The Reconcile hashcodes after complete sync up, client : {}, server : {}.",
getApplications().getReconcileHashCode(),
delta.getAppsHashCode());
} else {
logger.warn("Not setting the applications map as another thread has advanced the update generation");
}
}
onCacheRefreshed() notifies the listeners registered for CacheRefreshedEvent. The event fires on every refresh, whether or not the cached data actually changed.
protected void onCacheRefreshed() {
fireEvent(new CacheRefreshedEvent());
}
protected void fireEvent(final EurekaEvent event) {
for (EurekaEventListener listener : eventListeners) {
try {
listener.onEvent(event);
} catch (Exception e) {
logger.info("Event {} throw an exception for listener {}", event, listener, e.getMessage());
}
}
}
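The try/catch inside the loop is what keeps one misbehaving listener from blocking the others. A minimal sketch of that fan-out pattern follows; the class EventFanOutSketch is hypothetical, Consumer<String> stands in for EurekaEventListener, and the choice of CopyOnWriteArrayList is mine rather than something taken from the source.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch of isolated listener notification; not Eureka code.
class EventFanOutSketch {
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    void register(Consumer<String> listener) {
        listeners.add(listener);
    }

    /** Delivers the event to every listener; returns how many completed normally. */
    int fire(String event) {
        int delivered = 0;
        for (Consumer<String> listener : listeners) {
            try {
                listener.accept(event);
                delivered++;
            } catch (Exception e) {
                // A failing listener is skipped so the remaining ones still run.
            }
        }
        return delivered;
    }
}
```

Listeners run synchronously on the refreshing thread in this sketch, so a slow listener would delay the rest of the refresh cycle.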
updateInstanceRemoteStatus() updates the client's record of this instance's status as seen on the server.
private synchronized void updateInstanceRemoteStatus() {
// Look up this instance in the fetched registry and store its server-side status in currentRemoteInstanceStatus
InstanceInfo.InstanceStatus currentRemoteInstanceStatus = null;
if (instanceInfo.getAppName() != null) {
Application app = getApplication(instanceInfo.getAppName());
if (app != null) {
InstanceInfo remoteInstanceInfo = app.getByInstanceId(instanceInfo.getId());
if (remoteInstanceInfo != null) {
currentRemoteInstanceStatus = remoteInstanceInfo.getStatus();
}
}
}
// If no server-side status was found, use UNKNOWN
if (currentRemoteInstanceStatus == null) {
currentRemoteInstanceStatus = InstanceInfo.InstanceStatus.UNKNOWN;
}
// If the status changed, call onRemoteStatusChanged(), which fires a StatusChangeEvent to the registered listeners, and store the current status in lastRemoteInstanceStatus for the next comparison
if (lastRemoteInstanceStatus != currentRemoteInstanceStatus) {
onRemoteStatusChanged(lastRemoteInstanceStatus, currentRemoteInstanceStatus);
lastRemoteInstanceStatus = currentRemoteInstanceStatus;
}
}