This article covers Dubbo's cluster fault-tolerance strategies and how to implement a custom cluster.
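Cluster Fault Tolerance
When a cluster invocation fails, Dubbo offers several fault-tolerance strategies. The default is failover, which retries the call.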
[Figure: relationships between the cluster components]
The Cluster Interface
Let's start with the Cluster interface:
@SPI(FailoverCluster.NAME)
public interface Cluster {
    /**
     * Merge the directory invokers to a virtual invoker.
     *
     * @param <T>
     * @param directory
     * @return cluster invoker
     * @throws RpcException
     */
    @Adaptive
    <T> Invoker<T> join(Directory<T> directory) throws RpcException;
}
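Like Filter, which the previous article analyzed, Dubbo's cluster layer is implemented with Dubbo SPI. FailoverCluster.NAME makes FailoverCluster the default cluster mode. join is the method that each concrete cluster strategy implements. It merges the invokers held by a Directory into a single virtual Invoker.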
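The sketch below shows what `@SPI(FailoverCluster.NAME)` achieves: an implementation is looked up by name, and the interface's default is used when no name is given. It is self-contained and does not use Dubbo. `DefaultExtension`, `ClusterStrategy` and `MiniExtensionLoader` are hypothetical stand-ins for `@SPI`, `Cluster` and Dubbo's `ExtensionLoader`. The real loader also reads implementations from `META-INF/dubbo/` files and generates adaptive classes, and none of that is modeled here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for @SPI: names the default extension of an interface.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DefaultExtension {
    String value();
}

// Hypothetical extension point playing the role of Cluster.
@DefaultExtension("failover")
interface ClusterStrategy {
    String name();
}

// Hypothetical, heavily simplified stand-in for Dubbo's ExtensionLoader.
final class MiniExtensionLoader {
    private final Map<String, Supplier<ClusterStrategy>> registry = new HashMap<>();

    void register(String name, Supplier<ClusterStrategy> factory) {
        registry.put(name, factory);
    }

    // A null or empty name falls back to the default declared on the interface.
    ClusterStrategy get(String name) {
        String key = (name == null || name.isEmpty())
                ? ClusterStrategy.class.getAnnotation(DefaultExtension.class).value()
                : name;
        Supplier<ClusterStrategy> factory = registry.get(key);
        if (factory == null) {
            throw new IllegalStateException("No extension named " + key);
        }
        return factory.get();
    }
}
```

Suppose you call `register("failover", () -> () -> "failover")` and then `get(null)`. You get the failover strategy back. This is the same idea as a reference with no `cluster` setting falling back to FailoverCluster.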
Failover Cluster
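Failover: when a call fails, Dubbo automatically switches to another server and retries there [1]. This mode is typically used for read operations, but retries add latency. Set the number of retries, not counting the first attempt, with retries="2".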
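The retry count can be configured as follows:
<dubbo:service retries="2" />
or
<dubbo:reference retries="2" />
or
<dubbo:reference>
    <dubbo:method name="findFoo" retries="2" />
</dubbo:reference>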
The core code:
public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    List<Invoker<T>> copyinvokers = invokers;
    checkInvokers(copyinvokers, invocation);
    String methodName = RpcUtils.getMethodName(invocation);
    // total attempts = retries + 1 (the first call is not counted as a retry)
    int len = getUrl().getMethodParameter(methodName, Constants.RETRIES_KEY, Constants.DEFAULT_RETRIES) + 1;
    if (len <= 0) {
        len = 1;
    }
    // retry loop.
    RpcException le = null; // last exception.
    List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyinvokers.size()); // invoked invokers.
    Set<String> providers = new HashSet<String>(len);
    for (int i = 0; i < len; i++) {
        //Reselect before retry to avoid a change of candidate `invokers`.
        //NOTE: if `invokers` changed, then `invoked` also lose accuracy.
        if (i > 0) {
            checkWhetherDestroyed();
            copyinvokers = list(invocation);
            // check again
            checkInvokers(copyinvokers, invocation);
        }
        // the load balancer tries to avoid invokers that were already tried
        Invoker<T> invoker = select(loadbalance, invocation, copyinvokers, invoked);
        invoked.add(invoker);
        RpcContext.getContext().setInvokers((List) invoked);
        try {
            Result result = invoker.invoke(invocation);
            if (le != null && logger.isWarnEnabled()) {
                logger.warn("Although retry the method " + methodName
                        + " in the service " + getInterface().getName()
                        + " was successful by the provider " + invoker.getUrl().getAddress()
                        + ", but there have been failed providers " + providers
                        + " (" + providers.size() + "/" + copyinvokers.size()
                        + ") from the registry " + directory.getUrl().getAddress()
                        + " on the consumer " + NetUtils.getLocalHost()
                        + " using the dubbo version " + Version.getVersion() + ". Last error is: "
                        + le.getMessage(), le);
            }
            return result;
        } catch (RpcException e) {
            if (e.isBiz()) { // biz exception: thrown as-is, never retried.
                throw e;
            }
            le = e;
        } catch (Throwable e) {
            le = new RpcException(e.getMessage(), e);
        } finally {
            providers.add(invoker.getUrl().getAddress());
        }
    }
    throw new RpcException(le.getCode(), "Failed to invoke the method "
            + methodName + " in the service " + getInterface().getName()
            + ". Tried " + len + " times of the providers " + providers
            + " (" + providers.size() + "/" + copyinvokers.size()
            + ") from the registry " + directory.getUrl().getAddress()
            + " on the consumer " + NetUtils.getLocalHost() + " using the dubbo version "
            + Version.getVersion() + ". Last error is: "
            + le.getMessage(), le.getCause() != null ? le.getCause() : le);
}
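The loop above follows three rules:
- the total number of attempts is retries + 1;
- business exceptions are rethrown immediately;
- any other failure moves on to another provider.

The standalone sketch below reproduces only that control flow, under simplifying assumptions. Providers are plain `Supplier`s tried in round-robin order rather than picked by a `LoadBalance`. `BizException` and `FailoverSketch` are hypothetical names, not Dubbo classes.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical business exception: like RpcException with isBiz() == true, it is never retried.
final class BizException extends RuntimeException {
    BizException(String message) {
        super(message);
    }
}

final class FailoverSketch {
    // Makes at most retries + 1 calls, cycling through the providers in order.
    static <T> T invoke(List<Supplier<T>> providers, int retries) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("No provider available");
        }
        int len = Math.max(retries + 1, 1); // same guard as "if (len <= 0) len = 1"
        RuntimeException last = null;
        for (int i = 0; i < len; i++) {
            Supplier<T> provider = providers.get(i % providers.size());
            try {
                return provider.get();
            } catch (BizException e) {
                throw e; // business error: fail immediately, no retry
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next provider
            }
        }
        throw new IllegalStateException("Tried " + len + " times. Last error is: " + last.getMessage(), last);
    }
}
```

Take `retries = 2` with two providers, where the first times out and the second succeeds: the call returns after two attempts. If every attempt fails, the sketch gives up after three calls.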
Failfast Cluster
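Failfast: makes exactly one call and reports an error immediately if it fails. This mode is typically used for non-idempotent write operations, such as inserting a new record. The core implementation: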
public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    checkInvokers(invokers, invocation);
    // a single provider is selected and called once; there is no retry
    Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
    try {
        return invoker.invoke(invocation);
    } catch (Throwable e) {
        if (e instanceof RpcException && ((RpcException) e).isBiz()) { // biz exception.
            throw (RpcException) e;
        }
        throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0,
                "Failfast invoke providers " + invoker.getUrl() + " " + loadbalance.getClass().getSimpleName()
                        + " select from all providers " + invokers + " for service " + getInterface().getName()
                        + " method " + invocation.getMethodName() + " on consumer " + NetUtils.getLocalHost()
                        + " use dubbo version " + Version.getVersion()
                        + ", but no luck to perform the invocation. Last error is: " + e.getMessage(),
                e.getCause() != null ? e.getCause() : e);
    }
}
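Stripped of Dubbo types, failfast is a single attempt. Business errors pass through unchanged, and everything else is wrapped in a framework exception. `BizError`, `FailfastException` and `FailfastSketch` are hypothetical names used only for this illustration.

```java
import java.util.function.Supplier;

// Hypothetical business exception, passed through unchanged (cf. e.isBiz()).
final class BizError extends RuntimeException {
    BizError(String message) {
        super(message);
    }
}

// Hypothetical wrapper for non-business failures (plays the role of RpcException).
final class FailfastException extends RuntimeException {
    FailfastException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class FailfastSketch {
    // Exactly one attempt: no retry and no switch to another provider.
    static <T> T invoke(Supplier<T> provider) {
        try {
            return provider.get();
        } catch (BizError e) {
            throw e;
        } catch (RuntimeException e) {
            throw new FailfastException("Failfast invoke failed. Last error is: " + e.getMessage(), e);
        }
    }
}
```

A write that fails is therefore never sent a second time, which is the point for non-idempotent operations.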
Failsafe Cluster
Failsafe: exceptions are simply ignored. This mode is typically used for operations such as writing audit logs.
public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    try {
        checkInvokers(invokers, invocation);
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        return invoker.invoke(invocation);
    } catch (Throwable e) {
        // any failure is logged and swallowed; the caller gets an empty result
        logger.error("Failsafe ignore exception: " + e.getMessage(), e);
        return new RpcResult(); // ignore
    }
}
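The same "log and return an empty result" idea can be sketched without Dubbo using `Optional`. Here `Optional.empty()` stands in for the empty `RpcResult`, and `FailsafeSketch` is a hypothetical name. Dubbo catches `Throwable`, but this sketch deliberately catches only `RuntimeException`. A `Supplier` cannot throw checked exceptions, and letting `Error`s propagate is usually safer.

```java
import java.util.Optional;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

final class FailsafeSketch {
    private static final Logger LOG = Logger.getLogger(FailsafeSketch.class.getName());

    // Returns the value on success; on failure logs the error and returns Optional.empty(),
    // which plays the role of the empty RpcResult returned by FailsafeClusterInvoker.
    static <T> Optional<T> invoke(Supplier<T> provider) {
        try {
            return Optional.ofNullable(provider.get());
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Failsafe ignore exception: " + e.getMessage(), e);
            return Optional.empty();
        }
    }
}
```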
Failback Cluster
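Failback: failed requests are recorded in the background and resent on a schedule. This mode is typically used for message notifications.
FailbackClusterInvoker keeps a ConcurrentMap named failed, keyed by Invocation with AbstractClusterInvoker values. When a call in doInvoke throws, the exception is caught and the invocation is put into failed. A scheduled task then walks through every call in failed and retries it, removing the call once its retry succeeds. The task is started on the first failure and runs every RETRY_FAILED_PERIOD (5 seconds).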
public class FailbackClusterInvoker<T> extends AbstractClusterInvoker<T> {
    private static final Logger logger = LoggerFactory.getLogger(FailbackClusterInvoker.class);
    // delay between retry rounds: 5 seconds
    private static final long RETRY_FAILED_PERIOD = 5 * 1000;
    /**
     * Use {@link NamedInternalThreadFactory} to produce {@link org.apache.dubbo.common.threadlocal.InternalThread}
     * which with the use of {@link org.apache.dubbo.common.threadlocal.InternalThreadLocal} in {@link RpcContext}.
     */
    private final ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(2,
            new NamedInternalThreadFactory("failback-cluster-timer", true));
    // failed calls waiting to be retried: invocation -> invoker that will replay it
    private final ConcurrentMap<Invocation, AbstractClusterInvoker<?>> failed = new ConcurrentHashMap<>();
    private volatile ScheduledFuture<?> retryFuture;

    public FailbackClusterInvoker(Directory<T> directory) {
        super(directory);
    }

    private void addFailed(Invocation invocation, AbstractClusterInvoker<?> router) {
        // start the retry task lazily, on the first failure (double-checked locking)
        if (retryFuture == null) {
            synchronized (this) {
                if (retryFuture == null) {
                    retryFuture = scheduledExecutorService.scheduleWithFixedDelay(new Runnable() {
                        @Override
                        public void run() {
                            // collect retry statistics
                            try {
                                retryFailed();
                            } catch (Throwable t) { // Defensive fault tolerance
                                logger.error("Unexpected error occur at collect statistic", t);
                            }
                        }
                    }, RETRY_FAILED_PERIOD, RETRY_FAILED_PERIOD, TimeUnit.MILLISECONDS);
                }
            }
        }
        failed.put(invocation, router);
    }

    void retryFailed() {
        if (failed.size() == 0) {
            return;
        }
        // iterate over a snapshot so new failures can be added concurrently
        for (Map.Entry<Invocation, AbstractClusterInvoker<?>> entry : new HashMap<>(failed).entrySet()) {
            Invocation invocation = entry.getKey();
            Invoker<?> invoker = entry.getValue();
            try {
                invoker.invoke(invocation);
                failed.remove(invocation);
            } catch (Throwable e) {
                logger.error("Failed retry to invoke method " + invocation.getMethodName() + ", waiting again.", e);
            }
        }
    }

    @Override
    protected Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        try {
            checkInvokers(invokers, invocation);
            Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
            return invoker.invoke(invocation);
        } catch (Throwable e) {
            logger.error("Failback to invoke method " + invocation.getMethodName() + ", wait for retry in background. Ignored exception: "
                    + e.getMessage() + ", ", e);
            addFailed(invocation, this);
            return new RpcResult(); // ignore
        }
    }
}
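The sketch below is a framework-free version of the same mechanism. It uses a concurrent map of failed calls, a retry task that starts lazily on the first failure, and a retry loop over a snapshot of the map. Several details are assumptions made for illustration:
- `FailbackSketch` is a hypothetical name.
- A string request id replaces `Invocation`, and a `Runnable` replaces the invoker.
- The retry period is a constructor parameter.
- `retryFailed()` is callable directly, so the behavior can be checked without waiting for the timer.

It also uses the two-argument `remove(key, value)`, so a retry never removes a newer failure recorded under the same key. The Dubbo code above uses the one-argument `remove`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class FailbackSketch {
    // Daemon timer thread, so a forgotten shutdown() does not keep the JVM alive.
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "failback-sketch-timer");
        t.setDaemon(true);
        return t;
    });
    // Key: hypothetical request id standing in for Invocation; value: the call to replay.
    private final ConcurrentMap<String, Runnable> failed = new ConcurrentHashMap<>();
    private final long periodMillis;
    private volatile ScheduledFuture<?> retryFuture;

    FailbackSketch(long periodMillis) {
        this.periodMillis = periodMillis;
    }

    // One attempt; on failure the call is queued for background retry and false is returned.
    boolean invoke(String requestId, Runnable call) {
        try {
            call.run();
            return true;
        } catch (RuntimeException e) {
            addFailed(requestId, call);
            return false;
        }
    }

    // Starts the retry task lazily (double-checked locking), then records the failure.
    private void addFailed(String requestId, Runnable call) {
        if (retryFuture == null) {
            synchronized (this) {
                if (retryFuture == null) {
                    retryFuture = timer.scheduleWithFixedDelay(
                            this::retryFailed, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
                }
            }
        }
        failed.put(requestId, call);
    }

    // Replays every recorded failure once; calls that succeed are removed.
    void retryFailed() {
        for (Map.Entry<String, Runnable> entry : new HashMap<>(failed).entrySet()) {
            try {
                entry.getValue().run();
                failed.remove(entry.getKey(), entry.getValue());
            } catch (RuntimeException e) {
                // Still failing: keep it for the next round.
            }
        }
    }

    int pendingCount() {
        return failed.size();
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```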
Forking Cluster
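Forking: calls several servers in parallel and returns as soon as any one of them succeeds. This mode is typically used for read operations with strict real-time requirements, at the cost of more server resources. Set the maximum number of parallel calls with forks="2".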
public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    try {
        checkInvokers(invokers, invocation);
        final List<Invoker<T>> selected;
        final int forks = getUrl().getParameter(Constants.FORKS_KEY, Constants.DEFAULT_FORKS);
        final int timeout = getUrl().getParameter(Constants.TIMEOUT_KEY, Constants.DEFAULT_TIMEOUT);
        // forks <= 0, or at least as many forks as providers: call every provider
        if (forks <= 0 || forks >= invokers.size()) {
            selected = invokers;
        } else {
            selected = new ArrayList<>();
            for (int i = 0; i < forks; i++) {
                // TODO. Add some comment here, refer chinese version for more details.
                Invoker<T> invoker = select(loadbalance, invocation, invokers, selected);
                if (!selected.contains(invoker)) {
                    //Avoid add the same invoker several times.
                    selected.add(invoker);
                }
            }
        }
        RpcContext.getContext().setInvokers((List) selected);
        final AtomicInteger count = new AtomicInteger();
final BlockingQueue
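The excerpt above stops just after it declares an `AtomicInteger` counter and a `BlockingQueue`. The sketch below shows one way those two pieces can implement "first success wins". Each selected call runs on a thread pool and pushes its result into the queue. A failure is pushed only once every call has failed, and the caller waits on the queue with a timeout. This is an illustrative reconstruction of the technique, not the truncated Dubbo code: `ForkingSketch`, its parameters and its exception choices are all assumptions. The JDK's `ExecutorService.invokeAny(tasks, timeout, unit)` offers similar first-success semantics out of the box.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

final class ForkingSketch {
    // Starts every selected call in parallel and returns the first successful result.
    // A null result counts as a failure: LinkedBlockingQueue.offer(null) throws NullPointerException.
    static <T> T invoke(List<Callable<T>> selected, ExecutorService pool, long timeoutMillis) throws Exception {
        BlockingQueue<Object> ref = new LinkedBlockingQueue<>();
        AtomicInteger failures = new AtomicInteger();
        for (Callable<T> call : selected) {
            pool.execute(() -> {
                try {
                    ref.offer(call.call());
                } catch (Throwable e) {
                    // Publish an error only when every call has failed,
                    // so a slower success can still win the race.
                    if (failures.incrementAndGet() >= selected.size()) {
                        ref.offer(e);
                    }
                }
            });
        }
        Object ret = ref.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (ret == null) {
            throw new TimeoutException("No result within " + timeoutMillis + " ms");
        }
        if (ret instanceof Throwable) {
            throw new IllegalStateException("All forked calls failed", (Throwable) ret);
        }
        @SuppressWarnings("unchecked")
        T result = (T) ret;
        return result;
    }
}
```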
Broadcast Cluster
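Broadcast: calls every provider one by one, and if any of them fails, the whole call fails [2]. Every provider is still called after a failure, and the last exception is rethrown at the end. This mode is typically used to tell all providers to update local resources such as caches or logs.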
public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    checkInvokers(invokers, invocation);
    RpcContext.getContext().setInvokers((List) invokers);
    RpcException exception = null;
    Result result = null;
    // every provider is called, even after an earlier one has failed
    for (Invoker<T> invoker : invokers) {
        try {
            result = invoker.invoke(invocation);
        } catch (RpcException e) {
            exception = e;
            logger.warn(e.getMessage(), e);
        } catch (Throwable e) {
            exception = new RpcException(e.getMessage(), e);
            logger.warn(e.getMessage(), e);
        }
    }
    // if any call failed, the last exception is thrown
    if (exception != null) {
        throw exception;
    }
    return result;
}
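Without Dubbo types, broadcast reduces to the loop below. It calls everyone, remembers the last failure, and rethrows it only after all providers have been called. `BroadcastSketch` is a hypothetical name, and providers are modeled as plain `Supplier`s.

```java
import java.util.List;
import java.util.function.Supplier;

final class BroadcastSketch {
    // Calls every provider in order, even after a failure, so each one receives the call.
    // If any call failed, the last failure is rethrown once all providers have been called;
    // otherwise the last provider's result is returned.
    static <T> T invoke(List<Supplier<T>> providers) {
        RuntimeException last = null;
        T result = null;
        for (Supplier<T> provider : providers) {
            try {
                result = provider.get();
            } catch (RuntimeException e) {
                last = e;
            }
        }
        if (last != null) {
            throw last;
        }
        return result;
    }
}
```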
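We have now seen FailfastCluster, suited to write operations, and FailoverCluster, suited to read operations. Next we combine the two into an application-wide configuration: write operations use FailfastCluster, and read operations (everything else) use FailoverCluster.
Create BeiDaoDubboCluster: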
package com.beidao.dubbo.cluster;

import com.alibaba.dubbo.rpc.Invoker;
import com.alibaba.dubbo.rpc.RpcException;
import com.alibaba.dubbo.rpc.cluster.Cluster;
import com.alibaba.dubbo.rpc.cluster.Directory;

/**
 * @author 0200759
 * Custom cluster
 *
 */
public class BeiDaoDubboCluster implements Cluster {
    public final static String Name = "beidaoCluster";

    // wraps the directory's invokers in the read/write-aware invoker
    @Override
    public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
        return new BeiDaoDubboClusterInvoker<T>(directory);
    }
}
Create BeiDaoDubboClusterInvoker:
package com.beidao.dubbo.cluster;

import java.util.List;

import com.alibaba.dubbo.common.logger.Logger;
import com.alibaba.dubbo.common.logger.LoggerFactory;
import com.alibaba.dubbo.rpc.Invocation;
import com.alibaba.dubbo.rpc.Invoker;
import com.alibaba.dubbo.rpc.Result;
import com.alibaba.dubbo.rpc.RpcException;
import com.alibaba.dubbo.rpc.cluster.Directory;
import com.alibaba.dubbo.rpc.cluster.LoadBalance;
import com.alibaba.dubbo.rpc.cluster.support.AbstractClusterInvoker;
import com.alibaba.dubbo.rpc.cluster.support.FailfastClusterInvoker;
import com.alibaba.dubbo.rpc.cluster.support.FailoverClusterInvoker;

/**
 * Cluster strategy: picks a different strategy depending on whether the call is a read or a write.
 * @author 0200759
 *
 */
public class BeiDaoDubboClusterInvoker<T> extends AbstractClusterInvoker<T> {
    // Method-name prefixes that mark write operations
    private final static String[] WRITE_PREFFIX_ARRAY = new String[] { "SAVE", "ADD", "INSERT", "DEL", "UPDATE" };
    private static final Logger logger = LoggerFactory.getLogger(BeiDaoDubboClusterInvoker.class);
    private Directory<T> directory;

    public BeiDaoDubboClusterInvoker(Directory<T> directory) {
        super(directory);
        this.directory = directory;
    }

    @Override
    protected Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance)
            throws RpcException {
        String methodName = invocation.getMethodName().toUpperCase();
        boolean write = checkMethod(methodName);
        if (write) {
            // write: a single attempt (Failfast), so a non-idempotent write is never repeated
            logger.info(methodName + " method is executing cluster for writing operation");
            return new FailfastClusterInvoker<T>(directory).doInvoke(invocation, invokers, loadbalance);
        } else {
            // read (everything else): retry other providers on failure (Failover)
            logger.info(methodName + " method is executing cluster for reading operation");
            return new FailoverClusterInvoker<T>(directory).doInvoke(invocation, invokers, loadbalance);
        }
    }

    /**
     * Checks whether the method is a write operation.
     * @param methodName upper-cased method name
     * @return true if the name starts with one of the write prefixes
     */
    private boolean checkMethod(String methodName) {
        for (String writePreffix : WRITE_PREFFIX_ARRAY) {
            if (methodName.startsWith(writePreffix)) {
                return true;
            }
        }
        return false;
    }
}
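The read/write decision in `doInvoke` comes down to a prefix check on the method name. The framework-free sketch below isolates that logic so it can be unit-tested. `ReadWriteRouter` is a hypothetical name, and the strings it returns only name the strategy the invoker above would delegate to. It uses `toUpperCase(Locale.ROOT)` rather than the no-argument `toUpperCase()`, so the result does not depend on the default locale. Under a Turkish locale, for example, "insert" upper-cases to "İNSERT" and would miss the INSERT prefix.

```java
import java.util.Locale;

final class ReadWriteRouter {
    // Same prefixes as WRITE_PREFFIX_ARRAY in BeiDaoDubboClusterInvoker.
    private static final String[] WRITE_PREFIXES = { "SAVE", "ADD", "INSERT", "DEL", "UPDATE" };

    static boolean isWrite(String methodName) {
        String upper = methodName.toUpperCase(Locale.ROOT);
        for (String prefix : WRITE_PREFIXES) {
            if (upper.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Name of the strategy the custom invoker would delegate to.
    static String strategyFor(String methodName) {
        return isWrite(methodName) ? "failfast" : "failover";
    }
}
```

Prefix matching is coarse, so method naming conventions matter. A read method such as `addressOf` is classified as a write because it starts with ADD. A write method such as `removeUser` is classified as a read and would be retried.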
Create a file named com.alibaba.dubbo.rpc.cluster.Cluster in the /resources/META-INF/dubbo/ directory with the following content:
beidaoCluster = com.beidao.dubbo.cluster.BeiDaoDubboCluster
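Each line of this file maps an extension name to an implementation class. The hypothetical helper below parses that `name = fully.qualified.ClassName` format, treating text after `#` as a comment. It is only an illustration of the format, not Dubbo's own loader. It ignores any other line forms the real loader may accept.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ExtensionFileParser {
    // Parses "name = fully.qualified.ClassName" lines; text after '#' is a comment.
    static Map<String, String> parse(String content) {
        Map<String, String> extensions = new LinkedHashMap<>();
        for (String raw : content.split("\\R")) {
            String line = raw;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            int eq = line.indexOf('=');
            if (eq > 0) { // skips blank lines and, in this sketch, lines without a name
                extensions.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return extensions;
    }
}
```

Parsing the one-line file above yields a single entry: beidaoCluster mapped to com.beidao.dubbo.cluster.BeiDaoDubboCluster. That name is what the consumer refers to in its cluster setting.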
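On the consumer side, set the reference's cluster attribute to the extension name registered above: cluster="beidaoCluster".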
Done! If you've read this far, a like would be much appreciated.
Source code: https://github.com/MAXAmbitious/dubbo-study/tree/master/dubbo-beidao-cluster