ActiveMQ源码解析(三)Failover机制

在看完会话的源码后,最近项目提到了Failover机制的疑问,所以需要先读一下failover机制的源码,看看这个故障重连的机制是怎样的。
在建立连接的那篇文章中(见http://www.jianshu.com/p/d41f32ca22a5),讲到了FactoryFinder会根据schema找到对应的TransportFactory类,这个配置其实是放在transport目录中的。

private static final FactoryFinder TRANSPORT_FACTORY_FINDER = new FactoryFinder("META-INF/services/org/apache/activemq/transport/");

这种配置方法很值得学习。只需要写一个工厂类继承自TransportFactory,然后在"META-INF/services/org/apache/activemq/transport/"目录下放个配置文件,指向该工厂类就OK了。使用配置的方式新增,而不是在TansportFactory代码中去添加,更好的遵守了OCP原则。

Anyway,我们现在找到了FailoverTransportFactory,下一步是调用doConnect建立连接

    public Transport doConnect(URI location) throws IOException {
        try {
            //主要看这个
            Transport transport = createTransport(URISupport.parseComposite(location));
            //与Tcptransport类似,加上辅助功能
            transport = new MutexTransport(transport);
            transport = new ResponseCorrelator(transport);
            return transport;
        } catch (URISyntaxException e) {
            throw new IOException("Invalid location: " + location);
        }
    }

    public Transport createTransport(CompositeData compositData) throws IOException {
        Map options = compositData.getParameters();
        FailoverTransport transport = createTransport(options);
        if (!options.isEmpty()) {
            throw new IllegalArgumentException("Invalid connect parameters: " + options);
        }
        transport.add(false,compositData.getComponents());
        return transport;
    }

    public FailoverTransport createTransport(Map parameters) throws IOException {
        FailoverTransport transport = new FailoverTransport();
        Map nestedExtraQueryOptions = IntrospectionSupport.extractProperties(parameters, "nested.");
        IntrospectionSupport.setProperties(transport, parameters);
        try {
            transport.setNestedExtraQueryOptions(URISupport.createQueryString(nestedExtraQueryOptions));
        } catch (URISyntaxException e) {
        }
        return transport;
    }

调用了FailoverTransport的构造函数,建立了一个FailoverTransport,可以看到在FailoverTransport的方法里建立了一个名为reconnectTask的TaskRunner,跑的Task里的iterate()方法是重复执行的。具体机制见源码解析(二)。

public FailoverTransport() {
        brokerSslContext = SslContext.getCurrentSslContext();
        stateTracker.setTrackTransactions(true);
        // Setup a task that is used to reconnect the a connection async.
        reconnectTaskFactory = new TaskRunnerFactory();
        reconnectTaskFactory.init();
        reconnectTask = reconnectTaskFactory.createTaskRunner(new Task() {
            @Override
            public boolean iterate() {
                boolean result = false;
                //FailoverTransport被启动才执行该线程
                if (!started) {
                    return result;
                }
                boolean buildBackup = true;
                synchronized (backupMutex) {
                    //若连接未建立||需要重置集群负载||优先连接的服务器可用且该transport未被stop,则执行
                    if ((connectedTransport.get() == null || doRebalance || priorityBackupAvailable) && !disposed) {
                        result = doReconnect();
                        buildBackup = false;
                    }
                }
                //是否启用备用连接
                if (buildBackup) {
                    buildBackups();
                    if (priorityBackup && !connectedToPriority) {
                        try {
                            doDelay();
                            if (reconnectTask == null) {
                                return true;
                            }
                            reconnectTask.wakeup();
                        } catch (InterruptedException e) {
                            LOG.debug("Reconnect task has been interrupted.", e);
                        }
                    }
                } else {
                    // build backups on the next iteration
                    buildBackup = true;
                    try {
                        if (reconnectTask == null) {
                            return true;
                        }
                        reconnectTask.wakeup();
                    } catch (InterruptedException e) {
                        LOG.debug("Reconnect task has been interrupted.", e);
                    }
                }
                return result;
            }

        }, "ActiveMQ Failover Worker: " + System.identityHashCode(this));
    }

这个reconnectTask的线程会在FailoverTransport调用start()方法的时候开始启用,连接建立后变成wait,且在连接被异常断开时被唤醒。

可用看到上面的代码中最重要的是调用doReconnect()方法。这个方法主要有以下几个部分:
1. 第一部分是用来处理配置形式uris的,如果有配置,优先读取配置的uri,并添加到重连uris中本身不做重连。
2. 第二部分是做负载重连的,根据重连uris如果第一个已经对应的Transport在工作了则无需重连直接返回,否则去掉当前工作的Transport达到负载的目的。
3. 第三部分是关于Transport的备份机制的,如果设置了备份机制,且有Transport已经备份则取出该备份Transport返回。这里也没有设置备份。
可以看到前三部分都没有进行实际的重连工作,第四部分才是在上述三部分都不存在的情况下,进行实际的重连工作。

4. 第四部分才是重连工作,根据uri找到对应的Transport,对该Transport设置独有的TransportListener并启动。然后设置到FailoverTransport的connectedTransport作为当前连接的Transport。

下面把方法中主要的第四部分摘录出来

         Iterator iter = connectList.iterator();
         while ((transport != null || iter.hasNext()) && (connectedTransport.get() == null && !disposed)) {
             try {
                  // SSL先忽略
                  SslContext.setCurrentSslContext(brokerSslContext);

                  // We could be starting with a backup and if so we wait to grab a
                  // URI from the pool until next time around.
                  // 没建立过连接就先建立一个
                  if (transport == null) {
                      uri = addExtraQueryOptions(iter.next());
                      transport = TransportFactory.compositeConnect(uri);
                  }
                  LOG.debug("Attempting {}th connect to: {}", connectFailures, uri);

                  // 给transport增加一个transportListener,listener可以判断连接是否异常,异常时也唤醒reconnectTask
                  transport.setTransportListener(createTransportListener(transport));
                  // 如果transport的start方法调用没问题,则表示连接建立成功
                  transport.start();

                  // 如果不是第一次建立连接,则需要通知服务器是一个重连的客户端
                  if (started && !firstConnection) {
                      restoreTransport(transport);
                  }

                  LOG.debug("Connection established");
                  // 为下一次的重连进行配置
                  reconnectDelay = initialReconnectDelay;
                  connectedTransportURI = uri;
                  connectedTransport.set(transport);
                  connectedToPriority = isPriority(connectedTransportURI);
                  reconnectMutex.notifyAll(
                  connectFailures = 0;

                  // Make sure on initial startup, that the transportListener
                  // has been initialized for this instance.
                  // 确保为这个连接加上listener
                  synchronized (listenerMutex) {
                      if (transportListener == null) {
                            try {
                                 // if it isn't set after 2secs - it probably never will be
                                 listenerMutex.wait(2000);
                             } catch (InterruptedException ex) {
                             }
                       }
                  }
                  if (firstConnection) {
                     firstConnection = false;
                     LOG.info("Successfully connected to {}", uri);
                  } else {
                     LOG.info("Successfully reconnected to {}", uri);
                  }
                  return false;
           } catch (Exception e) {
               failure = e;
               LOG.debug("Connect fail to: {}, reason: {}", uri, e);
               if (transport != null) {
                  try {
                      transport.stop();
                      transport = null;
                  } catch (Exception ee) {
                       LOG.debug("Stop of failed transport: {} failed with reason: {}", transport, ee);
                  }
              }
         }

可以看到transport是通过这个语句建立的

transport = TransportFactory.compositeConnect(uri);

需要注意的是这个时候传入的uri是不带failover的,因此如果设置的是形如failover:(tcp://localhost:61616)这样的连接配置,此时uri应该是tcp://localhost:61616。所以返回的transport是tcpTransport。

看着transportListener很重要的样子,那这个transportListener到底起什么作用呢。

    private TransportListener createTransportListener(final Transport owner) {
        return new TransportListener() {
        ……
            @Override
            // 最主要的overide,当异常时调用handleTransportFailure方法
            public void onException(IOException error) {
                try {
                    handleTransportFailure(owner, error);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    if (transportListener != null) {
                        transportListener.onException(new InterruptedIOException());
                    }
                }
            }
        ……
        };
    }

可以看到transportListener就是一个监控的线程,当发现transport异常时,会调用handleTransportFailure进行故障处理:

    public final void handleTransportFailure(Transport failed, IOException e) throws InterruptedException {
        // 如果异常是由于连接正在关闭引起的,就不用管它
        if (shuttingDown) {
            // shutdown info sent and remote socket closed and we see that before a local close
            // let the close do the work
            return;
        }

        if (LOG.isTraceEnabled()) {
            LOG.trace(this + " handleTransportFailure: " + e, e);
        }
        // 重置transport为空
        // could be blocked in write with the reconnectMutex held, but still needs to be whacked
        Transport transport = null;

        if (connectedTransport.compareAndSet(failed, null)) {
            transport = failed;
            if (transport != null) {
                disposeTransport(transport);
            }
        }

        synchronized (reconnectMutex) {
            if (transport != null && connectedTransport.get() == null) {
                boolean reconnectOk = false;

                // 如果transport是started状态且重连次数未达上限,则表示可以开始尝试重连
                if (canReconnect()) {
                    reconnectOk = true;
                }

                LOG.warn("Transport ({}) failed {} attempting to automatically reconnect: {}",
                         connectedTransportURI, (reconnectOk ? "," : ", not"), e);

                failedConnectTransportURI = connectedTransportURI;
                connectedTransportURI = null;
                connectedToPriority = false;

                if (reconnectOk) {
                    // notify before any reconnect attempt so ack state can be whacked
                    if (transportListener != null) {
                        transportListener.transportInterupted();
                    }

                    updated.remove(failedConnectTransportURI);
                    // 唤醒reconnectTask
                    reconnectTask.wakeup();
                } else if (!isDisposed()) {
                    propagateFailureToExceptionListener(e);
                }
            }
        }
    }

嗯,其实主要就是唤醒reconnectTask开始重连。。。

总结:FailoverTransport其实就是封装了一个reconnectTask,这个Task会在建立连接时启动,在连接异常时也会被唤醒,其主要就是对尝试建立连接的封装。

接下来还有个问题:
Transport什么时候会抛异常?因为环境中出现了部分异常客户端的连接处于CLOSE_WAIT的状态。且客户端没有再进行重连的尝试,其机制是怎样的呢?

带着这个疑问,后续再研究。

你可能感兴趣的:(ActiveMQ源码解析(三)Failover机制)