What is a backoff algorithm:
In single-channel, contention-based medium access control (MAC) protocols, whenever more than one station or node tries to access the medium at the same instant, the packets collide. If the collided stations try to access the channel again right away, the packets will collide again because the nodes are synchronized in time. The nodes therefore need to be displaced in time, and a backoff algorithm is used to do so (for example, binary exponential backoff (BEB)). In the BEB algorithm, whenever a node's transmission collides with another node's transmission, both nodes choose a random waiting time and wait for this amount of time before attempting again. If they are not successful in this attempt, they double their contention window and choose a new random waiting time before transmitting again. This process is repeated for a certain number of attempts. If the nodes are still not successful after this limit, the packets are dropped from their queues.
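In short: on a shared channel, when nodes on the network collide, each node waits for some time before retransmitting. In binary exponential backoff, the waiting time grows as a power of two: if a retry fails, the next waiting time (more precisely, the window it is drawn from) is twice the previous one. If the number of retries exceeds the maximum, the packet is removed from the packet queue.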
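The doubling described above can be sketched in a few lines of Java. This is only an illustration of the idea, not code from any MAC implementation: the class name, the slot-based waiting time, and the limit of 20 attempts are all assumptions made for the example.

```java
import java.util.Random;

/** Illustrative binary exponential backoff; all names and limits are assumptions. */
class BinaryExponentialBackoffSketch {
    private final int initialWindow; // contention window (in slots) before any collision
    private final int maxAttempts;   // collisions tolerated before the packet is dropped
    private final Random random;
    private int attempts = 0;        // collisions handled so far

    BinaryExponentialBackoffSketch(int initialWindow, int maxAttempts, Random random) {
        if (initialWindow < 1 || maxAttempts < 0 || maxAttempts > 20) {
            throw new IllegalArgumentException("window must be >= 1, attempts in [0, 20]");
        }
        this.initialWindow = initialWindow;
        this.maxAttempts = maxAttempts;
        this.random = random;
    }

    /** Current contention window: initialWindow * 2^attempts. */
    int currentWindow() {
        return initialWindow << attempts;
    }

    /**
     * Handles one collision. Returns a random wait in [0, currentWindow) slots
     * and then doubles the window, or returns -1 when the packet should be dropped.
     */
    int onCollision() {
        if (attempts >= maxAttempts) {
            return -1; // retry limit reached: drop the packet
        }
        int wait = random.nextInt(currentWindow());
        attempts++; // the next collision draws from a window twice as large
        return wait;
    }
}
```

Because each node draws its own random wait, two nodes that collided are unlikely to pick the same slot again, and the growing window lowers that chance further on every repeat collision.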
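Now that we know what a backoff algorithm is, let's look at how Flume uses it. From the definition, the algorithm applies to network errors and retries, such as opening a network connection or sending data over the network. In Flume, the insistentAppend and insistentOpen decorators both use a backoff algorithm to handle sending data and opening connections. Let's use the append method of insistentAppend as an example to see how the backoff algorithm is applied.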
public void append(Event evt) throws IOException, InterruptedException {
  List<IOException> exns = new ArrayList<IOException>();
  int attemptRetries = 0;
  appendRequests++;
  while (!backoff.isFailed() && isOpen.get()
      && !Thread.currentThread().isInterrupted()) {
    try {
      appendAttempts++;
      super.append(evt);
      appendSuccesses++;
      backoff.reset(); // reset backoff counter;
      return;
    } catch (InterruptedException ie) {
      throw ie;
    } catch (IOException e) {
      // this is an unexpected exception
      long waitTime = backoff.sleepIncrement();
      LOG.info("append attempt " + attemptRetries + " failed, backoff ("
          + waitTime + "ms): " + e.getMessage());
      LOG.debug(e.getMessage(), e);
      exns.add((e instanceof IOException) ? (IOException) e
          : new IOException(e));
      backoff.backoff();
      try {
        backoff.waitUntilRetryOk();
      } catch (InterruptedException e1) {
        // got an interrupted signal, bail out!
        throw e1;
      } finally {
        attemptRetries++;
        appendRetries++;
      }
    } catch (RuntimeException e) {
      // this is an unexpected exception
      LOG.info("Failed due to unexpected runtime exception "
          + "during append attempt", e);
      appendGiveups++;
      throw e;
    }
  }
  appendGiveups++;
  // failed to start
  IOException ioe = MultipleIOException.createIOException(exns);
  if (ioe == null) {
    return;
  }
  throw ioe;
}
Abstracting from the code above, a backoff algorithm is generally applied in the following form.
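while (!backoff.isFailed()) {
  try {
    doSomething(); // the operation to retry
    backoff.reset(); // reset backoff counter
    return;
  } catch (Exception e) {
    backoff.backoff(); // record the failure and set the next waiting time
    try {
      backoff.waitUntilRetryOk();
    } catch (InterruptedException e1) {
      // restore the interrupt flag and stop retrying
      Thread.currentThread().interrupt();
      return;
    }
  }
}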
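Flume currently uses three main backoff algorithms: ExponentialBackoff, CappedExponentialBackoff, and CumulativeCappedExponentialBackoff.
ExponentialBackoff is a simple exponential backoff algorithm: it only makes the next waiting time twice the previous one, and once the number of retries reaches the maximum, the task can no longer be retried.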
CappedExponentialBackoff is a simple modification of ExponentialBackoff: it bounds each waiting time so that no single wait exceeds a value sleepCap. It does not, however, limit the number of retries.
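A minimal sketch of the capping behaviour follows. It is written for illustration and is not Flume's source: the class name, the method shapes (backoff returning the wait in milliseconds), and the millisecond units are assumptions; only the idea of a sleepCap upper bound comes from the description above.

```java
/** Illustrative capped exponential backoff; not Flume's CappedExponentialBackoff. */
class CappedBackoffSketch {
    private final long initialSleepMs; // first waiting time
    private final long sleepCapMs;     // upper bound for any single wait
    private long sleepMs;              // waiting time for the next failure

    CappedBackoffSketch(long initialSleepMs, long sleepCapMs) {
        this.initialSleepMs = Math.min(initialSleepMs, sleepCapMs);
        this.sleepCapMs = sleepCapMs;
        this.sleepMs = this.initialSleepMs;
    }

    /** Records a failure and returns how long to wait before the next attempt. */
    long backoff() {
        long wait = sleepMs;
        sleepMs = Math.min(sleepMs * 2, sleepCapMs); // double, but never past the cap
        return wait;
    }

    /** No retry limit in this policy, so it never reports failure. */
    boolean isFailed() {
        return false;
    }

    /** Called after a success: start again from the initial waiting time. */
    void reset() {
        sleepMs = initialSleepMs;
    }
}
```

With an initial wait of 100 ms and a cap of 500 ms, successive failures wait 100, 200, 400, 500, 500, ... ms.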
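CumulativeCappedExponentialBackoff modifies CappedExponentialBackoff further: it adds a cumulativeCap variable that limits how long retrying may continue, and thereby the number of retries. On the first backoff, failTime is set to the current time plus cumulativeCap. Whether another retry is allowed is decided by comparing the current time with failTime: if the current time is earlier than failTime, a retry is still allowed; otherwise it is not.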
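The failTime rule can be sketched on its own, leaving out the capped doubling of the waiting time. This is an illustration rather than Flume's implementation: the class name is made up, and the current time is passed in as a parameter (an assumption that keeps the example deterministic) instead of being read from a clock.

```java
/** Illustrative cumulative deadline check; not Flume's CumulativeCappedExponentialBackoff. */
class CumulativeCapSketch {
    private final long cumulativeCapMs; // total time budget for retrying
    private long failTimeMs = -1;       // -1 means no backoff has happened yet

    CumulativeCapSketch(long cumulativeCapMs) {
        this.cumulativeCapMs = cumulativeCapMs;
    }

    /** On the first backoff only, fix the deadline at now + cumulativeCap. */
    void backoff(long nowMs) {
        if (failTimeMs < 0) {
            failTimeMs = nowMs + cumulativeCapMs;
        }
    }

    /** Retrying is allowed while the current time is before failTime. */
    boolean isFailed(long nowMs) {
        return failTimeMs >= 0 && nowMs >= failTimeMs;
    }

    /** Called after a success: forget the deadline. */
    void reset() {
        failTimeMs = -1;
    }
}
```

Because the deadline is fixed at the first failure, later failures do not extend it, so the total time spent retrying stays bounded by cumulativeCap.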
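From the analysis above, a backoff algorithm has to provide four interface methods (isFailed, backoff, waitUntilRetryOk, reset). isFailed decides whether another retry is allowed; backoff sets the waiting time; waitUntilRetryOk sleeps for the waiting time set by backoff so that the next retry can proceed; reset restores the algorithm's internal variables after the task succeeds. For details, see the source code of ExponentialBackoff and the other classes.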
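To make the four-method contract concrete, here is a self-contained sketch of an interface, one doubling policy, and a generic retry helper. None of these names exist in Flume: RetryPolicySketch, DoublingPolicySketch, and RetrySketch.retry are hypothetical and only mirror the four methods listed above.

```java
import java.util.concurrent.Callable;

/** Hypothetical four-method contract; not Flume's actual interface. */
interface RetryPolicySketch {
    boolean isFailed();                                   // true when no more retries are allowed
    void backoff();                                       // record a failure, set the next wait
    void waitUntilRetryOk() throws InterruptedException;  // sleep until the retry time
    void reset();                                         // clear state after a success
}

/** Doubles the sleep after each failure and gives up after maxTries failures. */
class DoublingPolicySketch implements RetryPolicySketch {
    private final long initialSleepMs;
    private final int maxTries;
    private long sleepMs;
    private long retryAtMs;
    private int failures;

    DoublingPolicySketch(long initialSleepMs, int maxTries) {
        this.initialSleepMs = initialSleepMs;
        this.maxTries = maxTries;
        this.sleepMs = initialSleepMs;
    }

    public boolean isFailed() {
        return failures >= maxTries;
    }

    public void backoff() {
        retryAtMs = System.currentTimeMillis() + sleepMs;
        sleepMs *= 2;
        failures++;
    }

    public void waitUntilRetryOk() throws InterruptedException {
        long remaining = retryAtMs - System.currentTimeMillis();
        if (remaining > 0) {
            Thread.sleep(remaining);
        }
    }

    public void reset() {
        sleepMs = initialSleepMs;
        retryAtMs = 0;
        failures = 0;
    }
}

class RetrySketch {
    /** Runs task until it succeeds or the policy gives up; rethrows the last failure. */
    static <T> T retry(Callable<T> task, RetryPolicySketch policy) throws Exception {
        Exception last = null;
        while (!policy.isFailed()) {
            try {
                T result = task.call();
                policy.reset();
                return result;
            } catch (InterruptedException ie) {
                throw ie; // never retry an interrupt
            } catch (Exception e) {
                last = e;
                policy.backoff();
                policy.waitUntilRetryOk();
            }
        }
        throw last != null ? last : new IllegalStateException("policy already failed");
    }
}
```

A call such as RetrySketch.retry(() -> openConnection(), new DoublingPolicySketch(100, 5)) would then retry a hypothetical openConnection() up to five times, waiting 100, 200, 400, ... ms between attempts. Keeping the policy behind an interface means the retry loop stays the same when the waiting strategy changes.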
Backoff algorithms give us a sound approach to waiting whenever a task has to be retried.