[Flume] [Source Code Analysis] How Transaction Control Works in Flume NG [Memory Channel]

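At first I assumed that Flume NG's transaction control lived on the sink side, because that was the only place I had seen transactions being used. Today I read through Flume's transaction handling end to end and realized I had been wrong, so I wrote this article. Corrections are welcome.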

Let's start with a diagram.


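As the diagram shows, Flume has transaction control on both the source side and the sink side, and the transaction itself is provided by the channel. The transactions discussed here differ slightly from those of the file channel, where the transaction is recorded on disk.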

1. Getting a transaction

Transaction transaction = channel.getTransaction();
Method definition:

public Transaction getTransaction() {

    if (!initialized) {
      synchronized (this) {
        if (!initialized) {
          initialize();
          initialized = true;
        }
      }
    }

    BasicTransactionSemantics transaction = currentTransaction.get();
    if (transaction == null || transaction.getState().equals(
            BasicTransactionSemantics.State.CLOSED)) {
      transaction = createTransaction();
      currentTransaction.set(transaction);
    }
    return transaction;
  }
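If there is no current transaction, or the current one is already closed, the method calls createTransaction(). For the memory channel it is defined as follows: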

protected BasicTransactionSemantics createTransaction() {
    return new MemoryTransaction(transCapacity, channelCounter);
  }
public MemoryTransaction(int transCapacity, ChannelCounter counter) {
      putList = new LinkedBlockingDeque<Event>(transCapacity);
      takeList = new LinkedBlockingDeque<Event>(transCapacity);

      channelCounter = counter;
    }
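The transaction initializes three fields: the put list (the events put during one transaction), the take list (the events taken during one transaction), and the channel's monitoring counter.
transCapacity is the transaction capacity we configure (the memory channel's transactionCapacity property), that is, the maximum number of events a single transaction can hold.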
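
To make the role of transCapacity concrete, here is a minimal sketch that uses only java.util.concurrent. It is not Flume code: the class TransactionCapacityDemo and the method acceptedPuts are made up for illustration, and events are represented by plain strings. It shows that a LinkedBlockingDeque created with a capacity refuses further offers once it is full, which is the situation a transaction runs into when more events are put than transCapacity allows.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical demo class, not part of Flume.
final class TransactionCapacityDemo {

    // Offers `attempts` items to a deque bounded at `transCapacity`
    // and returns how many of them were accepted.
    static int acceptedPuts(int transCapacity, int attempts) {
        LinkedBlockingDeque<String> putList = new LinkedBlockingDeque<>(transCapacity);
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            // offer() returns false instead of blocking when the deque is full
            if (putList.offer("event-" + i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Five puts against a transaction capacity of three: only three fit.
        System.out.println(acceptedPuts(3, 5));
    }
}
```

In this sketch the extra puts are simply counted as rejected; how a real channel reports the overflow is up to its own doPut implementation.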

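With the transaction and its fields in place, the next step is the methods the transaction defines.

Only the memory channel's methods are covered below; for the file channel, see [Explanation of the file channel].

1. doPut

doPut adds an event to putList, which means the event is now part of the transaction. The put is always initiated from the source side. Here is an example (see also [Source analysis of ExecSource]):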

 for (Channel reqChannel : reqChannelQueue.keySet()) {
      Transaction tx = reqChannel.getTransaction();
      Preconditions.checkNotNull(tx, "Transaction object must not be null");
      try {
        tx.begin();

        List<Event> batch = reqChannelQueue.get(reqChannel);

        for (Event event : batch) {
          reqChannel.put(event);
        }

        tx.commit();
      } catch (Throwable t) {
        tx.rollback();
        if (t instanceof Error) {
          LOG.error("Error while writing to required channel: " +
              reqChannel, t);
          throw (Error) t;
        } else {
          throw new ChannelException("Unable to put batch on required " +
              "channel: " + reqChannel, t);
        }
      } finally {
        if (tx != null) {
          tx.close();
        }
      }
 }
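This is a method of ChannelProcessor. It iterates over the reqChannelQueue map and, for each channel, puts that channel's batch of events inside a transaction.

Events are appended to eventQueue with the list's add method, so each new event goes to the tail.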

2. doTake

The memory channel has a field

queue (a LinkedBlockingDeque)

which is initialized when the channel is initialized:

synchronized(queueLock) {
        queue = new LinkedBlockingDeque<Event>(capacity);
        queueRemaining = new Semaphore(capacity);
        queueStored = new Semaphore(0);
      }
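These three fields are the event queue itself (bounded by the channel's total capacity), a semaphore counting the free slots, and a semaphore counting the occupied slots.

A take first removes the event at the head of queue. As noted in step 1, events are added at the tail, so taking from the head gives first-in, first-out order. That removes one event from the channel as a whole; the event must also be recorded in the transaction, so it is put into takeList as well.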
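
The take step described above can be sketched as follows. This is a simplified, single-threaded illustration, not Flume's MemoryChannel: the class TakeSketch and its fields are hypothetical, events are plain strings, and the semaphores and locking are left out.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch, not Flume's MemoryChannel.
final class TakeSketch {
    final LinkedBlockingDeque<String> queue;     // channel-wide queue
    final LinkedBlockingDeque<String> takeList;  // events taken by the open transaction

    TakeSketch(int capacity, int transCapacity) {
        queue = new LinkedBlockingDeque<>(capacity);
        takeList = new LinkedBlockingDeque<>(transCapacity);
    }

    // Removes the head of the channel queue and records it in takeList.
    // Returns null when the channel is empty.
    String take() {
        if (takeList.remainingCapacity() == 0) {
            throw new IllegalStateException("take list is full");
        }
        String event = queue.poll();   // head of the queue, i.e. the oldest event
        if (event != null) {
            takeList.offer(event);     // room was checked above (single-threaded sketch)
        }
        return event;
    }

    public static void main(String[] args) {
        TakeSketch channel = new TakeSketch(10, 10);
        channel.queue.offer("a");
        channel.queue.offer("b");
        channel.queue.offer("c");
        System.out.println(channel.take() + " " + channel.take()); // oldest events first
        System.out.println(channel.takeList + " " + channel.queue);
    }
}
```

Keeping a copy of every taken event in takeList is what makes a later rollback possible: until the transaction commits, the channel still knows which events it handed out.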

3. doCommit

 synchronized(queueLock) {
        if(puts > 0 ) {
          while(!putList.isEmpty()) {
            if(!queue.offer(putList.removeFirst())) {
              throw new RuntimeException("Queue add failed, this shouldn't be able to happen");
            }
          }
        }
        putList.clear();
        takeList.clear();
      }

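offer appends to the tail of queue. Each iteration removes the first event from putList and appends it, and the loop continues until putList is empty.

Reaching commit means the events obtained through doTake were handled successfully, so takeList can be cleared.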
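
As a rough illustration of the commit step, the sketch below moves everything from a put list to the tail of a bounded queue and then discards the take list. The class CommitSketch, its methods, and the sample data are assumptions made for this example; locking, semaphores, and counters are omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch of the commit step, not Flume's implementation.
final class CommitSketch {

    // Appends every pending put to the tail of the queue, oldest first,
    // then forgets the events that were taken in this transaction.
    static void commit(Deque<String> putList, Deque<String> takeList,
                       LinkedBlockingDeque<String> queue) {
        while (!putList.isEmpty()) {
            if (!queue.offer(putList.removeFirst())) {
                throw new IllegalStateException("queue is full");
            }
        }
        takeList.clear();
    }

    // Runs one commit on sample data and returns the resulting queue contents.
    static List<String> demo() {
        LinkedBlockingDeque<String> queue = new LinkedBlockingDeque<>(10);
        queue.offer("x");                                   // already in the channel
        Deque<String> putList = new ArrayDeque<>(Arrays.asList("a", "b"));
        Deque<String> takeList = new ArrayDeque<>(Arrays.asList("t"));
        commit(putList, takeList, queue);
        return new ArrayList<>(queue);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [x, a, b]
    }
}
```

The committed events land behind the event that was already queued, so the order in which sources put events is the order in which sinks will later take them.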

4. doRollback

 synchronized(queueLock) {
        Preconditions.checkState(queue.remainingCapacity() >= takeList.size(), "Not enough space in memory channel " +
            "queue to rollback takes. This should never happen, please report");
        while(!takeList.isEmpty()) {
          queue.addFirst(takeList.removeLast());
        }
        putList.clear();
      }
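On the sink side, a rollback returns the events that the sink failed to write out; they are still in takeList. If this method is reached, doCommit has not run, because rollback only happens after an exception, so takeList has not been cleared.

The loop repeatedly removes the last element of takeList and inserts it at the head of the channel queue. doTake reads from the head, so the events that just failed must go back to the head, in their original order, to be processed again next time.

Likewise, once the transaction is rolled back the contents of putList are no longer needed, so the list is cleared and the events have to be put again.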
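
The reason for pairing removeLast with addFirst is easiest to see by running it. The following sketch is an illustration under simplifying assumptions (plain strings as events, no locking, hypothetical class RollbackSketch): it takes two events from a queue, rolls back, and checks that the queue ends up in its original order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch of the rollback step, not Flume's implementation.
final class RollbackSketch {

    // Pushes taken events back to the head of the queue, newest first,
    // so that the oldest taken event ends up at the very front again.
    static void rollback(Deque<String> takeList, Deque<String> putList,
                         LinkedBlockingDeque<String> queue) {
        while (!takeList.isEmpty()) {
            queue.addFirst(takeList.removeLast());
        }
        putList.clear();
    }

    // Takes two events, rolls back, and returns the resulting queue contents.
    static List<String> demo() {
        LinkedBlockingDeque<String> queue = new LinkedBlockingDeque<>(4);
        for (String event : Arrays.asList("a", "b", "c", "d")) {
            queue.offer(event);
        }
        Deque<String> takeList = new ArrayDeque<>();
        takeList.addLast(queue.pollFirst()); // takes "a"
        takeList.addLast(queue.pollFirst()); // takes "b"
        Deque<String> putList = new ArrayDeque<>();
        rollback(takeList, putList, queue);
        return new ArrayList<>(queue);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a, b, c, d]
    }
}
```

Had the loop used removeFirst instead, "a" would be re-inserted before "b" and the queue would come back as b, a, c, d, which would break the first-in, first-out order.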


Both putting events into the channel and taking events out of it are subject to a timeout:

// this does not need to be in the critical section as it does not
      // modify the structure of the log or queue.
      if(!queueRemaining.tryAcquire(keepAlive, TimeUnit.SECONDS)) {
        throw new ChannelFullException("The channel has reached it's capacity. "
            + "This might be the result of a sink on the channel having too "
            + "low of batch size, a downstream system running slower than "
            + "normal, or that the channel capacity is just too low. "
            + channelNameDescriptor);
      }
keepAlive is the timeout we configure, in seconds. If the permit cannot be acquired within that time, an exception is thrown.
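
The timeout behaviour comes from Semaphore.tryAcquire with a wait time. The sketch below is not Flume code: the class KeepAliveSketch and the method reserveSlot are hypothetical, and it waits in milliseconds rather than seconds so that the demo finishes quickly. It models a channel with a single free slot: the first reservation succeeds at once, the second waits for the whole timeout and fails, and a reservation succeeds again after a slot is released.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a timed slot reservation, not Flume's implementation.
final class KeepAliveSketch {

    // Tries to reserve one free slot, waiting at most keepAliveMillis.
    // Returns false on timeout or if the waiting thread is interrupted.
    static boolean reserveSlot(Semaphore queueRemaining, long keepAliveMillis) {
        try {
            return queueRemaining.tryAcquire(keepAliveMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        Semaphore queueRemaining = new Semaphore(1);          // one free slot
        System.out.println(reserveSlot(queueRemaining, 50));  // slot available
        System.out.println(reserveSlot(queueRemaining, 50));  // no slot: times out
        queueRemaining.release();                             // a consumer frees a slot
        System.out.println(reserveSlot(queueRemaining, 50));  // slot available again
    }
}
```

A caller that gets false back would then signal the failure to its caller, in the same way the snippet above throws ChannelFullException.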


The methods above also update several XXXCounter variables. These are monitoring metrics; see [Analysis of metrics monitoring in Flume].

