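At first I also assumed that transaction control in Flume NG lived only on the sink side, because that was the only place I had seen transactions used. Today I read through Flume's transaction handling end to end and realized, belatedly, that I was wrong, so I wrote this post. Corrections are welcome.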
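Let's start with a diagram.
As the diagram shows, Flume has transaction control on both the source side and the sink side, and the transaction itself is provided by the channel. The transactions discussed here differ slightly from the file channel's transactions [file channel transactions are recorded on disk].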
A transaction is obtained with Transaction transaction = channel.getTransaction(); the method is defined as follows:

    public Transaction getTransaction() {
      if (!initialized) {
        synchronized (this) {
          if (!initialized) {
            initialize();
            initialized = true;
          }
        }
      }
      BasicTransactionSemantics transaction = currentTransaction.get();
      if (transaction == null || transaction.getState().equals(
          BasicTransactionSemantics.State.CLOSED)) {
        transaction = createTransaction();
        currentTransaction.set(transaction);
      }
      return transaction;
    }

Internally it calls createTransaction(), which is defined as follows [looking at the memory channel]:
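To make the reuse rule in getTransaction() concrete, here is a minimal sketch of the same lookup logic. It assumes currentTransaction behaves like a per-thread holder; the class TxLookupSketch, its Tx type, and the created counter are hypothetical stand-ins for illustration, not Flume classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the lookup in getTransaction(); not Flume code.
class TxLookupSketch {
    enum State { OPEN, CLOSED }

    // Simplified transaction that only tracks whether it was closed.
    static final class Tx {
        State state = State.OPEN;
        void close() { state = State.CLOSED; }
    }

    // Assumption: one current transaction per thread.
    private final ThreadLocal<Tx> current = new ThreadLocal<>();
    // Counts how many transactions were actually created (illustration only).
    final AtomicInteger created = new AtomicInteger();

    Tx getTransaction() {
        Tx tx = current.get();
        // Create a new transaction only if there is none yet or the old one is closed.
        if (tx == null || tx.state == State.CLOSED) {
            tx = new Tx();
            created.incrementAndGet();
            current.set(tx);
        }
        return tx;
    }
}
```

Calling getTransaction() twice on the same thread hands back the same object until that transaction is closed; only then is a fresh one created.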
    protected BasicTransactionSemantics createTransaction() {
      return new MemoryTransaction(transCapacity, channelCounter);
    }

    public MemoryTransaction(int transCapacity, ChannelCounter counter) {
      putList = new LinkedBlockingDeque<Event>(transCapacity);
      takeList = new LinkedBlockingDeque<Event>(transCapacity);
      channelCounter = counter;
    }

The transaction initializes three variables: the put list [its capacity bounds how many events one transaction can put], the take list [its capacity bounds how many events one transaction can take], and the channel's monitoring counter.
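As a small illustration of what the transCapacity bound means in practice, the sketch below fills a LinkedBlockingDeque created with a fixed capacity and counts how many offers are accepted. The class and method names are made up for this example, and plain strings stand in for events.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical demo of a capacity-bounded put list; not Flume code.
class BoundedPutListSketch {
    // Returns how many of `attempts` offers a deque bounded at transCapacity accepts.
    static int acceptedOffers(int transCapacity, int attempts) {
        LinkedBlockingDeque<String> putList = new LinkedBlockingDeque<>(transCapacity);
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            // offer() returns false instead of blocking once the deque is full.
            if (putList.offer("event-" + i)) {
                accepted++;
            }
        }
        return accepted;
    }
}
```

With a capacity of 3 and 5 attempts, only the first 3 offers succeed, which is the per-transaction limit described above.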
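With the transaction and its variables defined, the next step is the concrete methods of the transaction.
Only the memory channel's method definitions are covered below [see the separate write-up on the file channel].
Putting an event into putList means the event is now part of the transaction. This put is always initiated from the source side. Here is an example [see the source-code analysis of ExecSource]: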
    for (Channel reqChannel : reqChannelQueue.keySet()) {
      Transaction tx = reqChannel.getTransaction();
      Preconditions.checkNotNull(tx, "Transaction object must not be null");
      try {
        tx.begin();
        List<Event> batch = reqChannelQueue.get(reqChannel);
        for (Event event : batch) {
          reqChannel.put(event);
        }
        tx.commit();
      } catch (Throwable t) {
        tx.rollback();
        if (t instanceof Error) {
          LOG.error("Error while writing to required channel: " + reqChannel, t);
          throw (Error) t;
        } else {
          throw new ChannelException("Unable to put batch on required "
              + "channel: " + reqChannel, t);
        }
      } finally {
        if (tx != null) {
          tx.close();
        }
      }
    }

This is a method of ChannelProcessor. It iterates over the reqChannelQueue map and, for each channel, puts that channel's batch of events inside a transaction.
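Events are added to eventQueue with the list's add method, so they land at the tail of the list.
The memory channel has a member variable:
queue, a LinkedBlockingDeque
It is initialized when the channel is initialized: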
    synchronized(queueLock) {
      queue = new LinkedBlockingDeque<Event>(capacity);
      queueRemaining = new Semaphore(capacity);
      queueStored = new Semaphore(0);
    }

These three track, respectively, the channel's total capacity, its remaining free capacity, and its occupied capacity.
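A take first removes the event at the head of queue. As noted above, events are added at the tail, so taking from the head preserves first-in, first-out order. That removes one event from the channel as a whole; the event must also be recorded in takeList so that it becomes part of the transaction, so it is put into takeList as well.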
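The take step described above can be sketched roughly as follows. This is a simplification under my own assumptions: strings stand in for events, the semaphores and counters are left out, and takeList.offer is used instead of a blocking put so the example needs no checked exceptions. TakeSketch is a hypothetical name.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical, simplified model of a take inside a transaction; not Flume code.
class TakeSketch {
    // Channel-wide queue: new events join at the tail.
    final LinkedBlockingDeque<String> queue = new LinkedBlockingDeque<>(10);
    // Events taken by the current transaction, kept in case of rollback.
    final LinkedBlockingDeque<String> takeList = new LinkedBlockingDeque<>(10);

    // Removes the head of the queue and records it in takeList; null if the queue is empty.
    String take() {
        String event = queue.poll();
        if (event != null) {
            takeList.offer(event);
        }
        return event;
    }
}
```

After offering "a" then "b" to queue, two calls to take() return "a" and then "b", and takeList holds both in the order they were taken.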
    synchronized(queueLock) {
      if(puts > 0 ) {
        while(!putList.isEmpty()) {
          if(!queue.offer(putList.removeFirst())) {
            throw new RuntimeException("Queue add failed, this shouldn't be able to happen");
          }
        }
      }
      putList.clear();
      takeList.clear();
    }
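offer appends to the tail of the queue [each time taking the head element of putList], and the loop runs until putList is drained.
By this point the commit has gone through, which means the takes (doTake) succeeded as well, so takeList is cleared.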
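Here is a rough sketch of that commit step on its own. It leaves out the locking, semaphores, and counters, uses strings for events, and throws IllegalStateException where the original throws RuntimeException; CommitSketch and its fields are hypothetical names.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical, simplified model of the commit step; not Flume code.
class CommitSketch {
    final LinkedBlockingDeque<String> queue;
    final LinkedBlockingDeque<String> putList;
    final LinkedBlockingDeque<String> takeList;

    CommitSketch(int capacity) {
        queue = new LinkedBlockingDeque<>(capacity);
        putList = new LinkedBlockingDeque<>(capacity);
        takeList = new LinkedBlockingDeque<>(capacity);
    }

    void commit() {
        // Move pending puts to the tail of the channel queue, oldest first.
        while (!putList.isEmpty()) {
            if (!queue.offer(putList.removeFirst())) {
                throw new IllegalStateException("queue is full");
            }
        }
        // Taken events are now confirmed as delivered, so forget them.
        putList.clear();
        takeList.clear();
    }
}
```

If queue holds "a" and putList holds "b" and "c", commit() leaves queue as a, b, c and both transaction lists empty.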
    synchronized(queueLock) {
      Preconditions.checkState(queue.remainingCapacity() >= takeList.size(),
          "Not enough space in memory channel " +
          "queue to rollback takes. This should never happen, please report");
      while(!takeList.isEmpty()) {
        queue.addFirst(takeList.removeLast());
      }
      putList.clear();
    }

This is the rollback. What gets rolled back on the sink side is the events the sink failed to write out, and those are in takeList. When execution reaches this method, doCommit has not run: rollback only happens after an exception, so takeList has not been cleared.
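The loop repeatedly removes the last element of takeList and puts it back at the head of the channel queue [doTake takes from the head, so the events that just failed must go back to the head to be processed again next time]. Because the most recently taken event is restored first, the events end up in their original order.
Likewise, since this is a rollback, the contents of putList are no longer valid either, so it is cleared and the puts have to be done again.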
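The reason removeLast paired with addFirst restores the original order is easy to check with a small experiment. The sketch below takes a few events from the head of a queue, then rolls them back the same way; RollbackSketch and takeThenRollback are hypothetical names, and strings stand in for events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical demo of rollback ordering; not Flume code.
class RollbackSketch {
    // Takes `takes` events from the head, rolls them back, and returns the resulting queue.
    static List<String> takeThenRollback(List<String> initial, int takes) {
        int capacity = Math.max(1, initial.size());
        LinkedBlockingDeque<String> queue = new LinkedBlockingDeque<>(capacity);
        LinkedBlockingDeque<String> takeList = new LinkedBlockingDeque<>(capacity);
        queue.addAll(initial);

        // Take phase: head of queue goes to the tail of takeList.
        for (int i = 0; i < takes && !queue.isEmpty(); i++) {
            takeList.offer(queue.poll());
        }

        // Rollback phase: newest taken event goes back to the head first.
        while (!takeList.isEmpty()) {
            queue.addFirst(takeList.removeLast());
        }
        return new ArrayList<>(queue);
    }
}
```

Starting from a, b, c, d and taking two events leaves c, d in the queue with a, b in takeList; the rollback pushes b and then a back onto the head, giving a, b, c, d again.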
Whether events are being put into the channel or taken from it, the operation is subject to a timeout.
    // this does not need to be in the critical section as it does not
    // modify the structure of the log or queue.
    if(!queueRemaining.tryAcquire(keepAlive, TimeUnit.SECONDS)) {
      throw new ChannelFullException("The channel has reached it's capacity. "
          + "This might be the result of a sink on the channel having too "
          + "low of batch size, a downstream system running slower than "
          + "normal, or that the channel capacity is just too low. "
          + channelNameDescriptor);
    }

keepAlive here is the timeout we set in the configuration, in seconds. If no slot becomes free within that time, an exception is thrown.
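The timeout behaviour comes from Semaphore.tryAcquire with a wait limit. Here is a minimal sketch of the idea, assuming one permit per free slot; KeepAliveSketch is a hypothetical class, it uses milliseconds rather than seconds so it runs quickly, and it returns false where the channel code throws.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of a timed slot reservation; not Flume code.
class KeepAliveSketch {
    private final Semaphore queueRemaining;
    private final long keepAliveMillis;

    KeepAliveSketch(int capacity, long keepAliveMillis) {
        this.queueRemaining = new Semaphore(capacity);
        this.keepAliveMillis = keepAliveMillis;
    }

    // Waits up to keepAliveMillis for a free slot; false means the wait timed out.
    boolean reserveSlot() {
        try {
            return queueRemaining.tryAcquire(keepAliveMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status and report failure.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Frees one slot, as happens once an event has left the channel.
    void releaseSlot() {
        queueRemaining.release();
    }
}
```

With a capacity of 1, the first reserveSlot() succeeds, the second times out and returns false, and after releaseSlot() a reservation succeeds again.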
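In all of the methods above you will have noticed operations on variables named XXXCounter. These are monitoring metrics; for details see [the analysis of metrics monitoring in Flume].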