Flume Source Startup Process Analysis

Component Diagram

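Before starting, take a look at the basic component framework diagram; being familiar with the overall flow makes the rest easier to follow:
[Figure 1: Flume component framework diagram]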

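  1. Receive events
  2. Select the Source runner that matches the configuration (EventDrivenSourceRunner or PollableSourceRunner)
  3. The processor handles the events (Load-Balancing Sink and Failover Sink processors)
  4. Pass the events to the interceptor chain
  5. Pass each event to the Channel selector
  6. The selector returns the list of Channels the event is to be written to
  7. Write all events to each required Channel, with only one transaction opened per Channel
  8. Optional Channels (once a Channel is configured as optional, a failed write to it is ignored)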

Program Entry Point

Flume starts running from the main method in Application.java. main first parses the command-line arguments, then loads the configuration file and calls the corresponding methods.
Flume-ng supports two modes of loading the configuration file. The first is static configuration, where the file is loaded only once. The second is dynamic configuration based on the publish/subscribe model of Guava's EventBus: changes to the configuration file are picked up even while the service is running, i.e. the configuration is reloaded dynamically.

    // hasOption returns true if the option is set, false otherwise
    boolean reload = !commandLine.hasOption("no-reload-conf");
    if (isZkConfigured) {
      // If configured through ZooKeeper, start with the ZooKeeper parameters;
      // the steps are similar to those in the else branch
    } else {
      if (reload) {
        EventBus eventBus = new EventBus(agentName + "-event-bus");
        // PollingPropertiesFileConfigurationProvider polls the conf file every 30 seconds
        PollingPropertiesFileConfigurationProvider configurationProvider =
            new PollingPropertiesFileConfigurationProvider(
                agentName, configurationFile, eventBus, 30);
        components.add(configurationProvider);
        application = new Application(components);
        eventBus.register(application);
      } else {
        // Static loading: the configuration file is loaded only once
        PropertiesFileConfigurationProvider configurationProvider =
            new PropertiesFileConfigurationProvider(agentName, configurationFile);
        application = new Application();
        application.handleConfigurationEvent(configurationProvider.getConfiguration());
      }
    }
    application.start();
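The reload branch combines two ideas: a provider that polls the configuration file, and a bus that forwards change notifications to a subscriber. The following is a minimal stdlib-only sketch of that combination, not Flume's or Guava's implementation. MiniBus, PollingProvider and simulate are hypothetical names, and the sketch assumes that a change is detected by comparing modification timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Stdlib-only model of "poll the config file, publish on change". All names are hypothetical. */
final class ReloadSketch {

  /** Tiny stand-in for an event bus: subscribers are plain callbacks. */
  static final class MiniBus {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    void register(Consumer<String> subscriber) {
      subscribers.add(subscriber);
    }

    void post(String event) {
      for (Consumer<String> subscriber : subscribers) {
        subscriber.accept(event);
      }
    }
  }

  /** Publishes an event whenever the observed modification time moves forward (assumed rule). */
  static final class PollingProvider {
    private final LongSupplier lastModified; // a real agent could pass file::lastModified
    private final MiniBus bus;
    private long lastSeen = 0L;

    PollingProvider(LongSupplier lastModified, MiniBus bus) {
      this.lastModified = lastModified;
      this.bus = bus;
    }

    /** One polling round; a scheduler would call this at a fixed interval. */
    void checkOnce() {
      long current = lastModified.getAsLong();
      if (current > lastSeen) {
        lastSeen = current;
        bus.post("config@" + current);
      }
    }
  }

  /** Runs one polling round per simulated modification time and returns what the subscriber saw. */
  static List<String> simulate(long... modificationTimes) {
    final long[] clock = {0L};
    MiniBus bus = new MiniBus();
    List<String> received = new ArrayList<>();
    bus.register(received::add); // the subscriber plays the role of handleConfigurationEvent
    PollingProvider provider = new PollingProvider(() -> clock[0], bus);
    for (long time : modificationTimes) {
      clock[0] = time;
      provider.checkOnce();
    }
    return received;
  }

  public static void main(String[] args) {
    // The second round sees an unchanged timestamp, so nothing is published for it.
    System.out.println(simulate(100L, 100L, 250L)); // prints [config@100, config@250]
  }
}
```

The sketch calls checkOnce directly so that it stays deterministic; in a running agent the same check would be scheduled at a fixed interval.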

If no-reload-conf is not specified when Flume is started (the default), hasOption returns false, so reload is true. A PollingPropertiesFileConfigurationProvider object is then created to load the configuration file dynamically. The statement eventBus.register(application) registers the application object on the eventBus: when the configuration file changes, configurationProvider publishes a message (publisher), and EventBus calls handleConfigurationEvent, the method in application annotated with @Subscribe (subscriber). What does handleConfigurationEvent actually do?

    @Subscribe
    public synchronized void handleConfigurationEvent(MaterializedConfiguration conf) {
      stopAllComponents();       // stop components in the order source -> sink -> channel
      startAllComponents(conf);  // start components in the order channel -> sink -> source
    }
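The ordering in the comments of handleConfigurationEvent is symmetric: components stop in the reverse of the order in which they start, so a channel is always running while a source or sink uses it. Here is a small sketch of that ordering only; RestartOrderSketch and reconfigure are hypothetical names, and the components are plain strings rather than Flume objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Models only the stop/start ordering of a reconfiguration. Names are hypothetical. */
final class RestartOrderSketch {

  /** Returns the sequence of actions for one reconfiguration. */
  static List<String> reconfigure() {
    List<String> startOrder = Arrays.asList("channel", "sink", "source");

    // Stopping happens in the reverse of the start order.
    List<String> stopOrder = new ArrayList<>(startOrder);
    Collections.reverse(stopOrder);

    List<String> log = new ArrayList<>();
    for (String component : stopOrder) {
      log.add("stop " + component);
    }
    for (String component : startOrder) {
      log.add("start " + component);
    }
    return log;
  }

  public static void main(String[] args) {
    for (String action : reconfigure()) {
      System.out.println(action);
    }
  }
}
```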

handleConfigurationEvent stops and starts the components: on every call it first stops all components and then starts them again, including sources, channels and sinks. Since we are studying the startup process, let's step into startAllComponents.
startAllComponents starts the channel, sink and source components in that order:

    private void startAllComponents(MaterializedConfiguration materializedConfiguration) {
      /*
      The materializedConfiguration object holds content like the following:
      { sourceRunners:{r1=PollableSourceRunner: { source:Taildir source: { positionFile: /home/bjtianye1/flume-kafkachannel/position.json, skipToEnd: false, byteOffsetHeader: false, idleTimeout: 120000, writePosInterval: 3000 } counterGroup:{ name:null counters:{} } }}sinkRunners:{k3=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@11498436 counterGroup:{ name:null counters:{} } },
      k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@7186fe17 counterGroup:{ name:null counters:{} } }, k2=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@5593d23b counterGroup:{ name:null counters:{} } }} channels:{c2=org.apache.flume.channel.kafka.KafkaChannel{name: c2}} }
      */
      logger.info("Starting new configuration:{}", materializedConfiguration);
      this.materializedConfiguration = materializedConfiguration;
      // Start the channels
      for (Entry<String, Channel> entry :
          materializedConfiguration.getChannels().entrySet()) {
        try {
          logger.info("Starting Channel " + entry.getKey());
          supervisor.supervise(entry.getValue(),
              new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
        } catch (Exception e) {
          logger.error("Error while starting {}", entry.getValue(), e);
        }
      }
      /*
       * Wait for all channels to start.
       */
      for (Channel ch : materializedConfiguration.getChannels().values()) {
        while (ch.getLifecycleState() != LifecycleState.START
            && !supervisor.isComponentInErrorState(ch)) {
          try {
            logger.info("Waiting for channel: " + ch.getName() +
                " to start. Sleeping for 500 ms");
            Thread.sleep(500);
          } catch (InterruptedException e) {
            logger.error("Interrupted while waiting for channel to start.", e);
            Throwables.propagate(e);
          }
        }
      }
      // Start the sinks
      for (Entry<String, SinkRunner> entry : materializedConfiguration.getSinkRunners().entrySet()) {
        try {
          logger.info("Starting Sink " + entry.getKey());
          supervisor.supervise(entry.getValue(),
              new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
        } catch (Exception e) {
          logger.error("Error while starting {}", entry.getValue(), e);
        }
      }
      // Start the sources
      for (Entry<String, SourceRunner> entry :
          materializedConfiguration.getSourceRunners().entrySet()) {
        try {
          logger.info("Starting Source " + entry.getKey());
          supervisor.supervise(entry.getValue(),
              new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
        } catch (Exception e) {
          logger.error("Error while starting {}", entry.getValue(), e);
        }
      }
      // Load monitoring
      this.loadMonitoring();
    }

Consider the basic flow for a source first. startAllComponents calls supervise for each source component (its SourceRunner). supervise creates a MonitorRunnable task and schedules it with scheduleWithFixedDelay, so that it runs periodically with the given delay:

    // supervise manages the lifecycle of the component passed in
    public synchronized void supervise(LifecycleAware lifecycleAware,
        SupervisorPolicy policy, LifecycleState desiredState) {
      ......
      // MonitorRunnable is a Runnable task that periodically checks the component's state
      MonitorRunnable monitorRunnable = new MonitorRunnable();
      monitorRunnable.lifecycleAware = lifecycleAware;
      monitorRunnable.supervisoree = process;
      monitorRunnable.monitorService = monitorService;
      supervisedProcesses.put(lifecycleAware, process);
      /*
       * scheduleWithFixedDelay creates and executes a periodic action that first runs
       * after the given initial delay (0 here), and then with the given delay (3 seconds)
       * between the end of one execution and the start of the next.
       */
      ScheduledFuture future = monitorService.scheduleWithFixedDelay(
          monitorRunnable, 0, 3, TimeUnit.SECONDS);
      monitorFutures.put(lifecycleAware, future);
    }

The run method of MonitorRunnable chooses the operation to perform according to the desired lifecycle state.
The enum that defines the four lifecycle states is shown below:

    // flume-ng-core\src\main\java\org\apache\flume\lifecycle\LifecycleState.java
    public enum LifecycleState {
      IDLE, START, STOP, ERROR;
      public static final LifecycleState[] START_OR_ERROR = new LifecycleState[] {
          START, ERROR };
      public static final LifecycleState[] STOP_OR_ERROR = new LifecycleState[] {
          STOP, ERROR };
    }

Since we are starting a source, the desired state passed in is START. Now look at the run method:

    public void run() {
      logger.debug("checking process:{} supervisoree:{}", lifecycleAware,
          supervisoree);
      long now = System.currentTimeMillis();
      ......
            switch (supervisoree.status.desiredState) {
              case START:
                try {
                  lifecycleAware.start(); // called when the desired state is START
                } catch (Throwable e) {
                  ......
                }
                break;
              case STOP:
                try {
                  lifecycleAware.stop();
                } catch (Throwable e) {
                  ......
                }
                break;
              default:
                logger.warn("I refuse to acknowledge {} as a desired state",
                    supervisoree.status.desiredState);
            }
            if (!supervisoree.policy.isValid(lifecycleAware, supervisoree.status)) {
              logger.error(
                  "Policy {} of {} has been violated - supervisor should exit!",
                  supervisoree.policy, lifecycleAware);
            }
          }
        }
      } catch (Throwable t) {
        logger.error("Unexpected error", t);
      }
      logger.debug("Status check complete");
    }
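The supervise and run snippets together describe a monitor that is scheduled with a fixed delay and pushes a component toward its desired state. A stdlib-only sketch of that pattern follows. SupervisorSketch, Component and superviseFor are hypothetical names, the 10 ms delay is chosen only to keep the demo fast, and the sketch assumes that the monitor calls start() only while the component is not yet in the START state (the corresponding check is in the elided part of the listing).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Stdlib-only model of a periodically scheduled lifecycle monitor. Names are hypothetical. */
final class SupervisorSketch {
  enum State { IDLE, START, STOP }

  /** Hypothetical component that counts how often start() is invoked. */
  static final class Component {
    private final AtomicInteger startCalls = new AtomicInteger();
    private volatile State state = State.IDLE;

    void start() {
      startCalls.incrementAndGet();
      state = State.START;
    }

    State getState() { return state; }
    int getStartCalls() { return startCalls.get(); }
  }

  /** Lets the monitor run at least `checks` times, then returns the number of start() calls. */
  static int superviseFor(final Component component, int checks) {
    ScheduledExecutorService monitorService = Executors.newSingleThreadScheduledExecutor();
    final CountDownLatch remaining = new CountDownLatch(checks);
    Runnable monitor = () -> {
      // Assumed rule: act only while the component is not in the desired state.
      if (component.getState() != State.START) {
        component.start();
      }
      remaining.countDown();
    };
    // First run immediately, then 10 ms after each run finishes.
    monitorService.scheduleWithFixedDelay(monitor, 0, 10, TimeUnit.MILLISECONDS);
    try {
      if (!remaining.await(5, TimeUnit.SECONDS)) {
        throw new IllegalStateException("monitor did not complete " + checks + " checks");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the monitor", e);
    } finally {
      monitorService.shutdownNow();
    }
    return component.getStartCalls();
  }

  public static void main(String[] args) {
    Component source = new Component();
    System.out.println(superviseFor(source, 3) + " " + source.getState()); // prints 1 START
  }
}
```

Because later checks find the component already started, start() runs once even though the monitor keeps firing; the same periodic check is what would allow a supervisor to restart a component that has left its desired state.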

run calls lifecycleAware.start(). To dig further from here, we need to understand the component that drives a Source: the Source runner.

SourceRunner

SourceRunner controls how a Source is driven. Flume currently provides two mechanisms: PollableSource (polling) and EventDrivenSource (event-driven). A PollableSource needs an external driver to determine whether the source has events available, whereas an EventDrivenSource needs no external driver and implements its own event-driven mechanism. SourceRunner.java uses the instanceof operator (which tests at run time whether an object is an instance of a given type) to determine which mechanism a concrete source implements, and then creates the matching runner.

    // SourceRunner.java
    if (source instanceof PollableSource) {
      runner = new PollableSourceRunner();
      ((PollableSourceRunner) runner).setSource((PollableSource) source);
    } else if (source instanceof EventDrivenSource) {
      runner = new EventDrivenSourceRunner();
      ((EventDrivenSourceRunner) runner).setSource((EventDrivenSource) source);
    }
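The instanceof dispatch can be tried in isolation with a few marker interfaces. The sketch below is self-contained and only mirrors the shape of the selection; the interfaces are local stand-ins rather than Flume's types, TailingSource and ListeningSource are hypothetical, and throwing IllegalArgumentException for an unknown source type is an assumption of this sketch.

```java
/** Stand-alone model of choosing a runner by source type. All types here are local stand-ins. */
final class RunnerSelectionSketch {
  interface Source {}
  interface PollableSource extends Source {}
  interface EventDrivenSource extends Source {}

  /** Hypothetical source that must be polled by its runner. */
  static final class TailingSource implements PollableSource {}

  /** Hypothetical source that pushes events on its own. */
  static final class ListeningSource implements EventDrivenSource {}

  /** Returns the name of the runner that would drive the given source. */
  static String runnerFor(Source source) {
    if (source instanceof PollableSource) {
      return "PollableSourceRunner";
    } else if (source instanceof EventDrivenSource) {
      return "EventDrivenSourceRunner";
    }
    // Assumed behaviour of this sketch for sources that implement neither interface.
    throw new IllegalArgumentException("No known runner type for " + source);
  }

  public static void main(String[] args) {
    System.out.println(runnerFor(new TailingSource()));   // prints PollableSourceRunner
    System.out.println(runnerFor(new ListeningSource())); // prints EventDrivenSourceRunner
  }
}
```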

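The instanceof branch in SourceRunner.java creates the concrete runner instance. Where is it called from? Tracing the code from main shows that the runner is already instantiated while the configuration is loaded: the call to configurationProvider.getConfiguration() creates the matching SourceRunner instances from the configuration file. Which concrete Source implementations correspond to the two driving modes? See the figure below:

[Figure 2: Source implementations grouped by driving mode]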
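In Flume 1.7, TaildirSource is driven in the PollableSource style (it is not the only such source: KafkaSource and JMSSource, for example, are pollable as well). With the Source runner understood, we can look at how TaildirSource starts.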

TaildirSource Startup

Continuing from the lifecycleAware.start() call: with so many start methods in the code base, which one is invoked next? After the SourceRunner section the answer should be clear: it is the start method of the SourceRunner. Browsing the class hierarchy in Eclipse shows that Flume currently has two implementations, EventDrivenSourceRunner and PollableSourceRunner, as shown in the figure below:

[Figure 3: SourceRunner implementations as shown in Eclipse]
Step into the start method of PollableSourceRunner:

    @Override
    public void start() {
      PollableSource source = (PollableSource) getSource();
      ChannelProcessor cp = source.getChannelProcessor();
      cp.initialize();
      source.start();
      runner = new PollingRunner();
      runner.source = source;
      runner.counterGroup = counterGroup;
      runner.shouldStop = shouldStop;
      runnerThread = new Thread(runner);
      runnerThread.setName(getClass().getSimpleName() + "-" +
          source.getClass().getSimpleName() + "-" + source.getName());
      runnerThread.start();
      lifecycleState = LifecycleState.START;
    }

start creates a PollingRunner, wraps it in a thread and starts that thread. Only in the runner's run method is the process method of the concrete Source (such as TaildirSource) finally called:

    @Override
    public Status process() {
      Status status = Status.READY;
      try {
        existingInodes.clear();
        existingInodes.addAll(reader.updateTailFiles());
        for (long inode : existingInodes) {
          TailFile tf = reader.getTailFiles().get(inode);
          if (tf.needTail()) {
            tailFileProcess(tf, true);
          }
        }
        closeTailFiles();
        try {
          TimeUnit.MILLISECONDS.sleep(retryInterval);
        } catch (InterruptedException e) {
          logger.info("Interrupted while sleeping");
        }
      } catch (Throwable t) {
        logger.error("Unable to tail files", t);
        status = Status.BACKOFF;
      }
      return status;
    }
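The loop that repeatedly calls process() is not shown in the listings, so the following is only a sketch of what such a polling loop might look like. PollingLoopSketch, poll and demo are hypothetical names. The sketch assumes that the loop runs until a shared shouldStop flag is set and that the back-off grows linearly with the number of consecutive BACKOFF results up to a cap; the increment and cap values are illustrative. To stay deterministic it records the delays instead of sleeping and runs on the calling thread rather than a dedicated runner thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Stdlib-only model of a runner loop that polls a source. Names and values are hypothetical. */
final class PollingLoopSketch {
  enum Status { READY, BACKOFF }

  interface PollableSource {
    Status process();
  }

  /** Polls until shouldStop is set; returns the back-off delays (ms) it would have slept. */
  static List<Long> poll(PollableSource source, AtomicBoolean shouldStop,
                         long incrementMs, long maxMs) {
    List<Long> delays = new ArrayList<>();
    long consecutiveBackoffs = 0;
    while (!shouldStop.get()) {
      if (source.process() == Status.BACKOFF) {
        consecutiveBackoffs++;
        // A real runner would sleep here; the sketch only records the delay.
        delays.add(Math.min(consecutiveBackoffs * incrementMs, maxMs));
      } else {
        consecutiveBackoffs = 0; // a successful poll resets the back-off
      }
    }
    return delays;
  }

  /** Drives the loop with a scripted source that requests a stop after its last result. */
  static List<Long> demo() {
    final AtomicBoolean shouldStop = new AtomicBoolean(false);
    final Iterator<Status> script = Arrays.asList(
        Status.READY, Status.BACKOFF, Status.BACKOFF, Status.BACKOFF,
        Status.READY, Status.BACKOFF).iterator();
    PollableSource source = () -> {
      Status next = script.next();
      if (!script.hasNext()) {
        shouldStop.set(true);
      }
      return next;
    };
    return poll(source, shouldStop, 1000L, 2500L);
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [1000, 2000, 2500, 1000]
  }
}
```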

process is the most important method of a Source; it is where events end up being written to the Channel within a transaction. In TaildirSource, reading the log files and committing the read positions are implemented in tailFileProcess:

    // tailFileProcess in TaildirSource.java writes the events to the channel
    private void tailFileProcess(TailFile tf, boolean backoffWithoutNL)
        throws IOException, InterruptedException {
      while (true) {
        reader.setCurrentFile(tf);
        List<Event> events = reader.readEvents(batchSize, backoffWithoutNL);
        if (events.isEmpty()) {
          break;
        }
        sourceCounter.addToEventReceivedCount(events.size());
        sourceCounter.incrementAppendBatchReceivedCount();
        try {
          // processEventBatch tries to put the batch of events into the configured channels
          getChannelProcessor().processEventBatch(events);
          reader.commit();
        } catch (ChannelException ex) {
          logger.warn("The channel is full or unexpected failure. " +
              "The source will try again after " + retryInterval + " ms");
          TimeUnit.MILLISECONDS.sleep(retryInterval);
          // Double the retry interval, capped at maxRetryInterval
          retryInterval = retryInterval << 1;
          retryInterval = Math.min(retryInterval, maxRetryInterval);
          continue;
        }
        // Reset the retry interval after a successful write
        retryInterval = 1000;
        sourceCounter.addToEventAcceptedCount(events.size());
        sourceCounter.incrementAppendBatchAcceptedCount();
        if (events.size() < batchSize) {
          break;
        }
      }
    }
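The retry handling in the catch block is a capped exponential back-off: the interval doubles after each failed write and is clamped to a maximum. A short sketch of the resulting sleep sequence follows. RetryBackoffSketch and its methods are hypothetical names; the initial value 1000 comes from the reset in the listing, while the cap of 5000 ms is an assumed value chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Computes the sleep intervals of a capped, doubling retry policy. Names are hypothetical. */
final class RetryBackoffSketch {

  /** Same arithmetic as the listing: shift left by one (double), then clamp. */
  static long next(long retryInterval, long maxRetryInterval) {
    return Math.min(retryInterval << 1, maxRetryInterval);
  }

  /** Returns the intervals slept across the given number of consecutive failures. */
  static List<Long> sequence(long initial, long max, int failures) {
    List<Long> slept = new ArrayList<>();
    long interval = initial;
    for (int i = 0; i < failures; i++) {
      slept.add(interval);            // sleep with the current interval first
      interval = next(interval, max); // then grow it for the next failure
    }
    return slept;
  }

  public static void main(String[] args) {
    // 5000 is an assumed cap, not a value taken from the listing.
    System.out.println(sequence(1000L, 5000L, 5)); // prints [1000, 2000, 4000, 5000, 5000]
  }
}
```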

When events are sent to a channel, a transaction is always obtained (getTransaction()), begun (tx.begin()), committed (tx.commit()) or rolled back (tx.rollback()), and closed (tx.close()); these steps are mandatory. In a sink they are mostly called explicitly in the process method, whereas on the source side they are wrapped in processEvent and processEventBatch (batch write to the channel).
processEventBatch is defined in ChannelProcessor.java. It ultimately writes the events to the channels and uses transactions to guarantee data integrity; this is Flume's transaction mechanism. The code is as follows:

    public void processEventBatch(List<Event> events) {
      Preconditions.checkNotNull(events, "Event list must not be null");
      // Run the interceptor chain, e.g. to add headers to the events as configured
      events = interceptorChain.intercept(events);
      // Required Channels, each mapped to the list of events to send to it
      Map<Channel, List<Event>> reqChannelQueue =
          new LinkedHashMap<Channel, List<Event>>();
      // Optional Channels, each mapped to the list of events to send to it
      Map<Channel, List<Event>> optChannelQueue =
          new LinkedHashMap<Channel, List<Event>>();
      for (Event event : events) {
        // Get the list of required Channels
        List<Channel> reqChannels = selector.getRequiredChannels(event);
        for (Channel ch : reqChannels) {
          List<Event> eventQueue = reqChannelQueue.get(ch);
          if (eventQueue == null) {
            eventQueue = new ArrayList<Event>();
            reqChannelQueue.put(ch, eventQueue);
          }
          eventQueue.add(event);
        }
        // Get the list of optional Channels
        List<Channel> optChannels = selector.getOptionalChannels(event);
        for (Channel ch : optChannels) {
          List<Event> eventQueue = optChannelQueue.get(ch);
          if (eventQueue == null) {
            eventQueue = new ArrayList<Event>();
            optChannelQueue.put(ch, eventQueue);
          }
          eventQueue.add(event);
        }
      }
      // Process required channels
      for (Channel reqChannel : reqChannelQueue.keySet()) {
        Transaction tx = reqChannel.getTransaction();
        Preconditions.checkNotNull(tx, "Transaction object must not be null");
        try {
          tx.begin(); // begin the transaction
          List<Event> batch = reqChannelQueue.get(reqChannel);
          for (Event event : batch) { // put the events into the Channel one by one
            reqChannel.put(event);
          }
          tx.commit(); // commit the transaction
        } catch (Throwable t) {
          tx.rollback(); // an exception occurred: roll back the transaction
          if (t instanceof Error) {
            LOG.error("Error while writing to required channel: " + reqChannel, t);
            throw (Error) t;
          } else if (t instanceof ChannelException) {
            throw (ChannelException) t;
          } else {
            throw new ChannelException("Unable to put batch on required " +
                "channel: " + reqChannel, t);
          }
        } finally {
          if (tx != null) {
            tx.close();
          }
        }
      }
      ......
    }
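The effect of begin, commit and rollback on a batch can be seen with a toy channel: either the whole batch becomes visible or none of it does. The sketch below is a stdlib-only model, not Flume's Channel or Transaction API. MiniChannel and putBatch are hypothetical names, the capacity check stands in for any failure during put, and the sketch reports a failed batch through a boolean where the real method rethrows the exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of an all-or-nothing batch put. All names are hypothetical. */
final class BatchPutSketch {

  /** Bounded in-memory channel with a single staged transaction at a time. */
  static final class MiniChannel {
    private final int capacity;
    private final List<String> committed = new ArrayList<>();
    private final List<String> staged = new ArrayList<>();

    MiniChannel(int capacity) {
      this.capacity = capacity;
    }

    void begin() { staged.clear(); }

    void put(String event) {
      if (committed.size() + staged.size() >= capacity) {
        throw new IllegalStateException("channel full");
      }
      staged.add(event); // not visible to readers until commit
    }

    void commit() {
      committed.addAll(staged);
      staged.clear();
    }

    void rollback() { staged.clear(); }

    List<String> contents() { return new ArrayList<>(committed); }
  }

  /** Puts the whole batch in one transaction; true if committed, false if rolled back. */
  static boolean putBatch(MiniChannel channel, List<String> batch) {
    channel.begin();
    try {
      for (String event : batch) {
        channel.put(event);
      }
      channel.commit();
      return true;
    } catch (RuntimeException e) {
      channel.rollback(); // discard everything staged by this batch
      return false;
    }
  }

  public static void main(String[] args) {
    MiniChannel channel = new MiniChannel(3);
    System.out.println(putBatch(channel, Arrays.asList("a", "b"))); // prints true
    System.out.println(putBatch(channel, Arrays.asList("c", "d"))); // prints false
    System.out.println(channel.contents());                         // prints [a, b]
  }
}
```

In the second batch, "c" is staged successfully but "d" exceeds the capacity, so the rollback also discards "c"; this is the behaviour that lets a source retry the same batch later without producing partial writes.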

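This completes the walk-through of how TaildirSource starts. The details of how TaildirSource reads files are not covered here; interested readers can consult the relevant code.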

