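Property | Default | Description |
---|---|---|
channels | – | |
type | – | Component type name; must be exec |
command | – | The command to execute |
shell | – | The shell used to run the command |
restartThrottle | 10000 | How long to sleep (in milliseconds) before attempting to restart the command process |
restart | false | Whether to restart the command process if the executed command dies |
logStdErr | false | Whether the command's stderr output should be logged |
batchSize | 20 | Maximum number of lines to read and send to the Channel at a time |
batchTimeout | 3000 | How long to wait (in milliseconds) before pushing data if the buffer size has not been reached |
selector.type | replicating | replicating or multiplexing |
selector.* | | Depends on the value of selector.type |
interceptors | – | Space-separated list of interceptors |
interceptors.* | | |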
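ExecSource can collect data in real time, but data is lost whenever Flume is not running or the shell command fails. For example, suppose you use tail -F to follow the Nginx access log. If Flume goes down, Nginx keeps writing to the log file, and Flume never sees the entries produced while it was down. For stronger reliability guarantees, consider the Spooling Directory Source. In the Nginx access log case, Spooling Directory Source cannot deliver in real time, but by rotating the log file frequently it can get close to real time.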
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1
The ‘shell’ setting specifies the shell used to invoke ‘command’ (for example Bash or PowerShell). ‘command’ is passed to ‘shell’ as an argument for execution. This lets ‘command’ use shell features such as wildcards, backticks, pipes, loops, and conditionals. If ‘shell’ is not configured, ‘command’ is invoked directly. Common values for ‘shell’ are ‘/bin/sh -c’, ‘/bin/ksh -c’, ‘cmd /c’, ‘powershell -Command’, and so on.
a1.sources.tailsource-1.type = exec
a1.sources.tailsource-1.shell = /bin/bash -c
a1.sources.tailsource-1.command = for i in /path/*.txt; do cat $i; done
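To make the difference between the two modes concrete, here is a minimal sketch of how the argument array could be assembled in each case. It follows the behaviour described above (the shell value is split on whitespace and the whole command is appended as one argument), but the class and method names are hypothetical and this is not Flume's actual code.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; names are not taken from Flume.
class ShellCommandSketch {

    // With a shell: split the shell setting on whitespace, then append the
    // entire command as ONE final argument so the shell can interpret it.
    static String[] buildShellCommand(String shell, String command) {
        String[] shellArgs = shell.trim().split("\\s+");
        String[] result = Arrays.copyOf(shellArgs, shellArgs.length + 1);
        result[shellArgs.length] = command;
        return result;
    }

    // Without a shell: the command itself is split on whitespace, so pipes,
    // wildcards and loops are passed as literal arguments, not interpreted.
    static String[] buildDirectCommand(String command) {
        return command.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            buildShellCommand("/bin/bash -c", "for i in /path/*.txt; do cat $i; done")));
        System.out.println(Arrays.toString(
            buildDirectCommand("tail -F /var/log/secure")));
    }
}
```

In the first call the loop stays in a single array element, which is why it only works when a shell is configured. In the second, each whitespace-separated token becomes its own argument.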
1. The configure(Context context) method. ExecSource's configuration is straightforward; refer to the table above.
2. The start() method
@Override
public void start() {
  logger.info("Exec source starting with command:{}", command);
  // Single-thread pool that will run the command
  executor = Executors.newSingleThreadExecutor();
  // Build the ExecRunnable, passing in the values read from the configuration
  runner = new ExecRunnable(shell, command, getChannelProcessor(), sourceCounter,
      restart, restartThrottle, logStderr, bufferCount, batchTimeout, charset);
  // FIXME: Use a callback-like executor / future to signal us upon failure.
  runnerFuture = executor.submit(runner);
  /*
   * NB: This comes at the end rather than the beginning of the method because
   * it sets our state to running. We want to make sure the executor is alive
   * and well first.
   */
  // Start the counter
  sourceCounter.start();
  super.start();
  logger.debug("Exec source started");
}
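The FIXME in start() points out that nothing tells the source when the runner fails. As a rough illustration of what the Future returned by submit() makes possible, the sketch below runs a task on a single-thread executor and reports whether it completed or threw. The class and method names are hypothetical, and blocking on get() like this would not be appropriate inside start() itself; it only shows that the failure is captured in the Future.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not part of Flume.
class RunnerFutureSketch {

    // Submits the task to a single-thread executor and reports how it ended.
    static String runAndReport(Runnable task) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = executor.submit(task);
            future.get(); // blocks until the task returns or throws
            return "completed";
        } catch (ExecutionException e) {
            // The exception thrown inside the task is available as the cause.
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndReport(() -> { }));
        System.out.println(runAndReport(() -> {
            throw new IllegalStateException("boom");
        }));
    }
}
```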
3. ExecRunnable: this class does most of the work of Exec Source and implements Runnable. Let's look at its run method:
@Override
public void run() {
  do {
    String exitCode = "unknown";
    BufferedReader reader = null;
    String line = null;
    final List<Event> eventList = new ArrayList<Event>();
    timedFlushService = Executors.newSingleThreadScheduledExecutor(
        new ThreadFactoryBuilder().setNameFormat(
            "timedFlushExecService" +
            Thread.currentThread().getId() + "-%d").build());
    try {
      if(shell != null) {
        // If shell is configured, split it on "\\s+" into an array, then append command to form a new array.
        String[] commandArgs = formulateShellCommand(shell, command);
        // Launch the system command
        process = Runtime.getRuntime().exec(commandArgs);
      } else {
        // Split command on "\\s+" into an array
        String[] commandArgs = command.split("\\s+");
        // Launch the system command
        process = new ProcessBuilder(commandArgs).start();
      }
      // Read the command's output as an input stream. InputStreamReader bridges bytes to characters:
      // it reads bytes from the underlying stream and decodes them with the given charset.
      reader = new BufferedReader(
          new InputStreamReader(process.getInputStream(), charset));
      // StderrLogger dies as soon as the input stream is invalid
      // Start the stderr reader thread; nothing is logged if logStderr is false.
      StderrReader stderrReader = new StderrReader(new BufferedReader(
          new InputStreamReader(process.getErrorStream(), charset)), logStderr);
      stderrReader.setName("StderrReader-[" + command + "]");
      stderrReader.setDaemon(true);
      stderrReader.start();
      // This scheduled task runs every batchTimeout milliseconds
      future = timedFlushService.scheduleWithFixedDelay(new Runnable() {
          @Override
          public void run() {
            try {
              synchronized (eventList) {
                // Flush only if eventList is non-empty and the timeout has elapsed
                if(!eventList.isEmpty() && timeout()) {
                  flushEventBatch(eventList);
                }
              }
            } catch (Exception e) {
              logger.error("Exception occured when processing event batch", e);
              if(e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
              }
            }
          }
      },
      batchTimeout, batchTimeout, TimeUnit.MILLISECONDS);
      // Read the stream line by line
      while ((line = reader.readLine()) != null) {
        synchronized (eventList) {
          sourceCounter.incrementEventReceivedCount();
          eventList.add(EventBuilder.withBody(line.getBytes(charset)));
          // Flush to the Channel once the number of buffered events reaches batchSize (bufferCount) or the timeout has elapsed
          if(eventList.size() >= bufferCount || timeout()) {
            flushEventBatch(eventList);
          }
        }
      }
      // The stream has no more data; flush whatever is left
      synchronized (eventList) {
        if(!eventList.isEmpty()) {
          flushEventBatch(eventList);
        }
      }
    } catch (Exception e) {
      logger.error("Failed while running command: " + command, e);
      if(e instanceof InterruptedException) {
        Thread.currentThread().interrupt();
      }
    } finally {
      if (reader != null) {
        try {
          reader.close();
        } catch (IOException ex) {
          logger.error("Failed to close reader for exec source", ex);
        }
      }
      // Kill the child process
      exitCode = String.valueOf(kill());
    }
    if(restart) {
      logger.info("Restarting in {}ms, exit code {}", restartThrottle,
          exitCode);
      try {
        // Sleep for restartThrottle milliseconds before restarting the command process
        Thread.sleep(restartThrottle);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    } else {
      logger.info("Command [" + command + "] exited with " + exitCode);
    }
  } while(restart);
  // restart controls whether the command process is relaunched if the shell command dies (default: false).
  // When it is true, the whole loop body above runs again.
}
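Stripped of the I/O, the outer structure of run() is a do/while loop that always runs the command once and repeats only when restart is true, sleeping for restartThrottle between attempts. The sketch below isolates that control flow. Everything in it is hypothetical: the command is stood in for by an IntSupplier that returns an exit code, and the maxRuns parameter exists only so the demo terminates (the loop in run() has no such limit).

```java
import java.util.function.IntSupplier;

// Hypothetical sketch of the restart loop; not Flume's actual class.
class RestartLoopSketch {

    // Runs the command at least once. While restart is true, waits
    // restartThrottleMs and runs it again, up to maxRuns times in total.
    // Returns how many times the command was run.
    static int runWithRestart(IntSupplier command, boolean restart,
                              long restartThrottleMs, int maxRuns) {
        int runs = 0;
        boolean again;
        do {
            int exitCode = command.getAsInt();
            runs++;
            again = restart && runs < maxRuns;
            if (again) {
                System.out.println("Restarting in " + restartThrottleMs
                    + "ms, exit code " + exitCode);
                try {
                    Thread.sleep(restartThrottleMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    again = false; // sketch choice: stop looping once interrupted
                }
            } else {
                System.out.println("Command exited with " + exitCode);
            }
        } while (again);
        return runs;
    }

    public static void main(String[] args) {
        runWithRestart(() -> 1, false, 10, 3); // runs once
        runWithRestart(() -> 1, true, 10, 3);  // runs three times
    }
}
```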
4. The flushEventBatch method
private void flushEventBatch(List<Event> eventList){
  // Hand the whole batch of Events to the Channel
  channelProcessor.processEventBatch(eventList);
  // Update the metrics
  sourceCounter.addToEventAcceptedCount(eventList.size());
  // Clear the list
  eventList.clear();
  // Record the time of the last push to the Channel, used later to check for a timeout
  lastPushToChannel = systemClock.currentTimeMillis();
}
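The interplay between batchSize, batchTimeout and lastPushToChannel is easier to see in isolation. The following is a simplified sketch, not Flume code: the class name is hypothetical, lines are plain strings instead of Events, the clock is passed in explicitly so the behaviour is deterministic, and the timeout rule (now minus last push is at least batchTimeout) is an assumption, since the article does not show the timeout() method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the size-or-timeout batching in ExecRunnable.
class BatchBufferSketch {
    private final int batchSize;
    private final long batchTimeoutMs;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();
    private long lastPush;

    BatchBufferSketch(int batchSize, long batchTimeoutMs, long now) {
        this.batchSize = batchSize;
        this.batchTimeoutMs = batchTimeoutMs;
        this.lastPush = now;
    }

    // Assumed rule: timed out once batchTimeoutMs has passed since the last flush.
    private boolean timeout(long now) {
        return now - lastPush >= batchTimeoutMs;
    }

    // Reader path: buffer the line, flush when the batch is full or overdue.
    synchronized void add(String line, long now) {
        buffer.add(line);
        if (buffer.size() >= batchSize || timeout(now)) {
            flush(now);
        }
    }

    // Timer path: flush a partial batch only if it is non-empty and overdue.
    synchronized void tick(long now) {
        if (!buffer.isEmpty() && timeout(now)) {
            flush(now);
        }
    }

    // Stand-in for flushEventBatch: hand off a copy, clear, record the push time.
    private void flush(long now) {
        flushed.add(new ArrayList<>(buffer));
        buffer.clear();
        lastPush = now;
    }

    synchronized List<List<String>> flushedBatches() {
        return flushed;
    }

    public static void main(String[] args) {
        BatchBufferSketch b = new BatchBufferSketch(2, 3000, 0);
        b.add("line-1", 10);
        b.add("line-2", 20);  // batch full, flushed here
        b.add("line-3", 30);
        b.tick(1000);         // not overdue yet, nothing happens
        b.tick(3020);         // overdue, partial batch flushed
        System.out.println(b.flushedBatches());
    }
}
```

Both paths synchronize on the same object, mirroring the synchronized (eventList) blocks in run(), so the reader thread and the timer thread never flush the same buffer at once.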
That is the overall flow of ExecSource.