flume的source用于收集日志,父类为AbstractSource,下图中的其他类都继承于AbstractSource
AvroSource
lifecycleAware 会调用start方法启动avroSource。avroSource主要启动了一个NettyServer用于接收数据,然后交由avroSource处理。
@Override
public void start() {
logger.info("Starting {}...", this);
//when receive data, AvroSourceProtocol will parse this data, then call AvroSource process received data
Responder responder = new SpecificResponder(AvroSourceProtocol.class, this);
//will create NioServerSocketChannelFactory depend on maxThreads
NioServerSocketChannelFactory socketChannelFactory = initSocketChannelFactory();
//will create SSLCompressionChannelPipelineFactory or ChannelPipelineFactory
ChannelPipelineFactory pipelineFactory = initChannelPipelineFactory();
server = new NettyServer(responder, new InetSocketAddress(bindAddress, port),
socketChannelFactory, pipelineFactory, null);
connectionCountUpdater = Executors.newSingleThreadScheduledExecutor();
server.start();
sourceCounter.start();
super.start();
final NettyServer srv = (NettyServer)server;
connectionCountUpdater.scheduleWithFixedDelay(new Runnable(){
@Override
public void run() {
//用于监控connection count
sourceCounter.setOpenConnectionCount(
Long.valueOf(srv.getNumActiveConnections()));
}
}, 0, 60, TimeUnit.SECONDS);
logger.info("Avro source {} started.", getName());
}
当AvroSource接收到数据时,会调用append函数,append函数会调用getChannelProcessor().processEvent处理接收的event
@Override
public Status append(AvroFlumeEvent avroEvent) {
logger.debug("Avro source {}: Received avro event: {}", getName(),
avroEvent);
sourceCounter.incrementAppendReceivedCount();
sourceCounter.incrementEventReceivedCount();
Event event = EventBuilder.withBody(avroEvent.getBody().array(),
toStringMap(avroEvent.getHeaders()));
try {
调用 getChannelProcessor().processEvent处理接收的event
getChannelProcessor().processEvent(event);
} catch (ChannelException ex) {
logger.warn("Avro source " + getName() + ": Unable to process event. " +
"Exception follows.", ex);
return Status.FAILED;
}
sourceCounter.incrementAppendAcceptedCount();
sourceCounter.incrementEventAcceptedCount();
return Status.OK;
}
AvroLegacySource
AvroLegacySource implementation that receives Avro events from Avro sink of Flume OG
@Override
public void start() {
// setup http server to receive OG events
res = new SpecificResponder(FlumeOGEventAvroServer.class, this);
try {
http = new HttpServer(res, host, port);
} catch (IOException eI) {
LOG.warn("Failed to start server", eI);
return;
}
http.start();
super.start();
}
@Override
public Void append( AvroFlumeOGEvent evt ) throws AvroRemoteException {
counterGroup.incrementAndGet("rpc.received");
Map<String, String> headers = new HashMap<String, String>();
// extract Flume OG event headers
headers.put(HOST, evt.getHost().toString());
headers.put(TIMESTAMP, evt.getTimestamp().toString());
headers.put(PRIORITY, evt.getPriority().toString());
headers.put(NANOS, evt.getNanos().toString());
for (Entry<CharSequence, ByteBuffer> entry : evt.getFields().entrySet()) {
headers.put(entry.getKey().toString(), entry.getValue().toString());
}
headers.put(OG_EVENT, "yes");
Event event = EventBuilder.withBody(evt.getBody().array(), headers);
try {
getChannelProcessor().processEvent(event);
counterGroup.incrementAndGet("rpc.events");
} catch (ChannelException ex) {
return null;
}
counterGroup.incrementAndGet("rpc.successful");
return null;
}
EmbeddedSource
EmbeddedSource is simple source used to allow direct access to the channel for the Embedded Agent. There is a EmbeddedAgent class. When call EmbeddedAgent put event, EmbeddedAgent will call put method of EmbeddedSource, EmbeddedSource directory call processEvent function.
public class EmbeddedSource extends AbstractSource
implements EventDrivenSource, Configurable {
@Override
public void configure(Context context) {
}
public void put(Event event) throws ChannelException {
getChannelProcessor().processEvent(event);
}
public void putAll(List<Event> events) throws ChannelException {
getChannelProcessor().processEventBatch(events);
}
}
ExecSource
execsource启动了一个ExecRunnable用于执行command
public void start() {
logger.info("Exec source starting with command:{}", command);
executor = Executors.newSingleThreadExecutor();
runner = new ExecRunnable(shell, command, getChannelProcessor(), sourceCounter,
restart, restartThrottle, logStderr, bufferCount, batchTimeout, charset);
// FIXME: Use a callback-like executor / future to signal us upon failure.
runnerFuture = executor.submit(runner);
/*
* NB: This comes at the end rather than the beginning of the method because
* it sets our state to running. We want to make sure the executor is alive
* and well first.
*/
sourceCounter.start();
super.start();
logger.debug("Exec source started");
}
下面是ExecRunnable的run函数,先启动了一个定时任务用于定时刷新数据到channel,然后从process的input stream 读数据,并提交到channel,当process结束的时候,如果需要restart,会重启另一个处理进程。
public void run() {
do {
String exitCode = "unknown";
BufferedReader reader = null;
String line = null;
final List<Event> eventList = new ArrayList<Event>();
timedFlushService = Executors.newSingleThreadScheduledExecutor(
new ThreadFactoryBuilder().setNameFormat(
"timedFlushExecService" +
Thread.currentThread().getId() + "-%d").build());
try {
if(shell != null) {
String[] commandArgs = formulateShellCommand(shell, command);
process = Runtime.getRuntime().exec(commandArgs);
} else {
String[] commandArgs = command.split("\\s+");
process = new ProcessBuilder(commandArgs).start();
}
reader = new BufferedReader(
new InputStreamReader(process.getInputStream(), charset));
// StderrLogger dies as soon as the input stream is invalid
StderrReader stderrReader = new StderrReader(new BufferedReader(
new InputStreamReader(process.getErrorStream(), charset)), logStderr);
stderrReader.setName("StderrReader-[" + command + "]");
stderrReader.setDaemon(true);
stderrReader.start();
future = timedFlushService.scheduleWithFixedDelay(new Runnable() {
@Override
public void run() {
try {
synchronized (eventList) {
if(!eventList.isEmpty() && timeout()) {
flushEventBatch(eventList);
}
}
} catch (Exception e) {
logger.error("Exception occured when processing event batch", e);
if(e instanceof InterruptedException) {
Thread.currentThread().interrupt();
}
}
}
},
batchTimeout, batchTimeout, TimeUnit.MILLISECONDS);
while ((line = reader.readLine()) != null) {
synchronized (eventList) {
sourceCounter.incrementEventReceivedCount();
eventList.add(EventBuilder.withBody(line.getBytes(charset)));
if(eventList.size() >= bufferCount || timeout()) {
flushEventBatch(eventList);
}
}
}
synchronized (eventList) {
if(!eventList.isEmpty()) {
flushEventBatch(eventList);
}
}
} catch (Exception e) {
logger.error("Failed while running command: " + command, e);
if(e instanceof InterruptedException) {
Thread.currentThread().interrupt();
}
} finally {
if (reader != null) {
try {
reader.close();
} catch (IOException ex) {
logger.error("Failed to close reader for exec source", ex);
}
}
exitCode = String.valueOf(kill());
}
if(restart) {
logger.info("Restarting in {}ms, exit code {}", restartThrottle,
exitCode);
try {
Thread.sleep(restartThrottle);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
} else {
logger.info("Command [" + command + "] exited with " + exitCode);
}
} while(restart);
}
HTTPSource
httpsource which accepts Flume Events by HTTP POST and GET. GET should be used
for experimentation only. HTTP requests are converted into flume events by a
pluggable "handler" which must implement the {@linkplain HTTPSourceHandler} interface.
在start函数中启动了server并且调用FlumeHTTPServlet中的doPost方法来处理接收到的request
@Override
public void start() {
Preconditions.checkState(srv == null,
"Running HTTP Server found in source: " + getName()
+ " before I started one."
+ "Will not attempt to start.");
srv = new Server();
SocketConnector connector = new SocketConnector();
connector.setPort(port);
connector.setHost(host);
srv.setConnectors(new Connector[] { connector });
try {
org.mortbay.jetty.servlet.Context root =
new org.mortbay.jetty.servlet.Context(
srv, "/", org.mortbay.jetty.servlet.Context.SESSIONS);
root.addServlet(new ServletHolder(new FlumeHTTPServlet()), "/");
srv.start();
Preconditions.checkArgument(srv.getHandler().equals(root));
} catch (Exception ex) {
LOG.error("Error while starting HTTPSource. Exception follows.", ex);
Throwables.propagate(ex);
}
Preconditions.checkArgument(srv.isRunning());
sourceCounter.start();
super.start();
}
@Override
public void doPost(HttpServletRequest request, HttpServletResponse response)
throws IOException {
List<Event> events = Collections.emptyList(); //create empty list
try {
events = handler.getEvents(request);
} catch (HTTPBadRequestException ex) {
LOG.warn("Received bad request from client. ", ex);
response.sendError(HttpServletResponse.SC_BAD_REQUEST,
"Bad request from client. "
+ ex.getMessage());
return;
} catch (Exception ex) {
LOG.warn("Deserializer threw unexpected exception. ", ex);
response.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR,
"Deserializer threw unexpected exception. "
+ ex.getMessage());
return;
}
sourceCounter.incrementAppendBatchReceivedCount();
sourceCounter.addToEventReceivedCount(events.size());
try {
getChannelProcessor().processEventBatch(events);
} catch (ChannelException ex) {
LOG.warn("Error appending event to channel. "
+ "Channel might be full. Consider increasing the channel "
+ "capacity or make sure the sinks perform faster.", ex);
response.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE,
"Error appending event to channel. Channel might be full."
+ ex.getMessage());
return;
} catch (Exception ex) {
LOG.warn("Unexpected error appending event to channel. ", ex);
response.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR,
"Unexpected error while appending event to channel. "
+ ex.getMessage());
return;
}
response.setCharacterEncoding(request.getCharacterEncoding());
response.setStatus(HttpServletResponse.SC_OK);
response.flushBuffer();
sourceCounter.incrementAppendBatchAcceptedCount();
sourceCounter.addToEventAcceptedCount(events.size());
}
MultiportSyslogTCPSource
MultiportSyslogTCPSource是一个多端口的
SyslogTCPSource,用于接收多个端口上的TCP消息。在start方法中启动了一个
acceptor,并且bind到多个端口,当有消息接收到时会调用
MultiportSyslogHandler的
messageReceived方法,将line String转换成event,提交到channel。(用mina来实现传输)
@Override
public void start() {
logger.info("Starting {}...", this);
// allow user to specify number of processors to use for thread pool
if (numProcessors != null) {
acceptor = new NioSocketAcceptor(numProcessors);
} else {
acceptor = new NioSocketAcceptor();
}
acceptor.setReuseAddress(true);
acceptor.getSessionConfig().setReadBufferSize(readBufferSize);
acceptor.getSessionConfig().setIdleTime(IdleStatus.BOTH_IDLE, 10);
acceptor.setHandler(new MultiportSyslogHandler(maxEventSize, batchSize,
getChannelProcessor(), sourceCounter, portHeader, defaultDecoder,
portCharsets));
for (int port : ports) {
InetSocketAddress addr;
if (host != null) {
addr = new InetSocketAddress(host, port);
} else {
addr = new InetSocketAddress(port);
}
try {
//Not using the one that takes an array because we won't want one bind
//error affecting the next.
acceptor.bind(addr);
} catch (IOException ex) {
logger.error("Could not bind to address: " + String.valueOf(addr), ex);
}
}
sourceCounter.start();
super.start();
logger.info("{} started.", this);
}
public void messageReceived(IoSession session, Object message) {
IoBuffer buf = (IoBuffer) message;
IoBuffer savedBuf = (IoBuffer) session.getAttribute(SAVED_BUF);
ParsedBuffer parsedLine = new ParsedBuffer();
List<Event> events = Lists.newArrayList();
// the character set can be specified per-port
CharsetDecoder decoder = defaultDecoder.get();
int port =
((InetSocketAddress) session.getLocalAddress()).getPort();
if (portCharsets.containsKey(port)) {
decoder = portCharsets.get(port).get();
}
// while the buffer is not empty
while (buf.hasRemaining()) {
events.clear();
// take number of events no greater than batchSize
for (int num = 0; num < batchSize && buf.hasRemaining(); num++) {
if (lineSplitter.parseLine(buf, savedBuf, parsedLine)) {
Event event = parseEvent(parsedLine, decoder);
if (portHeader != null) {
event.getHeaders().put(portHeader, String.valueOf(port));
}
events.add(event);
} else {
logger.trace("Parsed null event");
}
}
// don't try to write anything if we didn't get any events somehow
if (events.isEmpty()) {
logger.trace("Empty set!");
return;
}
int numEvents = events.size();
sourceCounter.addToEventReceivedCount(numEvents);
// write the events to the downstream channel
try {
channelProcessor.processEventBatch(events);
sourceCounter.addToEventAcceptedCount(numEvents);
} catch (Throwable t) {
logger.error("Error writing to channel, event dropped", t);
if (t instanceof Error) {
Throwables.propagate(t);
}
}
}
}
NetcatSource
NetcatSource open了一个ServerSocketChannel,用于接收client的链接,当接收到数据的时候调用NetcatSocketHandler的run函数来解析line string数据。(采用bio)
@Override
public void start() {
logger.info("Source starting");
counterGroup.incrementAndGet("open.attempts");
handlerService = Executors.newCachedThreadPool(new ThreadFactoryBuilder()
.setNameFormat("netcat-handler-%d").build());
try {
SocketAddress bindPoint = new InetSocketAddress(hostName, port);
serverSocket = ServerSocketChannel.open();
serverSocket.socket().setReuseAddress(true);
serverSocket.socket().bind(bindPoint);
logger.info("Created serverSocket:{}", serverSocket);
} catch (IOException e) {
counterGroup.incrementAndGet("open.errors");
logger.error("Unable to bind to socket. Exception follows.", e);
throw new FlumeException(e);
}
AcceptHandler acceptRunnable = new AcceptHandler(maxLineLength);
acceptThreadShouldStop.set(false);
acceptRunnable.counterGroup = counterGroup;
acceptRunnable.handlerService = handlerService;
acceptRunnable.shouldStop = acceptThreadShouldStop;
acceptRunnable.ackEveryEvent = ackEveryEvent;
acceptRunnable.source = this;
acceptRunnable.serverSocket = serverSocket;
acceptThread = new Thread(acceptRunnable);
acceptThread.start();
logger.debug("Source started");
super.start();
}
@Override
public void run() {
logger.debug("Starting connection handler");
Event event = null;
try {
Reader reader = Channels.newReader(socketChannel, "utf-8");
Writer writer = Channels.newWriter(socketChannel, "utf-8");
CharBuffer buffer = CharBuffer.allocate(maxLineLength);
buffer.flip(); // flip() so fill() sees buffer as initially empty
while (true) {
// this method blocks until new data is available in the socket
int charsRead = fill(buffer, reader);
logger.debug("Chars read = {}", charsRead);
// attempt to process all the events in the buffer
int eventsProcessed = processEvents(buffer, writer);
logger.debug("Events processed = {}", eventsProcessed);
if (charsRead == -1) {
// if we received EOF before last event processing attempt, then we
// have done everything we can
break;
} else if (charsRead == 0 && eventsProcessed == 0) {
if (buffer.remaining() == buffer.capacity()) {
// If we get here it means:
// 1. Last time we called fill(), no new chars were buffered
// 2. After that, we failed to process any events => no newlines
// 3. The unread data in the buffer == the size of the buffer
// Therefore, we are stuck because the client sent a line longer
// than the size of the buffer. Response: Drop the connection.
logger.warn("Client sent event exceeding the maximum length");
counterGroup.incrementAndGet("events.failed");
writer.write("FAILED: Event exceeds the maximum length (" +
buffer.capacity() + " chars, including newline)\n");
writer.flush();
break;
}
}
}
socketChannel.close();
counterGroup.incrementAndGet("sessions.completed");
} catch (IOException e) {
counterGroup.incrementAndGet("sessions.broken");
}
logger.debug("Connection handler exiting");
}
ScribeSource
Flume should adopt the Scribe entry {@code LogEntry} from existing Scribe system. Mostly, we may receive message from local Scribe and Flume take responsibility of central Scribe. Scribe是一个分布式的日志收集系统。facebook 广泛采用它,一般用scrbe收集数据,hdfs存储数据,mapreduce 处理数据。start 方法中启动了一个Startup 线程,在startup线程中启动了THsHaServer,THsHaServer在接收到消息时将调用Receiver的Log方法
private class Startup extends Thread {
public void run() {
try {
Scribe.Processor processor = new Scribe.Processor(new Receiver());
TNonblockingServerTransport transport = new TNonblockingServerSocket(port);
THsHaServer.Args args = new THsHaServer.Args(transport);
args.workerThreads(workers);
args.processor(processor);
args.transportFactory(new TFramedTransport.Factory());
args.protocolFactory(new TBinaryProtocol.Factory(false, false));
server = new THsHaServer(args);
LOG.info("Starting Scribe Source on port " + port);
server.serve();
} catch (Exception e) {
LOG.warn("Scribe failed", e);
}
}
}
@Override
public void start() {
Startup startupThread = new Startup();
startupThread.start();
try {
Thread.sleep(3000);
} catch (InterruptedException e) {}
if (!server.isServing()) {
throw new IllegalStateException("Failed initialization of ScribeSource");
}
sourceCounter.start();
super.start();
}
class Receiver implements Iface {
public ResultCode Log(List<LogEntry> list) throws TException {
if (list != null) {
sourceCounter.addToEventReceivedCount(list.size());
try {
List<Event> events = new ArrayList<Event>(list.size());
for (LogEntry entry : list) {
Map<String, String> headers = new HashMap<String, String>(1, 1);
headers.put(SCRIBE_CATEGORY, entry.getCategory());
Event event = EventBuilder.withBody(entry.getMessage().getBytes(), headers);
events.add(event);
}
if (events.size() > 0) {
getChannelProcessor().processEventBatch(events);
}
sourceCounter.addToEventAcceptedCount(list.size());
return ResultCode.OK;
} catch (Exception e) {
LOG.warn("Scribe source handling failure", e);
}
}
return ResultCode.TRY_LATER;
}
}
SequenceGeneratorSource
SequenceGeneratorSource是一个数字自动加1作为消息的source
SpoolDirectorySource
SpoolDirectorySource是用来监控文件夹下的文件的,它要求被监控的文件不能被修改,所以只能将文件MOVE到这个文件夹下。 在start方法中创建了一个reader对象,并生成了一个定时器,每500毫米调用一下
SpoolDirectoryRunnable的run函数,run函数中会调用read的readEvent函数
public void start() {
logger.info("SpoolDirectorySource source starting with directory: {}",
spoolDirectory);
ScheduledExecutorService executor =
Executors.newSingleThreadScheduledExecutor();
File directory = new File(spoolDirectory);
try {
reader = new ReliableSpoolingFileEventReader.Builder()
.spoolDirectory(directory)
.completedSuffix(completedSuffix)
.ignorePattern(ignorePattern)
.trackerDirPath(trackerDirPath)
.annotateFileName(fileHeader)
.fileNameHeader(fileHeaderKey)
.deserializerType(deserializerType)
.deserializerContext(deserializerContext)
.deletePolicy(deletePolicy)
.inputCharset(inputCharset)
.build();
} catch (IOException ioe) {
throw new FlumeException("Error instantiating spooling event parser",
ioe);
}
Runnable runner = new SpoolDirectoryRunnable(reader, sourceCounter);
executor.scheduleWithFixedDelay(
runner, 0, POLL_DELAY_MS, TimeUnit.MILLISECONDS);
super.start();
logger.debug("SpoolDirectorySource source started");
sourceCounter.start();
}
public List<Event> readEvents(int numEvents) throws IOException {
currentFile = getNextFile();
EventDeserializer des = currentFile.get().getDeserializer();
List<Event> events = des.readEvents(numEvents);
/* It's possible that the last read took us just up to a file boundary.
* If so, try to roll to the next file, if there is one. */
if (events.isEmpty()) {
retireCurrentFile();
currentFile = getNextFile();
if (!currentFile.isPresent()) {
return Collections.emptyList();
}
events = currentFile.get().getDeserializer().readEvents(numEvents);
}
if (annotateFileName) {
String filename = currentFile.get().getFile().getAbsolutePath();
for (Event event : events) {
event.getHeaders().put(fileNameHeader, filename);
}
}
committed = false;
lastFileRead = currentFile;
return events;
}
SyslogUDPSource 和SyslogTcpSource 主要用于处理Syslog的收集。在start方法中启动了一个server,并且在
messageReceived方法中调用syslogUtils.extractEvent方法去获得Event
@Override
public void messageReceived(ChannelHandlerContext ctx, MessageEvent mEvent) {
ChannelBuffer buff = (ChannelBuffer) mEvent.getMessage();
while (buff.readable()) {
Event e = syslogUtils.extractEvent(buff);
if (e == null) {
logger.debug("Parsed partial event, event will be generated when " +
"rest of the event is received.");
continue;
}
try {
getChannelProcessor().processEvent(e);
counterGroup.incrementAndGet("events.success");
} catch (ChannelException ex) {
counterGroup.incrementAndGet("events.dropped");
logger.error("Error writting to channel, event dropped", ex);
}
}
}
}
@Override
public void start() {
ChannelFactory factory = new NioServerSocketChannelFactory(
Executors.newCachedThreadPool(), Executors.newCachedThreadPool());
ServerBootstrap serverBootstrap = new ServerBootstrap(factory);
serverBootstrap.setPipelineFactory(new ChannelPipelineFactory() {
@Override
public ChannelPipeline getPipeline() {
syslogTcpHandler handler = new syslogTcpHandler();
handler.setEventSize(eventSize);
handler.setFormater(formaterProp);
return Channels.pipeline(handler);
}
});
logger.info("Syslog TCP Source starting...");
if (host == null) {
nettyChannel = serverBootstrap.bind(new InetSocketAddress(port));
} else {
nettyChannel = serverBootstrap.bind(new InetSocketAddress(host, port));
}
super.start();
}
ThriftLegacySource 和ThriftSource 主要用Thrift传输日志
文档:http://blog.csdn.net/amuseme_lu/article/details/6262572。Thrift是Facebook的一个开源项目,主要是一个跨语言的服务开发框架。server提供借口,client远程调用接口。