Using Flume's HTTP Source

Flume ships with many built-in sources, such as exec, kafka, and others. One of the simplest is the HTTP source: when an agent with an HTTP source starts, Flume brings up an embedded web server that listens on the configured IP and port. A common use case: in environments where the Flume SDK and its dependencies cannot be deployed, application code can send data over HTTP instead of Flume's RPC, and the HTTP source receives that data into Flume.

1. HTTP source parameters:

 

Parameter          Default       Description
type               –             http (org.apache.flume.source.http.HTTPSource)
bind               –             IP address or hostname to bind to
port               –             port to listen on
enableSSL          false         whether to enable SSL
keystore           –             path to the keystore file
keystorePassword   –             password for the keystore
handler            JSONHandler   handler class used by the HTTP source
handler.*          –             any parameters for the handler class can be passed with this prefix

1)handler:

Flume uses a pluggable handler class to turn HTTP requests into events. If none is specified, the default is JSONHandler, which accepts events in the JSON format shown below. Users can also supply their own handler, which must implement the HTTPSourceHandler interface.

JSON data format:

 

[ { "headers":{"":"","":""
                 },
     "body":"the first event"
   },
   { "headers":{"":"","":""
                 },
     "body":"the second event"
   }
   
]
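
As a quick sketch of how such a batch can be posted (the host and port are illustrative; the full example in section 3 below listens on localhost:50000):

curl -X POST -H 'Content-Type: application/json' \
     -d '[{"headers":{"h1":"v1"},"body":"the first event"},{"headers":{"h2":"v2"},"body":"the second event"}]' \
     http://localhost:50000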


2. A quick look at Flume's logger sink:

 

It logs events at INFO level and is typically used for testing and debugging. This article uses this sink type. Its configuration properties:

  • type  logger
  • maxBytesToLog    16    Maximum number of bytes of the Event body to log

Note: a log4j configuration file must be present in the directory passed to the --conf argument; alternatively, log4j settings can be supplied at startup with -Dflume.root.logger=INFO,console.

 

 

3. A simple HTTP source example:

1) Download and extract Flume:

 

cd /usr/local/
wget http://mirror.bit.edu.cn/apache/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz
tar -xvzf apache-flume-1.7.0-bin.tar.gz

 

Configure the Flume environment variables:

 

vim /etc/profile

export PS1="[\u@`/sbin/ifconfig eth0|grep 'inet '|awk -F'[: ]+' '{print $4}'` \W]"'$ '
export FLUME_HOME=/usr/local/apache-flume-1.7.0-bin
export PATH=$PATH:$FLUME_HOME/bin
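
The later steps refer to Flume via the shorter path /usr/local/flume; a minimal sketch, assuming you want that path, is to symlink it to the extracted directory and reload the profile:

ln -s /usr/local/apache-flume-1.7.0-bin /usr/local/flume
source /etc/profile   # pick up FLUME_HOME and PATH set above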

 


2) Install a JDK and configure its environment variables (a sketch follows);
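
A hedged sketch of the JDK environment variables (the install path is an assumption; point JAVA_HOME at your actual JDK):

# append to /etc/profile, then run: source /etc/profile
export JAVA_HOME=/usr/local/jdk1.8.0_151   # hypothetical JDK install path
export PATH=$PATH:$JAVA_HOME/bin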

3) Configure Flume:

 

cd /usr/local/flume/conf
vim flume-env.sh

In flume-env.sh, set JAVA_HOME; also place the following log4j.properties in the same conf directory:

 

 

### set log levels ###
log4j.rootLogger = INFO, stdout, D, E

###
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern =  [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n
 
### log to a daily rolling file ###
log4j.appender.D = org.apache.log4j.DailyRollingFileAppender
log4j.appender.D.File = /data/logs/flume/flume.log
log4j.appender.D.Append = true
log4j.appender.D.Threshold = info
log4j.appender.D.layout = org.apache.log4j.PatternLayout
log4j.appender.D.layout.ConversionPattern = [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n
 
### write error-level messages to a separate file ###
log4j.appender.E = org.apache.log4j.DailyRollingFileAppender
log4j.appender.E.File =/data/logs/flume/flume_error.log
log4j.appender.E.Append = true
log4j.appender.E.Threshold = ERROR
log4j.appender.E.layout = org.apache.log4j.PatternLayout
log4j.appender.E.layout.ConversionPattern = [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n

### logger for a custom sink class (adjust the class name below to your own sink, or remove this block) ###
log4j.logger.com.abc.ttbrain.log.flume.sink.MysqlSink= INFO, F, EE
log4j.additivity.com.abc.ttbrain.log.flume.sink.MysqlSink = false
log4j.appender.F= org.apache.log4j.DailyRollingFileAppender
log4j.appender.F.File=/data/logs/flume/flume_sink.log
log4j.appender.F.Append = true
log4j.appender.F.Threshold = info
log4j.appender.F.layout=org.apache.log4j.PatternLayout  
log4j.appender.F.layout.ConversionPattern= [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n

log4j.appender.EE= org.apache.log4j.DailyRollingFileAppender
log4j.appender.EE.File=/data/logs/flume/flume_sink_error.log
log4j.appender.EE.Append = true
log4j.appender.EE.Threshold = ERROR
log4j.appender.EE.layout=org.apache.log4j.PatternLayout  
log4j.appender.EE.layout.ConversionPattern= [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n

 

 

 

4) Configure the HTTP source:

 

cd /usr/local/flume/conf
vim http_test.conf

a1.sources=r1
a1.sinks=k1
a1.channels=c1

a1.sources.r1.type=http
a1.sources.r1.bind=localhost
a1.sources.r1.port=50000
a1.sources.r1.channels=c1

a1.sinks.k1.type=logger
a1.sinks.k1.channel=c1

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100


5) Start Flume:

 

 

flume-ng agent -c /usr/local/flume/conf/ -f /usr/local/flume/conf/http_test.conf -n a1


6) Test:

 

Open another shell window and run:

 

curl -X POST -d'[{"headers":{"h1":"v1","h2":"v2"},"body":"hello body"}]'  http://localhost:50000

 

In /data/logs/flume/flume.log you should see something like:

 

[09-29 10:31:12] [INFO] [org.apache.flume.sink.LoggerSink:94] Event: { headers:{h1=v1, h2=v2} body: 68 65 6C 6C 6F 20 62 6F 64 79                   hello body }


4. A custom handler:

 

Suppose incoming requests are XML, expected in a format like the following (the element names events/event/headers/body must match the handler below; the header element name header1 is illustrative):

<events>
  <event>
    <headers>
      <header1>value1</header1>
    </headers>
    <body>test</body>
  </event>
  <event>
    <headers>
      <header1>value1</header1>
    </headers>
    <body>test2</body>
  </event>
</events>


1) pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.pq</groupId>
  <artifactId>flume-demo</artifactId>
  <packaging>jar</packaging>
  <version>1.0</version>
  <name>flume-demo Maven jar</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.8.2</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.7</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flume</groupId>
      <artifactId>flume-ng-core</artifactId>
      <version>1.6.0</version>
      <scope>compile</scope>
    </dependency>
  </dependencies>
  <build>
    <finalName>flume-demo</finalName>
  </build>
</project>

2) The custom handler:

package org.pq.flumeDemo.sources;
import com.google.common.base.Preconditions;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.http.HTTPBadRequestException;
import org.apache.flume.source.http.HTTPSourceHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import javax.servlet.http.HttpServletRequest;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HTTPSourceXMLHandler implements HTTPSourceHandler {
    private final String ROOT = "events";
    private final String EVENT_TAG = "event";
    private final String HEADERS_TAG = "headers";
    private final String BODY_TAG = "body";

    private final String CONF_INSERT_TIMESTAMP = "insertTimestamp";
    private final String TIMESTAMP_HEADER = "timestamp";
    private final DocumentBuilderFactory documentBuilderFactory
            = DocumentBuilderFactory.newInstance();

    // Document builders are not thread-safe.
    // So make sure we have one for each thread.
    private final ThreadLocal<DocumentBuilder> docBuilder
            = new ThreadLocal<DocumentBuilder>();

    private boolean insertTimestamp;
    private static final Logger LOG = LoggerFactory.getLogger(HTTPSourceXMLHandler.class);


    @Override
    public List<Event> getEvents(HttpServletRequest httpServletRequest) throws HTTPBadRequestException, Exception {
        if (docBuilder.get() == null) {
            docBuilder.set(documentBuilderFactory.newDocumentBuilder());
        }
        Document doc;
        final List<Event> events;
        try {
            doc = docBuilder.get().parse(httpServletRequest.getInputStream());            
            Element root = doc.getDocumentElement();        

            root.normalize();
            // Verify that the root element is "events"
            Preconditions.checkState(
                    ROOT.equalsIgnoreCase(root.getTagName()));

            NodeList nodes = root.getElementsByTagName(EVENT_TAG);
            LOG.info("get nodes={}",nodes);

            int eventCount = nodes.getLength();
            events = new ArrayList<Event>(eventCount);
            for (int i = 0; i < eventCount; i++) {
                Element event = (Element) nodes.item(i);
                // Get all headers. If there are multiple header sections,
                // combine them.
                NodeList headerNodes
                        = event.getElementsByTagName(HEADERS_TAG);
                Map<String, String> eventHeaders
                        = new HashMap<String, String>();
                for (int j = 0; j < headerNodes.getLength(); j++) {
                    Node headerNode = headerNodes.item(j);
                    NodeList headers = headerNode.getChildNodes();
                    for (int k = 0; k < headers.getLength(); k++) {
                        Node header = headers.item(k);

                        // Read only element nodes
                        if (header.getNodeType() != Node.ELEMENT_NODE) {
                            continue;
                        }
                        // Make sure a header is inserted only once,
                        // else the event is malformed
                        Preconditions.checkState(
                                !eventHeaders.containsKey(header.getNodeName()),
                                "Header expected only once " + header.getNodeName());
                        eventHeaders.put(
                                header.getNodeName(), header.getTextContent());
                    }
                }
                Node body = event.getElementsByTagName(BODY_TAG).item(0);
                if (insertTimestamp) {
                    eventHeaders.put(TIMESTAMP_HEADER, String.valueOf(System
                            .currentTimeMillis()));
                }
                events.add(EventBuilder.withBody(
                        body.getTextContent().getBytes(
                                httpServletRequest.getCharacterEncoding()),
                        eventHeaders));
            }
        } catch (SAXException ex) {
            throw new HTTPBadRequestException(
                    "Request could not be parsed into valid XML", ex);
        } catch (Exception ex) {
            throw new HTTPBadRequestException(
                    "Request is not in expected format. " +
                            "Please refer documentation for expected format.", ex);
        }
        return events;
    }

    @Override
    public void configure(Context context) {
        insertTimestamp = context.getBoolean(CONF_INSERT_TIMESTAMP,
                false);
    }
}

Package the project as a jar (including any dependencies that Flume does not already provide) and copy it into Flume's lib directory.
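
A minimal sketch of the build-and-deploy step, assuming the pom above (finalName flume-demo) and the install path used earlier:

cd /path/to/flume-demo                          # project root containing pom.xml (path is an assumption)
mvn clean package                               # produces target/flume-demo.jar
cp target/flume-demo.jar /usr/local/flume/lib/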

 

3) Flume configuration file (for example, update conf/http_test.conf):

 

a1.sources=r1
a1.sinks=k1
a1.channels=c1

a1.sources.r1.type=http
a1.sources.r1.bind=localhost
a1.sources.r1.port=50000
a1.sources.r1.channels=c1
a1.sources.r1.handler=org.pq.flumeDemo.sources.HTTPSourceXMLHandler
a1.sources.r1.handler.insertTimestamp=true

a1.sinks.k1.type=logger
a1.sinks.k1.channel=c1

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

4) Start the agent:

 

 

$ bin/flume-ng agent -c conf -f conf/http_test.conf  -n a1 -Dflume.root.logger=INFO,console
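
Then, as a hedged test sketch (the payload follows the expected XML format above; the explicit charset in Content-Type matters here, because the handler encodes the event body with httpServletRequest.getCharacterEncoding()):

curl -X POST \
     -H 'Content-Type: text/xml; charset=UTF-8' \
     -d '<events><event><headers><header1>value1</header1></headers><body>test</body></event></events>' \
     http://localhost:50000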

 

 

 

 

 
