9. Installing a Flume Cluster and Collecting Data

Introduction to Flume


Flume is a highly available, highly reliable, distributed system from Cloudera for collecting, aggregating, and transporting massive volumes of log data. Flume lets you plug custom data senders into a logging system to collect data; it can also apply simple processing to the data and write it out to a variety of (customizable) data receivers.

Cluster Plan


                      hadoop151   hadoop152   hadoop153
Flume (collect data)      ✓           ✓
Flume (consume data)                              ✓

Installing Flume


  1. Extract the archive to the target location and rename it.

    [hadoop@hadoop151 software]$ tar -zxvf apache-flume-1.7.0-bin.tar.gz -C /opt/module/
    [hadoop@hadoop151 module]$ mv apache-flume-1.7.0-bin/ flume
  2. Enter the "flume/conf" directory, rename "flume-env.sh.template" to "flume-env.sh", and set JAVA_HOME in it.

    [hadoop@hadoop151 conf]$ mv flume-env.sh.template flume-env.sh
    [hadoop@hadoop151 conf]$ vim flume-env.sh
    export JAVA_HOME=/opt/module/jdk
  3. Following the cluster plan, repeat the steps above on the other nodes. (You can also script this; these notes include an xsync cluster distribution script, sketched below.)
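
    For reference, a minimal sketch using the xsync script mentioned above, assuming the script is on the PATH and synchronizes its argument to every node in the cluster:

      [hadoop@hadoop151 module]$ xsync /opt/module/flume/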

Configuring Flume to Collect Data


  1. Create a file named file-flume-kafka.conf in the "flume/conf" directory.

    a1.sources = r1
    a1.channels = c1 c2

    # configure source
    a1.sources.r1.type = TAILDIR
    a1.sources.r1.positionFile = /opt/module/flume/test/log_position.json
    a1.sources.r1.filegroups = f1
    a1.sources.r1.filegroups.f1 = /tmp/logs/app.+
    a1.sources.r1.fileHeader = true
    a1.sources.r1.channels = c1 c2
    
    #interceptor
    a1.sources.r1.interceptors = i1 i2
    a1.sources.r1.interceptors.i1.type = com.bbxy.flume.interceptor.LogETLInterceptor$Builder
    a1.sources.r1.interceptors.i2.type = com.bbxy.flume.interceptor.LogTypeInterceptor$Builder
    
    a1.sources.r1.selector.type = multiplexing
    a1.sources.r1.selector.header = topic
    a1.sources.r1.selector.mapping.topic_start = c1
    a1.sources.r1.selector.mapping.topic_event = c2
    
    # configure channel
    a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
    a1.channels.c1.kafka.bootstrap.servers = hadoop151:9092,hadoop152:9092,hadoop153:9092
    a1.channels.c1.kafka.topic = topic_start
    a1.channels.c1.parseAsFlumeEvent = false
    a1.channels.c1.kafka.consumer.group.id = flume-consumer
    
    a1.channels.c2.type = org.apache.flume.channel.kafka.KafkaChannel
    a1.channels.c2.kafka.bootstrap.servers = hadoop151:9092,hadoop152:9092,hadoop153:9092
    a1.channels.c2.kafka.topic = topic_event
    a1.channels.c2.parseAsFlumeEvent = false
    a1.channels.c2.kafka.consumer.group.id = flume-consumer

    Distribute this file to hadoop152 (see the sketch below).
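
    A minimal sketch, assuming passwordless SSH between the nodes (the xsync script from the installation section works just as well):

      [hadoop@hadoop151 conf]$ scp file-flume-kafka.conf hadoop152:/opt/module/flume/conf/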

  2. Custom ETL interceptor and log-type interceptor

    • The ETL interceptor filters out log lines whose timestamp is invalid or whose JSON data is incomplete;
    • The log-type interceptor separates startup logs from event logs so that each can be sent to its own Kafka topic.
    1. Create a Maven project named flume-interceptor and declare the dependencies in pom.xml.

      <dependencies>
          <dependency>
              <groupId>org.apache.flume</groupId>
              <artifactId>flume-ng-core</artifactId>
              <version>1.7.0</version>
          </dependency>
      </dependencies>

      <build>
          <plugins>
              <plugin>
                  <artifactId>maven-compiler-plugin</artifactId>
                  <version>2.3.2</version>
                  <configuration>
                      <source>1.8</source>
                      <target>1.8</target>
                  </configuration>
              </plugin>
              <plugin>
                  <artifactId>maven-assembly-plugin</artifactId>
                  <configuration>
                      <descriptorRefs>
                          <descriptorRef>jar-with-dependencies</descriptorRef>
                      </descriptorRefs>
                  </configuration>
                  <executions>
                      <execution>
                          <id>make-assembly</id>
                          <phase>package</phase>
                          <goals>
                              <goal>single</goal>
                          </goals>
                      </execution>
                  </executions>
              </plugin>
          </plugins>
      </build>
    2. Create the LogETLInterceptor class (the ETL interceptor).

      package com.bbxy.flume.interceptor;
      
      import org.apache.flume.Context;
      import org.apache.flume.Event;
      import org.apache.flume.interceptor.Interceptor;
      
      import java.nio.charset.Charset;
      import java.util.ArrayList;
      import java.util.List;
      
      public class LogETLInterceptor implements Interceptor {
      
        @Override
        public void initialize() {
      
        }
      
        @Override
        public Event intercept(Event event) {

            // 1 Get the event body
            byte[] body = event.getBody();
            String log = new String(body, Charset.forName("UTF-8"));

            // 2 Validate according to the log type (startup vs. event)
            if (log.contains("start")) {
                if (LogUtils.validateStart(log)) {
                    return event;
                }
            } else {
                if (LogUtils.validateEvent(log)) {
                    return event;
                }
            }

            // 3 Drop events that fail validation
            return null;
        }
      
        @Override
        public List<Event> intercept(List<Event> events) {

            ArrayList<Event> interceptors = new ArrayList<>();
      
            for (Event event : events) {
                Event intercept1 = intercept(event);
      
                if (intercept1 != null){
                    interceptors.add(intercept1);
                }
            }
      
            return interceptors;
        }
      
        @Override
        public void close() {
      
        }
      
        public static class Builder implements Interceptor.Builder {
      
            @Override
            public Interceptor build() {
                return new LogETLInterceptor();
            }
      
            @Override
            public void configure(Context context) {
      
            }
        }
      }
    3. Create the LogTypeInterceptor class (the log-type interceptor).

      package com.bbxy.flume.interceptor;
      
      import org.apache.flume.Context;
      import org.apache.flume.Event;
      import org.apache.flume.interceptor.Interceptor;
      
      import java.nio.charset.Charset;
      import java.util.ArrayList;
      import java.util.List;
      import java.util.Map;
      
      public class LogTypeInterceptor implements Interceptor {
        @Override
        public void initialize() {
      
        }
      
        @Override
        public Event intercept(Event event) {

            // Distinguish the log type: inspect the body, tag the header
            // 1 Get the event body
            byte[] body = event.getBody();
            String log = new String(body, Charset.forName("UTF-8"));

            // 2 Get the headers
            Map<String, String> headers = event.getHeaders();

            // 3 Determine the log type and write it into the header
            if (log.contains("start")) {
                headers.put("topic","topic_start");
            } else {
                headers.put("topic","topic_event");
            }

            return event;
        }
      
        @Override
        public List<Event> intercept(List<Event> events) {

            ArrayList<Event> interceptors = new ArrayList<>();
      
            for (Event event : events) {
                Event intercept1 = intercept(event);
      
                interceptors.add(intercept1);
            }
      
            return interceptors;
        }
      
        @Override
        public void close() {
      
        }
      
        public static class Builder implements Interceptor.Builder {
      
            @Override
            public Interceptor build() {
                return new LogTypeInterceptor();
            }
      
            @Override
            public void configure(Context context) {
      
            }
        }
      }
    4. The log-filtering utility class

      package com.bbxy.flume.interceptor;
      import org.apache.commons.lang.math.NumberUtils;
      
      public class LogUtils {
      
        public static boolean validateEvent(String log) {
            // server timestamp | json
            // 1549696569054 | {"cm":{"ln":"-89.2","sv":"V2.0.4","os":"8.2.0","g":"[email protected]","nw":"4G","l":"en","vc":"18","hw":"1080*1920","ar":"MX","uid":"u8678","t":"1549679122062","la":"-27.4","md":"sumsung-12","vn":"1.1.3","ba":"Sumsung","sr":"Y"},"ap":"weather","et":[]}
      
            // 1 Split on the '|' delimiter
            String[] logContents = log.split("\\|");
      
            // 2 Validate that the line has exactly two parts
            if(logContents.length != 2){
                return false;
            }
      
            // 3 Validate the server timestamp (13-digit epoch milliseconds)
            if (logContents[0].length() != 13 || !NumberUtils.isDigits(logContents[0])) {
                return false;
            }
      
            // 4 Validate the JSON payload
            if (!logContents[1].trim().startsWith("{") || !logContents[1].trim().endsWith("}")){
                return false;
            }
      
            return true;
        }
      
        public static boolean validateStart(String log) {
      
            if (log == null){
                return false;
            }
      
            // Validate the JSON
            if (!log.trim().startsWith("{") || !log.trim().endsWith("}")){
                return false;
            }
      
            return true;
        }
      }
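
      To see the validation rules in action, here is a quick, hypothetical sanity check (the LogUtilsTest class is not part of the tutorial code; it just exercises validateEvent):

      // Hypothetical local sanity check for LogUtils; not part of the tutorial code
      package com.bbxy.flume.interceptor;

      public class LogUtilsTest {

          public static void main(String[] args) {
              // Well-formed: 13-digit timestamp, then a JSON object
              String good = "1549696569054|{\"cm\":{},\"ap\":\"weather\",\"et\":[]}";
              // Malformed: timestamp is not 13 digits
              String badTime = "154969|{\"cm\":{},\"ap\":\"weather\",\"et\":[]}";

              System.out.println(LogUtils.validateEvent(good));    // true
              System.out.println(LogUtils.validateEvent(badTime)); // false
          }
      }
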
  3. Package the project. Take the jar without dependencies and place it in the "flume/lib" directory.

  4. Send the packaged jar to hadoop152 (a combined sketch of steps 3 and 4 follows below).
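
    A minimal sketch of steps 3 and 4, assuming a standard Maven layout, passwordless SSH, and a hypothetical project version of 1.0-SNAPSHOT (the exact jar names depend on your project version):

      [hadoop@hadoop151 flume-interceptor]$ mvn clean package
      [hadoop@hadoop151 flume-interceptor]$ cp target/flume-interceptor-1.0-SNAPSHOT.jar /opt/module/flume/lib/
      [hadoop@hadoop151 flume-interceptor]$ scp target/flume-interceptor-1.0-SNAPSHOT.jar hadoop152:/opt/module/flume/lib/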

  5. Start Flume to collect the tracking-log data.

    [hadoop@hadoop151 flume]$ bin/flume-ng agent --name a1 --conf-file conf/file-flume-kafka.conf &
    [hadoop@hadoop152 flume]$ bin/flume-ng agent --name a1 --conf-file conf/file-flume-kafka.conf &

    Or use a script file to do this (a sketch follows below).
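
    A minimal sketch of such a script (call it f1.sh; hostnames follow the cluster plan above, and paths follow this installation):

      #!/bin/bash
      # f1.sh -- start/stop the collection agents on hadoop151 and hadoop152 (sketch)
      case $1 in
      "start"){
          for i in hadoop151 hadoop152
          do
              echo "-------- starting collection flume on $i --------"
              ssh $i "nohup /opt/module/flume/bin/flume-ng agent --name a1 --conf-file /opt/module/flume/conf/file-flume-kafka.conf >/dev/null 2>&1 &"
          done
      };;
      "stop"){
          for i in hadoop151 hadoop152
          do
              echo "-------- stopping collection flume on $i --------"
              ssh $i "ps -ef | grep file-flume-kafka | grep -v grep | awk '{print \$2}' | xargs kill"
          done
      };;
      esac

    Run f1.sh start to bring both agents up and f1.sh stop to kill them.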
