Collecting nginx Logs into HDFS with Flume

1. Environment Preparation

JDK + Hadoop: a pseudo-distributed setup is sufficient.

For nginx installation, see: installing nginx on CentOS 7.

Installing Flume is simple: download the tarball and extract it.

2. Nginx

  • Edit the nginx configuration file nginx.conf
 [root@hadoop000 conf]# vi nginx.conf
 ......
 # Uncomment this block
 log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                   '$status $body_bytes_sent "$http_referer" '
                   '"$http_user_agent" "$http_x_forwarded_for"';
 ......
 server {
        listen       80;
        server_name  localhost;

        #charset koi8-r;

        # Enable access logging (the log file name is changed here)
        access_log  logs/access.log  main;
        # access_log  logs/host.access.log  main;
......
  • Reload nginx
# Reload the nginx configuration (a full restart is not required)
[root@hadoop000 sbin]# ./nginx -s reload

The access log is now written to /usr/local/webserver/nginx/logs/access.log
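To make the `main` log format above concrete, here is a small parsing sketch (not part of the original setup): a regex that splits one such line into its fields. The field names mirror the nginx variables; the sample line is my own shortened illustration.

```python
import re

# Regex matching nginx's "main" log_format defined above:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent
# "$http_referer" "$http_user_agent" "$http_x_forwarded_for"
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)"'
)

line = ('172.16.75.1 - - [08/Sep/2019:05:35:31 +0800] "GET / HTTP/1.1" '
        '200 612 "-" "Mozilla/5.0" "-"')
fields = LOG_PATTERN.match(line).groupdict()
print(fields["remote_addr"], fields["status"], fields["request"])
```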

3. Flume

  • Write the Flume configuration
# Create a directory for the Flume configuration
[hadoop@hadoop000 conf]$ mkdir nginx
[hadoop@hadoop000 conf]$ cd nginx
[hadoop@hadoop000 nginx]$ pwd
/home/hadoop/app/flume/conf/nginx

# Edit the Flume configuration
[hadoop@hadoop000 nginx]$ vi nginxlog-hdfs.conf
# Name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.deserializer.outputCharset = UTF-8

# Command that follows the log file to be monitored
a1.sources.r1.command = tail -F /usr/local/webserver/nginx/logs/access.log

# Configure the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# Partition by day
a1.sinks.k1.hdfs.path = hdfs://hadoop000:9000/nginx/events/%Y-%m-%d
# File name prefix
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H
# File name suffix
a1.sinks.k1.hdfs.fileSuffix = .log
# Use the agent's local time when expanding the %-escapes above; the exec
# source adds no timestamp header to events, which the escapes otherwise require
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Minimum replication factor per HDFS block; if unspecified, it comes from
# the default Hadoop configuration on the classpath
a1.sinks.k1.hdfs.minBlockReplicas = 1
a1.sinks.k1.hdfs.fileType = DataStream
# Record format for sequence files: Text or Writable (the default). Set this to
# Text before Flume creates any data files; otherwise the files cannot be read
# by Impala or Hive
a1.sinks.k1.hdfs.writeFormat = Text

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
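The `%Y-%m-%d` and `%H` escapes in `hdfs.path` and `filePrefix` are expanded per event from its timestamp (here the local time, per `useLocalTimeStamp`). A rough sketch of how one event's target directory and file prefix are derived — illustrative only, Flume does this internally, and `<serial>` stands in for the counter Flume appends to file names:

```python
from datetime import datetime

def render_sink_path(event_time: datetime) -> str:
    """Expand the %-escapes of hdfs.path / filePrefix / fileSuffix the way
    the HDFS sink does (illustrative sketch, not Flume code)."""
    directory = event_time.strftime("hdfs://hadoop000:9000/nginx/events/%Y-%m-%d")
    prefix = event_time.strftime("%Y-%m-%d-%H")   # filePrefix
    return f"{directory}/{prefix}.<serial>.log"   # fileSuffix = .log

print(render_sink_path(datetime(2019, 9, 8, 20, 35)))
# hdfs://hadoop000:9000/nginx/events/2019-09-08/2019-09-08-20.<serial>.log
```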
  • Start Flume
[hadoop@hadoop000 bin]$ flume-ng agent \
> --name a1 \
> --conf-file /home/hadoop/app/flume/conf/nginx/nginxlog-hdfs.conf \
> --conf $FLUME_HOME/conf \
> -Dflume.root.logger=INFO,console
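The exec source simply runs `tail -F` and turns each newly appended line into an event. A minimal sketch of that "follow" behavior (my own illustration, not Flume's implementation; it skips log-rotation handling, which real `tail -F` does provide):

```python
import time

def follow(path, poll_interval=0.1, max_polls=None):
    """Return a generator yielding lines appended to `path` after this call,
    roughly what `tail -F` (and hence Flume's exec source) does."""
    f = open(path)
    f.seek(0, 2)  # jump to the end of the file: only new lines are seen

    def lines():
        polls = 0
        while max_polls is None or polls < max_polls:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                polls += 1          # nothing new yet; wait and retry
                time.sleep(poll_interval)
        f.close()

    return lines()
```

Each yielded line is what the source hands to the channel as one event body.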

4. Verification

Note: the virtual machine's clock is off. This run was actually performed around 11:00 on 2019-09-11, but that does not affect the verification; the paths below simply reflect the VM's clock (2019-09-08).

  • Access nginx from a browser

  • Check HDFS
    (screenshot: the collected log files shown in the HDFS browser)

The log file has been collected into HDFS.

# View from the command line
[hadoop@hadoop000 ~]$ hdfs dfs -text /nginx/events/2019-09-08/20*
172.16.75.1 - - [08/Sep/2019:05:35:31 +0800] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
172.16.75.1 - - [08/Sep/2019:05:35:31 +0800] "GET /favicon.ico HTTP/1.1" 404 570 "http://172.16.75.130/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
172.16.75.1 - - [08/Sep/2019:05:35:38 +0800] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
172.16.75.1 - - [08/Sep/2019:05:35:38 +0800] "GET /favicon.ico HTTP/1.1" 404 570 "http://hadoop000/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
172.16.75.1 - - [08/Sep/2019:06:24:43 +0800] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"

# Now visit the nginx page from the browser again
# Re-run the command: the most recent access records have also been collected into HDFS
[hadoop@hadoop000 ~]$ hdfs dfs -text /nginx/events/2019-09-08/20*
172.16.75.1 - - [08/Sep/2019:05:35:31 +0800] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
172.16.75.1 - - [08/Sep/2019:05:35:31 +0800] "GET /favicon.ico HTTP/1.1" 404 570 "http://172.16.75.130/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
172.16.75.1 - - [08/Sep/2019:05:35:38 +0800] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
172.16.75.1 - - [08/Sep/2019:05:35:38 +0800] "GET /favicon.ico HTTP/1.1" 404 570 "http://hadoop000/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
172.16.75.1 - - [08/Sep/2019:06:24:43 +0800] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
172.16.75.1 - - [08/Sep/2019:08:18:49 +0800] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36" "-" 0.000
172.16.75.1 - - [08/Sep/2019:08:18:49 +0800] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36" "-" 0.000
