nginx-kafka Data Collection

Traditional big-data pipelines usually collect nginx logs with Flume and then pass the data on through Kafka.
With ngx_kafka_module, nginx can send data (user behavior logs) directly to Kafka.

Spending more time on GitHub, jokingly called the world's largest same-sex dating site, still teaches you a lot~

nginx-kafka install script

Note that the dependency packages differ between CentOS and Ubuntu.

install-nginx-kafka.sh

#!/bin/bash

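# CentOS: uncomment the next line and comment out the apt-get line below
#yum update; yum install -y gcc gcc-c++ pcre-devel zlib-devel make git wget curl vim
# Ubuntu (sudo is needed when the script is not run as root)
sudo apt-get update && sudo apt-get install -y gcc g++ libpcre3 libpcre3-dev zlib1g-dev libssl-dev make git wget curl vim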

cd /tmp
git clone https://github.com/edenhill/librdkafka
git clone https://github.com/brg-liuwei/ngx_kafka_module
wget http://nginx.org/download/nginx-1.15.5.tar.gz

cd /tmp/librdkafka
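./configure && make && sudo make install

# the tarball was downloaded to /tmp, so extract it there rather than in /tmp/librdkafka
tar -zxvf /tmp/nginx-1.15.5.tar.gz -C /tmp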

cd /tmp/nginx-1.15.5
./configure --prefix=/usr/local/nginx_kafka --add-module=/tmp/ngx_kafka_module && make && sudo make install
sudo ln -s /usr/local/nginx_kafka/sbin/nginx /usr/local/bin/nginx-kafka

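# "sudo echo ... >> file" performs the redirect as the calling user, so append via tee instead
echo "/usr/local/lib" | sudo tee -a /etc/ld.so.conf > /dev/null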
sudo ldconfig
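  1. Update the package index & install the dependency libraries and tools
  2. Download the librdkafka, ngx_kafka_module, and nginx sources
  3. Build and install librdkafka
  4. Extract the nginx source & build and install it with ngx_kafka_module
  5. For convenience, create an nginx-kafka symlink (so it does not conflict with any other nginx)
  6. If nginx fails to start because it cannot find librdkafka.so.1:
    error while loading shared libraries: librdkafka.so.1: cannot open shared object file: No such file or directory
  7. Register the shared library directory (as root):
    echo "/usr/local/lib" >> /etc/ld.so.conf; ldconfig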
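
Step 7 appends a new line every time the script is re-run. A minimal sketch of an idempotent variant follows; the helper name add_lib_dir is hypothetical and not part of the original script, and it assumes the target file is writable by the caller (run it as root for /etc/ld.so.conf).

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): append a library
# directory to a linker config file only if that exact line is missing.
add_lib_dir() {
  dir="$1"
  conf="$2"
  # -x matches the whole line, -F treats the directory as a literal string
  grep -qxF "$dir" "$conf" 2>/dev/null || echo "$dir" >> "$conf"
}

# Usage, as root:
#   add_lib_dir /usr/local/lib /etc/ld.so.conf && ldconfig
# Then check that the linker cache knows about the library:
#   ldconfig -p | grep librdkafka
```

On some distributions /usr/local/lib may already be listed under /etc/ld.so.conf.d/, in which case running ldconfig alone is enough.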

nginx-kafka.conf

#user  nobody;
worker_processes  1;
#error_log  logs/error.log;
#error_log  logs/error.log  notice;
#error_log  logs/error.log  info;
#pid        logs/nginx.pid;
events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;
    #log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
    #                  '$status $body_bytes_sent "$http_referer" '
    #                  '"$http_user_agent" "$http_x_forwarded_for"';
    #access_log  logs/access.log  main;
    sendfile        on;
    #tcp_nopush     on;
    #keepalive_timeout  0;
    keepalive_timeout  65;
    #gzip  on;
    
    kafka;
    kafka_broker_list kafka-1:9092 kafka-2:9092 kafka-3:9092;  
    
    server {
        listen       80;
        server_name  localhost;
        #charset koi8-r;
        #access_log  logs/host.access.log  main;
        location = /kafka/log {
                kafka_topic log;
        }
        location = /kafka/user {
                kafka_topic user;
        }
        #error_page  404              /404.html;
        # redirect server error pages to the static page /50x.html
        #
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
    }
}
  1. Specify the Kafka cluster with kafka_broker_list, one ip|host:port entry per broker
  2. Each location can map a URL to its own topic
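
Once nginx-kafka is running, the body of a POST request to one of these locations is what gets published to the matching topic. A rough sketch for sending a test event with curl follows; the NGINX_URL default, the helper names, and the JSON fields are illustrative assumptions, not something the module prescribes.

```shell
#!/bin/bash
# Assumption: nginx-kafka listens on localhost:80 as in the config above.
NGINX_URL="${NGINX_URL:-http://localhost}"

# Build a one-line JSON body for a user-behavior event (fields are made up).
build_event() {
  printf '{"user":"%s","action":"%s"}' "$1" "$2"
}

# POST the body to a topic location and print only the HTTP status code.
send_event() {
  topic_path="$1"
  body="$2"
  curl -sS -o /dev/null -w '%{http_code}\n' -X POST --data "$body" "${NGINX_URL}${topic_path}"
}

# Usage (needs a running nginx-kafka and reachable brokers):
#   send_event /kafka/log  "$(build_event u1001 click)"
#   send_event /kafka/user "$(build_event u1001 signup)"
```

A Kafka console consumer subscribed to the log or user topic should then show the posted body.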

Starting nginx

  • Start the ZooKeeper cluster and the Kafka cluster (and create the topics)
    omitted...

  • Test the config file
    nginx-kafka -c nginx-kafka.conf -t

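  • Start nginx-kafka
    nginx-kafka -c nginx-kafka.conf
    If it is already running, reload the config instead (-s reload only signals a running master process):
    nginx-kafka -c nginx-kafka.conf -s reload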

  • enjoy
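
Because -s reload only works against a running instance, a wrapper can pick the right command from the pid file. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the pid and config paths assume the /usr/local/nginx_kafka prefix from the install script with the config copied into its conf directory.

```shell
#!/bin/bash
# Assumed paths, derived from --prefix=/usr/local/nginx_kafka; adjust as needed.
PID_FILE="${PID_FILE:-/usr/local/nginx_kafka/logs/nginx.pid}"
CONF_FILE="${CONF_FILE:-/usr/local/nginx_kafka/conf/nginx-kafka.conf}"

# Print "reload" if the pid file names a live process, otherwise "start".
nginx_action() {
  pid_file="$1"
  if [ -s "$pid_file" ] && kill -0 "$(cat "$pid_file")" 2>/dev/null; then
    echo reload
  else
    echo start
  fi
}

# Test the config first, then start or reload depending on the current state.
start_or_reload() {
  nginx-kafka -c "$CONF_FILE" -t || return 1
  if [ "$(nginx_action "$PID_FILE")" = "reload" ]; then
    nginx-kafka -c "$CONF_FILE" -s reload
  else
    nginx-kafka -c "$CONF_FILE"
  fi
}

# Usage:
#   start_or_reload
```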
