Table of Contents
1. Introduction
2. Environment Setup
2.1 Preparing the Docker environment
2.2 Installing ZooKeeper, Kafka, and kafka-manager
2.2.1 ZooKeeper
2.2.2 Kafka
2.2.3 kafka-manager
2.3 Installing Flume
2.4 Installing Flink
3. Program Development
3.1 Sending application logs to Flume
3.2 Reading the data back from Kafka
Receiving with Flink
Verifying the data
Components used in this experiment: Docker, Kafka, kafka-manager, ZooKeeper, and Flume. Due to resource constraints, Kafka and ZooKeeper are installed inside Docker, while Flume and kafka-manager are installed directly on the test machine.
Experiment content: 1. Generate log data locally and collect it into Flume via log4j; Flume then sinks the data into Kafka. 2. Flume pulls the data back out of Kafka and prints it to the console. (Alternatively, Flink reads the data from Kafka, adds a marker field, and writes it into another Kafka topic; that part is covered later in this article.)
Experiment goals: through this experiment, learn Docker installation and usage, Kafka operations, Flume operations, and deployment work.
所需maven依赖包
org.apache.kafka
kafka_2.11
0.9.0.1
org.apache.flume.flume-ng-clients
flume-ng-log4jappender
1.7.0
org.apache.flume
flume-ng-core
1.7.0
org.apache.flume
flume-ng-configuration
1.7.0
org.apache.flink
flink-java
1.5.0
org.apache.flink
flink-connector-kafka-0.9_2.11
1.5.0
org.apache.flink
flink-streaming-java_2.11
1.5.0
log4j
log4j
1.2.17
org.slf4j
slf4j-api
1.7.5
org.slf4j
slf4j-log4j12
1.7.5
com.alibaba
fastjson
1.2.47
The test machine runs CentOS 7.
Reference article: https://www.cnblogs.com/yufeng218/p/8370670.html
Use docker search zookeeper to list the available ZooKeeper images in the registry.
First pull wurstmeister's zookeeper image:
docker pull wurstmeister/zookeeper
Start ZooKeeper:
docker run -d --name zookeeper -p 2181:2181 -t wurstmeister/zookeeper
Use docker search kafka to list the available Kafka images in the registry.
First pull wurstmeister's kafka image:
docker pull wurstmeister/kafka
Start Kafka:
docker run -d --name kafka -p 9092:9092 -e KAFKA_BROKER_ID=0 -e KAFKA_ZOOKEEPER_CONNECT=192.168.83.112:2181 -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://192.168.83.112:9092 -e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092 -t wurstmeister/kafka
Explanation:
KAFKA_BROKER_ID=0 // the broker id; to start several brokers, run the command multiple times, making sure each gets a different id
KAFKA_ZOOKEEPER_CONNECT=192.168.83.112:2181 // the ZooKeeper address the broker registers with
KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://192.168.83.112:9092 // the address advertised to external clients; this must be the host machine's address
KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092 // the address the broker binds to inside the container
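For convenience, the two docker run commands above can also be written as a single docker-compose file. This is only a sketch, assuming the same host IP 192.168.83.112 and the same wurstmeister images; adjust to your environment:

```yaml
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 0
      KAFKA_ZOOKEEPER_CONNECT: 192.168.83.112:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://192.168.83.112:9092
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092
```

Running docker-compose up -d then brings up both containers with the same port mappings as the individual commands.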
kafka-manager is installed on the host machine rather than in Docker, because the available Docker images are missing components.
Download the source from GitHub and build it yourself (build instructions are easy to find online), or download a pre-built package directly:
Link: https://pan.baidu.com/s/1zmhG6-eP_0RsGDxvcEMzyw
Password: sc8w
Unpack it into its final install location, then configure two items:
kafka-manager.zkhosts="192.168.83.112:2181"
akka {
loggers = ["akka.event.slf4j.Slf4jLogger"]
loglevel = "INFO"
logger-startup-timeout = 30s
}
Then run the startup command:
nohup bin/kafka-manager -Dconfig.file=conf/application.conf -Dhttp.port=9000 &
Download the release from the official site, unpack it, and move it to the final install directory.
Configure the Java path in flume-env.sh:
export JAVA_HOME=/usr/java/jdk1.8.0_171-amd64
Download the release from the official site, unpack it, and move it to the final directory.
Since this experiment uses a single-machine standalone setup, simply run ./start-cluster.sh in the bin directory; the web UI listens on port 8081.
At this point, the environment is fully prepared.
log4j configuration:
log4j.rootLogger=INFO,flume,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %p [%c:%L] - %m%n
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = 192.168.83.112
log4j.appender.flume.Port = 44444
log4j.appender.flume.UnsafeMode = true
log4j.appender.flume.layout=org.apache.log4j.PatternLayout
log4j.appender.flume.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %p [%c:%L] - %m%n
Generate logs in a loop:
import java.util.Date;

import org.apache.log4j.Logger;

public class WriteLog {
    private static Logger logger = Logger.getLogger(WriteLog.class);

    public static void main(String[] args) throws InterruptedException {
        // Log a debug-level message (filtered out by the INFO root logger)
        logger.debug("This is debug message.");
        // Log an info-level message
        logger.info("This is info message.");
        // Log an error-level message
        logger.error("This is error message.");
        int i = 0;
        while (true) {
            logger.info(new Date().getTime());
            logger.info("测试数据" + i);
            Thread.sleep(2000);
            i += 1;
        }
    }
}
In Flume's conf directory, copy the template to example.conf and write the configuration:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
# Use an avro source to receive the data sent over from log4j
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444

# Describe the sink
#a1.sinks.k1.type = logger
# Write the data to Kafka; set the topic and broker address
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = test
a1.sinks.k1.brokerList = 192.168.83.112:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 100

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Run the Flume agent:
flume-ng agent -c /opt/soft/apache-flume-1.8.0-bin/conf -f example.conf --name a1 -Dflume.root.logger=INFO,console
Then start the Java program.
Java program (note: despite its name, CustomerSource extends AbstractSink; it is a custom sink that prints each event taken from the channel to the console):
import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

/**
 * Created by anan on 2018-7-31 14:20.
 */
public class CustomerSource extends AbstractSink implements Configurable {
    @Override
    public Status process() throws EventDeliveryException {
        Status status = null;
        // Start transaction
        Channel ch = getChannel();
        Transaction txn = ch.getTransaction();
        txn.begin();
        try {
            // This try clause includes whatever Channel operations you want to do
            Event event = ch.take();
            // Send the Event to the external repository.
            // storeSomeData(e);
            String eventBody = new String(event.getBody(), "utf-8");
            System.out.println("============= " + eventBody + " ========");
            txn.commit();
            status = Status.READY;
        } catch (Throwable t) {
            txn.rollback();
            // Log exception, handle individual exceptions as needed
            status = Status.BACKOFF;
            // re-throw all Errors
            if (t instanceof Error) {
                throw (Error) t;
            }
        }
        // you must add this line of code in order to close the Transaction.
        txn.close();
        return status;
    }

    @Override
    public void configure(Context context) {
    }

    @Override
    public synchronized void start() {
        super.start();
    }

    @Override
    public synchronized void stop() {
        super.stop();
    }
}
Create a new Flume configuration, test.conf:
# Names of the source, channel, and sink
agent.sources = kafkaSource
agent.channels = memoryChannel
agent.sinks = hdfsSink

agent.sources.kafkaSource.channels = memoryChannel
agent.sinks.hdfsSink.channel = memoryChannel

# -------- kafkaSource configuration --------
agent.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource
agent.sources.kafkaSource.zookeeperConnect = 192.168.83.112:2181
# The Kafka topic to consume
agent.sources.kafkaSource.topic = test
# The consumer group id
agent.sources.kafkaSource.groupId = flume
# Consumer timeout; following this pattern you can set any other Kafka consumer option.
# Note the format: properties starting with kafka.xxx are passed through to the consumer.
agent.sources.kafkaSource.kafka.consumer.timeout.ms = 100

# -------- memoryChannel configuration --------
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 10000
agent.channels.memoryChannel.transactionCapacity = 1000

# -------- sink configuration (the custom console sink; the name hdfsSink is left over from the template) --------
agent.sinks.hdfsSink.type = com.gd.bigdataleran.flume.customerSource.CustomerSource
Run the Flume agent:
flume-ng agent -c /opt/soft/apache-flume-1.8.0-bin/conf -f test.conf --name agent -Dflume.root.logger=INFO,console
Then check whether the generated logs are printed to the console.
To have Flink receive the Kafka data, apply a simple transformation, and write the result back to Kafka, you need FlinkKafkaConsumer09 and FlinkKafkaProducer09. The Java code is as follows:
package com.gd.bigdataleran.flink;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09;

import java.util.Date;
import java.util.Properties;

/**
 * Created by anan on 2018-8-3 15:44.
 */
public class kafkaconsumer {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(5000); // crucial: be sure to enable checkpointing!
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "192.168.83.112:9092");
        // properties.setProperty("zookeeper.connect", "192.168.83.112:2181");
        properties.setProperty("group.id", "test112");

        FlinkKafkaConsumer09<String> myConsumer =
                new FlinkKafkaConsumer09<>("test", new SimpleStringSchema(), properties);
        myConsumer.setStartFromEarliest();

        DataStream<String> stream = env.addSource(myConsumer);
        // Append a marker field (the current timestamp) to each record
        DataStream<String> ds = stream.map(new MapFunction<String, String>() {
            @Override
            public String map(String s) throws Exception {
                return s + "==" + new Date().getTime();
            }
        });

        // Write the transformed records to another topic, test1
        FlinkKafkaProducer09<String> producer =
                new FlinkKafkaProducer09<>("192.168.83.112:9092", "test1", new SimpleStringSchema());
        ds.addSink(producer);

        env.execute();
    }
}
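The map step above does nothing more than append "==" plus a timestamp to each record. That logic can be checked in isolation before submitting the job; the helper class below is a standalone sketch for illustration, not part of the job code:

```java
public class TagCheck {
    // Mirrors the job's MapFunction: append "==" and a millisecond timestamp.
    static String tag(String value, long timestampMillis) {
        return value + "==" + timestampMillis;
    }

    public static void main(String[] args) {
        // A record as produced by WriteLog, tagged with a fixed timestamp
        System.out.println(tag("测试数据0", 1533280000000L)); // prints 测试数据0==1533280000000
    }
}
```

Records arriving in topic test1 should have exactly this shape: the original message, the "==" separator, then the processing timestamp.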
Upload the required jars to Flink's lib directory, then submit the job with the flink CLI, for example (the jar name here is illustrative; use your own job jar):
bin/flink run -c com.gd.bigdataleran.flink.kafkaconsumer <your-job-jar>.jar
Use Java to connect to Kafka and read the data from the topic for verification. The verification code is as follows:
private void getKafkaData() {
    String topic = "test1";
    Properties kafkaProps = new Properties();
    kafkaProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    kafkaProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    kafkaProps.put("bootstrap.servers", "192.168.83.112:9092");
    kafkaProps.put("zookeeper.connect", "192.168.83.112:2181");
    kafkaProps.put("group.id", "farmtest1");
    kafkaProps.put("auto.offset.reset", "smallest");
    kafkaProps.put("enable.auto.commit", "true");

    ConsumerConnector consumer = Consumer.createJavaConsumerConnector(new ConsumerConfig(kafkaProps));
    Map<String, Integer> topicCountMap = new HashMap<>();
    topicCountMap.put(topic, 1); // one stream (thread) for this topic
    Map<String, List<KafkaStream<byte[], byte[]>>> messageStreams = consumer.createMessageStreams(topicCountMap);
    KafkaStream<byte[], byte[]> stream = messageStreams.get(topic).get(0); // the single stream created above
    ConsumerIterator<byte[], byte[]> iterator = stream.iterator();
    while (iterator.hasNext()) {
        String message = new String(iterator.next().message());
        consumer.commitOffsets();
        System.out.println(message);
    }
}
This was written in some haste; questions and discussion are welcome.