Original article: https://cloud.tencent.com/developer/article/1078494
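In real-world Kafka deployments there are many kinds of Kafka consumers (applications, Flume, Spark Streaming, Storm, and so on). This article describes how to use Flume in a Kerberos-enabled environment to collect data from Kafka and write it to HDFS. The data flow is: a Java producer writes to the Kafka topic test4, a Flume Kafka Source consumes the topic, and the events pass through a memory channel to an HDFS Sink that writes them to HDFS.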
Log in to Cloudera Manager, open the Kafka service, enable Kerberos authentication (kerberos.auth.enable), and set security.inter.broker.protocol to SASL_PLAINTEXT.
Save the configuration and restart the Kafka service.
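Because Kerberos authentication is now enabled on the Kafka cluster, clients need a prepared environment to access it: a keytab file, a jaas.conf configuration, and so on. First export a keytab for the user: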
[root@ip-172-31-22-86 kafkatest]# pwd
/home/ec2-user/kafkatest
[root@ip-172-31-22-86 kafkatest]# kadmin.local
Authenticating as principal hdfs/[email protected] with password.
kadmin.local: xst -norandkey -k fayson.keytab [email protected]
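The keytab file for the [email protected] account has now been generated in the current directory. Next, create a jaas.conf file with the following content: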
KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/flume-keytab/fayson.keytab"
principal="[email protected]";
};
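A typo in jaas.conf usually only surfaces as a login failure when the Flume Agent starts. As a minimal sketch (the class JaasCheck is hypothetical, not part of the original setup), the JDK's built-in "JavaLoginConfig" parser can read the file offline, without contacting the KDC, and confirm that the KafkaClient section uses Krb5LoginModule with the expected keytab and principal. The principal fayson@EXAMPLE.COM in the test is a placeholder realm.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.NoSuchAlgorithmException;
import java.security.URIParameter;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Hypothetical helper: parses a JAAS file offline (no KDC contact) to catch mistakes early.
class JaasCheck {

    static final String KRB5_MODULE = "com.sun.security.auth.module.Krb5LoginModule";

    /** Options of the first login module in `section`, or null if the section is absent. */
    static Map<String, ?> sectionOptions(URI jaasFile, String section) {
        Configuration conf;
        try {
            // "JavaLoginConfig" is the JDK's default file-based JAAS configuration type
            conf = Configuration.getInstance("JavaLoginConfig", new URIParameter(jaasFile));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        AppConfigurationEntry[] entries = conf.getAppConfigurationEntry(section);
        if (entries == null || entries.length == 0) {
            return null;
        }
        AppConfigurationEntry first = entries[0];
        if (!KRB5_MODULE.equals(first.getLoginModuleName())
                || first.getControlFlag() != AppConfigurationEntry.LoginModuleControlFlag.REQUIRED) {
            throw new IllegalStateException("expected '" + KRB5_MODULE + " required' in " + section);
        }
        return first.getOptions();
    }

    /** Same check for JAAS text held in memory; it is written to a temporary file first. */
    static Map<String, ?> sectionOptionsFromText(String jaasText, String section) {
        try {
            Path tmp = Files.createTempFile("jaas", ".conf");
            try {
                Files.write(tmp, jaasText.getBytes(StandardCharsets.UTF_8));
                return sectionOptions(tmp.toUri(), section);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Default path matches the -Djava.security.auth.login.config value given to the Flume Agent
        Path jaas = Paths.get(args.length > 0 ? args[0] : "/flume-keytab/jaas.conf");
        Map<String, ?> opts = sectionOptions(jaas.toUri(), "KafkaClient");
        if (opts == null) {
            System.err.println("no KafkaClient section in " + jaas);
            System.exit(1);
        }
        System.out.println("keyTab=" + opts.get("keyTab") + " principal=" + opts.get("principal"));
    }
}
```

Running it as the flume user on the Flume Agent node also confirms that the file is readable by that account; it does not prove that the keytab itself is valid, which only a real Kerberos login can show.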
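Copy the files above (fayson.keytab and jaas.conf) to the /flume-keytab directory on the Flume Agent node, then set their ownership and permissions: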
[ec2-user@ip-172-31-21-45 flume-keytab]$ sudo chown -R flume. /flume-keytab/
[ec2-user@ip-172-31-21-45 flume-keytab]$ sudo chmod -R 755 /flume-keytab/
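In Cloudera Manager, open the Flume service and set the Agent configuration file as follows. The leading kafka in every property name is the agent name, so it must match the Agent Name configured for the Flume Agent.
kafka.channels = c1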
kafka.sources = s1
kafka.sinks = k1
kafka.sources.s1.type =org.apache.flume.source.kafka.KafkaSource
kafka.sources.s1.kafka.bootstrap.servers =ip-172-31-26-80.ap-southeast-1.compute.internal:9092,ip-172-31-21-45.ap-southeast-1.compute.internal:9092,ip-172-31-26-102.ap-southeast-1.compute.internal:9092
kafka.sources.s1.kafka.topics = test4
kafka.sources.s1.kafka.consumer.group.id =flume-consumer
kafka.sources.s1.kafka.consumer.security.protocol= SASL_PLAINTEXT
kafka.sources.s1.kafka.consumer.sasl.mechanism= GSSAPI
kafka.sources.s1.kafka.consumer.sasl.kerberos.service.name= kafka
kafka.sources.s1.channels = c1
kafka.channels.c1.type = memory
kafka.sinks.k1.type = hdfs
kafka.sinks.k1.channel = c1
kafka.sinks.k1.hdfs.kerberosKeytab= /flume-keytab/fayson.keytab
kafka.sinks.k1.hdfs.kerberosPrincipal= [email protected]
kafka.sinks.k1.hdfs.path =/tmp/kafka-test
kafka.sinks.k1.hdfs.filePrefix = events-
kafka.sinks.k1.hdfs.writeFormat = Text
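Flume property keys follow the pattern agent.component-type.component-name.property, and the Flume User Guide states that any Kafka consumer property can be passed to the Kafka Source by prefixing it with kafka.consumer. The sketch below (the class FlumeSubProps is hypothetical and is not Flume's own code) shows which settings the KafkaConsumer of source s1 ends up with once the prefix is stripped; this is how the SASL/GSSAPI settings above reach the consumer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical illustration (not Flume's source): extract the settings the
// Kafka Source hands to its KafkaConsumer from a Flume agent properties file.
class FlumeSubProps {

    /** Parses Flume-style "key = value" lines with java.util.Properties. */
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** Every property under `prefix`, with the prefix stripped, sorted by key. */
    static Map<String, String> subProperties(Properties all, String prefix) {
        Map<String, String> out = new TreeMap<>();
        for (String key : all.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                out.put(key.substring(prefix.length()), all.getProperty(key));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String agentConf = String.join("\n",
                "kafka.sources.s1.kafka.topics = test4",
                "kafka.sources.s1.kafka.consumer.group.id =flume-consumer",
                "kafka.sources.s1.kafka.consumer.security.protocol= SASL_PLAINTEXT",
                "kafka.sources.s1.kafka.consumer.sasl.mechanism= GSSAPI",
                "kafka.sources.s1.kafka.consumer.sasl.kerberos.service.name= kafka");
        // Agent "kafka", source "s1", then the kafka.consumer. pass-through prefix
        Map<String, String> consumer =
                subProperties(load(agentConf), "kafka.sources.s1.kafka.consumer.");
        consumer.forEach((k, v) -> System.out.println(k + "=" + v));
        // prints, one per line: group.id=flume-consumer, sasl.kerberos.service.name=kafka,
        // sasl.mechanism=GSSAPI, security.protocol=SASL_PLAINTEXT
    }
}
```

java.util.Properties drops the whitespace around "=", so "= SASL_PLAINTEXT" and "=SASL_PLAINTEXT" load the same value.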
For more HDFS Sink options, see: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
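In Cloudera Manager, add the following to the Flume Agent's Java configuration options so that the Kafka Source can find the JAAS file:
-Djava.security.auth.login.config=/flume-keytab/jaas.conf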
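Save the changes and restart the Flume Agent.
Next, write a Java producer that sends test data to Kafka. On the client machine, create its jaas.conf in the run-kafka/conf directory: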
KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/home/ec2-user/run-kafka/conf/fayson.keytab"
principal="[email protected]";
};
The producer class ProducerTest:
package com.cloudera;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.io.File;
import java.util.Properties;

/**
 * package: com.cloudera
 * describe: sends test messages to a Kerberos-enabled Kafka cluster
 * create_user: Fayson
 * email: [email protected]
 * create_date: 2017/12/12
 * create_time: 3:35 PM
 * WeChat official account: Hadoop实操
 */
public class ProducerTest {

    public static String TOPIC_NAME = "test4";
    // conf/ under the working directory holds krb5.conf, jaas.conf and the keytab
    public static String confPath = System.getProperty("user.dir") + File.separator + "conf";

    public static void main(String[] args) {
        String krb5conf = confPath + File.separator + "krb5.conf";
        String jaasconf = confPath + File.separator + "jaas.conf";
        // Tell the JVM where the cluster's Kerberos config and the JAAS login config are
        System.setProperty("java.security.krb5.conf", krb5conf);
        System.setProperty("java.security.auth.login.config", jaasconf);
        System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
        // System.setProperty("sun.security.krb5.debug", "true"); // Kerberos debug mode

        Properties props = new Properties();
        // Every broker listens on 9092, the same list the Flume Kafka Source uses
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "ip-172-31-21-45.ap-southeast-1.compute.internal:9092,ip-172-31-26-102.ap-southeast-1.compute.internal:9092,ip-172-31-26-80.ap-southeast-1.compute.internal:9092");
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        // Kerberos (SASL/GSSAPI) authentication over an unencrypted connection
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.kerberos.service.name", "kafka");

        // try-with-resources closes the producer even when an exception is thrown
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                // Parentheses make 22 + i an arithmetic sum; without them Java appends "22" and then i
                String message = i + "\t" + "fayson" + i + "\t" + (22 + i);
                ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC_NAME, message);
                producer.send(record);
                System.out.println(message);
            }
            producer.flush();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
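As an alternative to a separate jaas.conf, Kafka clients from 0.10.2 on (KIP-85) also accept the JAAS entry inline through the sasl.jaas.config client property; whether it is available depends on the Kafka client version shipped with your cluster. A minimal sketch, assuming such a client; the helper InlineJaasProps and the principal fayson@EXAMPLE.COM are hypothetical placeholders:

```java
import java.util.Properties;

// Hypothetical helper: Kerberos client settings with the JAAS entry inlined,
// as an alternative to -Djava.security.auth.login.config plus a jaas.conf file.
class InlineJaasProps {

    static Properties kerberosClientProps(String bootstrapServers, String keytab, String principal) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.mechanism", "GSSAPI"); // Kerberos; also the Kafka client default
        props.put("sasl.kerberos.service.name", "kafka");
        // Same module and options as the KafkaClient section, on one line, ending with ';'
        props.put("sasl.jaas.config",
                "com.sun.security.auth.module.Krb5LoginModule required"
                        + " useKeyTab=true"
                        + " keyTab=\"" + keytab + "\""
                        + " principal=\"" + principal + "\";");
        return props;
    }

    public static void main(String[] args) {
        Properties p = kerberosClientProps(
                "ip-172-31-21-45.ap-southeast-1.compute.internal:9092",
                "/home/ec2-user/run-kafka/conf/fayson.keytab",
                "fayson@EXAMPLE.COM"); // hypothetical principal and realm
        System.out.println(p.getProperty("sasl.jaas.config"));
    }
}
```

These entries could be merged into the props of ProducerTest. The java.security.krb5.conf system property is still needed when krb5.conf is not in its default location, because sasl.jaas.config replaces only the JAAS file.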
Export the project's dependency jars with Maven:
mvn dependency:copy-dependencies -DoutputDirectory=/Users/fayson/Desktop/lib
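Put the exported jars in the run-kafka/lib directory, together with the jar packaged from this project, which contains com.cloudera.ProducerTest.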
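Create the launch script run.sh in the run-kafka directory:
#!/bin/bash
# Run from the run-kafka directory: lib/ holds the jars, and ProducerTest reads conf/ under the working directory
JAVA_HOME=/usr/java/jdk1.8.0_131-cloudera
# Append every jar under lib/ to the classpath
for file in lib/*.jar
do
    CLASSPATH=$CLASSPATH:$file
done
export CLASSPATH
${JAVA_HOME}/bin/java com.cloudera.ProducerTest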
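The loop in run.sh simply joins every jar file under lib/ into one classpath. For readers more at home in Java, the same logic can be sketched as follows (ClasspathBuilder is a hypothetical helper, not something the setup requires):

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper mirroring run.sh's loop: every *.jar in a directory joined into one classpath.
class ClasspathBuilder {

    static String fromJarDir(Path dir) {
        List<String> jars = new ArrayList<>();
        // The glob filter behaves like the shell pattern lib/*.jar
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Sorted by name; bash also sorts glob results (by locale collation)
        Collections.sort(jars);
        return String.join(File.pathSeparator, jars);
    }

    public static void main(String[] args) {
        System.out.println(fromJarDir(Paths.get(args.length > 0 ? args[0] : "lib")));
    }
}
```

Since Java 6 the launcher also accepts a classpath wildcard, so `java -cp "lib/*" com.cloudera.ProducerTest` is an equivalent one-liner; the order of jars expanded from the wildcard is unspecified.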
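The run-kafka/conf directory contains:
jaas.conf: the JAAS configuration Java uses to log in to the Kerberos environment
krb5.conf: the cluster's krb5 configuration file
fayson.keytab: the keytab referenced by jaas.conf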
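ProducerTest builds its paths from System.getProperty("user.dir"), so run.sh only works when started from run-kafka with conf/ in place. A small pre-flight check can catch a missing file before the Kerberos login fails with a less obvious error. This is a sketch; the class ConfPreflight is hypothetical, and the file names (jaas.conf, krb5.conf, fayson.keytab) are the ones used in this setup.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check for the conf/ directory ProducerTest reads.
class ConfPreflight {

    // File names from this setup; the keytab name follows the keyTab entry in jaas.conf
    static final List<String> REQUIRED = Arrays.asList("jaas.conf", "krb5.conf", "fayson.keytab");

    /** Names from REQUIRED that are missing or unreadable in confDir, in REQUIRED order. */
    static List<String> missing(Path confDir) {
        List<String> out = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isReadable(confDir.resolve(name))) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same location ProducerTest uses: user.dir + "/conf"
        Path conf = Paths.get(System.getProperty("user.dir"), "conf");
        List<String> absent = missing(conf);
        if (!absent.isEmpty()) {
            System.err.println("missing or unreadable in " + conf + ": " + absent);
            System.exit(1);
        }
        System.out.println("conf OK: " + conf);
    }
}
```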
Run the script from the run-kafka directory to send the test data to Kafka:
[ec2-user@ip-172-31-22-86 run-kafka]$ sh run.sh
Checking the HDFS path configured in the sink (/tmp/kafka-test) shows that the data has been written.