First, a rant about Alibaba Cloud: the amount of pain for one simple demo was absurd. The MQ-based Kafka itself had problems, and the 3.30 upgrade shipped with no documentation on any of this either. Back to the topic:
This post describes how Spark Streaming on Alibaba Cloud EMR connects to the Alibaba Cloud Kafka message queue.
1. Apply for a topic and a consumer group ID in the new Kafka message queue console
(1) For testing, a topic on the public (Internet) endpoint is recommended
(2) Two group IDs must be registered on Alibaba Cloud:
one for the executors: spark-executor-CID-xxxx
one for the driver: CID-xxxx
The consumer in your code uses: CID-xxxx
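The executor ID is not arbitrary: the spark-streaming-kafka-0-10 integration derives the executor-side consumer group by prefixing the configured group.id with spark-executor-, which is why both IDs have to exist on the Alibaba Cloud side. A minimal sketch of the relationship (CID-xxxx is a placeholder):

val driverGroupId = "CID-xxxx"                          // goes into kafkaParams as group.id
val executorGroupId = s"spark-executor-$driverGroupId"  // derived by Spark on the executors; register it too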
2. Configure the root certificate and the JAAS config, e.g. under /kafka/my on the server
kafka_client_jaas.conf has the following format:
KafkaClient {
  com.aliyun.openservices.ons.sasl.client.OnsLoginModule required
  AccessKey="xxxx"
  SecretKey="xxxx";
};
3. Construct the kafkaParams map
(1) bootstrap.servers: prefix the original access point with SASL_SSL://
(2) group.id: the driver's consumer ID, without the spark-executor- prefix (Spark adds that itself on the executors)
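For example, a sketch using the public endpoint from the code in section 5 (substitute your own access point and consumer ID):

val endpoint = "kafka-cn-internet.aliyun.com:8080"
val bootstrap = s"SASL_SSL://$endpoint"  // value for bootstrap.servers
val groupId = "CID-xxxx"                 // value for group.id, no spark-executor- prefix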
4. Running
4.1 Local mode: run it directly
4.2 Cluster mode
(1) Every node needs the jks and conf files under the same path
When copying files between Alibaba Cloud EMR nodes, switch to the hadoop account first
Loosening permissions to 777 is the convenient option; see the copy sketch after the spark-submit example below
(2) Submission parameters: set them via --conf, or configure them in spark-defaults.conf
--conf spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/kafka/my/kafka_client_jaas.conf
--conf spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/kafka/my/kafka_client_jaas.conf
(echo $SPARK_CONF_DIR prints the Spark conf directory)
A complete spark-submit invocation looks like this:
spark-submit --class com.sd.App --master yarn --deploy-mode client --driver-memory 2g --num-executors 2 --executor-memory 1g --executor-cores 2 --conf spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/kafka/my/kafka_client_jaas.conf --conf spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/kafka/my/kafka_client_jaas.conf ossref://gm-big-data/onlytest/com.sd.re_test-1.0-shaded.jar
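For sub-step (1) above, distributing the certificate and JAAS file can look like the following sketch (run as the hadoop user on the master node; the emr-worker-* hostnames are assumptions, substitute your own node names):

for host in emr-worker-1 emr-worker-2; do
  ssh $host "mkdir -p /kafka/my"
  scp /kafka/my/kafka.client.truststore.jks /kafka/my/kafka_client_jaas.conf $host:/kafka/my/
  ssh $host "chmod 777 /kafka/my/*"
done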
5. The core code is as follows (imports added for completeness):
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

private def constructKafkaParams: Map[String, Object] = {
  val jks = "/kafka/my/kafka.client.truststore.jks"
  val jaasConf = "/kafka/my/kafka_client_jaas.conf"
  val conf = System.getProperty("java.security.auth.login.config")
  CommonFun.devinPrintln("jaas_conf init", conf)
  // In local mode the property is not set via spark-submit, so fall back to the local file:
  if (null == conf) {
    System.setProperty("java.security.auth.login.config", jaasConf)
  }
  val kServer = "SASL_SSL://kafka-cn-internet.aliyun.com:8080"
  val groupID = "CID-real-log-test"
  Map[String, Object](
    "bootstrap.servers" -> kServer,
    "ssl.truststore.location" -> jks,
    "ssl.truststore.password" -> "KafkaOnsClient",
    "security.protocol" -> "SASL_SSL",
    "sasl.mechanism" -> "ONS",
    "auto.commit.interval.ms" -> "1000",
    "session.timeout.ms" -> "30000",
    "enable.auto.commit" -> (false: java.lang.Boolean),
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> groupID,
    // "client.id" -> groupID,
    "auto.offset.reset" -> "latest"
  )
}
def testKafka(): Unit = {
  val sparkConf = new SparkConf().setAppName("testkafka").setMaster("local")
  val ssc = new StreamingContext(sparkConf, Seconds(3))
  val topics = Set("alikafka-real-test")
  val kafkaParams = constructKafkaParams
  val stream = KafkaUtils.createDirectStream[String, String](
    ssc,
    PreferConsistent,
    Subscribe[String, String](topics, kafkaParams)
  )
  val trans = stream.transform { rdd =>
    val logCount = rdd.count()
    CommonFun.devinPrintln(s" have got log count ${logCount}")
    rdd.map { x => x.toString }
  }
  trans.foreachRDD { rdd =>
    rdd.foreach(println)
  }
  ssc.start()
  ssc.awaitTermination()
  ssc.stop()
}
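Note that enable.auto.commit is false but the demo never commits offsets, so every restart resumes from auto.offset.reset. If you want the group's offsets stored back in Kafka, the kafka-0-10 integration provides commitAsync; a minimal sketch, replacing the foreachRDD above (offsets must be read from the untransformed stream's RDDs):

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.map(_.value).foreach(println)                                // process the batch first...
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)  // ...then commit
}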
6. The pom file for reference:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_${scala.compat.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_${scala.compat.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka-0-10_${scala.compat.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
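The ${scala.compat.version} and ${spark.version} properties are not defined in the snippet above; the values below are assumptions, pick whatever matches your EMR cluster:

<properties>
  <scala.compat.version>2.11</scala.compat.version> <!-- assumption: Scala 2.11 build -->
  <spark.version>2.3.0</spark.version>              <!-- assumption: match the cluster's Spark -->
</properties>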
7. For the complete demo, see my uploaded resource:
https://download.csdn.net/download/shuaidan19920412/10326950