Environment: Scala 2.11.8; Kafka 0.10.0.1; Spark 2.0.0
For the Scala 2.10.5 / Spark 1.6.0 / Kafka 0.10.0.1 version, see this post: Flume+Kafka+Spark Streaming Integration
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.{Seconds, StreamingContext}
/**
  * Print the messages consumed from Kafka.
  * Created by drguo on 2019/11/8 11:16.
  */
object Test {

  private val brokers = "hqc-test-hdp1:6667,hqc-test-hdp2:6667,hqc-test-hdp3:6667"

  def main(args: Array[String]): Unit = {
    // Either way works:
    // val sparkSession = getSparkSession()
    // val sc = sparkSession.sparkContext
    val conf = new SparkConf()
      // Local mode; "*" uses one worker thread per CPU core
      .setMaster("local[*]")
      .setAppName("DirectKafkaTest")
    val ssc = new StreamingContext(conf, Seconds(6))

    val topics = Array("test")
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> brokers,
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "hqc",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topics, kafkaParams)
    )

    val lines = messages.map(_.value)
    lines.print()

    ssc.start()
    ssc.awaitTermination()
  }

  def getSparkSession(): SparkSession = {
    val sparkSession = SparkSession
      .builder()
      .appName("test")
      .master("local[*]")
      .getOrCreate()
    sparkSession
  }
}
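Note that with enable.auto.commit set to false, the code above never commits offsets back to Kafka, so a restarted job simply falls back to auto.offset.reset. A minimal sketch of manual offset commits with the 0-10 integration's CanCommitOffsets API, replacing the lines.print() step (the println output here is only for demonstration):

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

messages.foreachRDD { rdd =>
  // Read the Kafka offset ranges before transforming the RDD
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.map(_.value).collect().foreach(println)
  // Commit the processed offsets back to Kafka asynchronously
  messages.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}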
When run locally, the job hung at:
19/11/08 11:31:30 INFO AbstractCoordinator: Discovered coordinator hqc-test-hdp2:6667 (id: 2147482646 rack: null) for group spark-executor-hqc.
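The spark-executor-hqc group name in that log line is expected: the 0-10 integration derives the executor-side consumer group by prefixing the configured group.id (hqc) with spark-executor-.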
Packaging with mvn install and submitting to the cluster, however, ran without problems. Oddly enough, a subsequent local test then also worked; the hang was probably caused by dependency jars that had not been fully downloaded. Adding the plugin below to pom.xml makes install download and bundle all dependencies:
<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.3</version>
            <configuration>
                <classifier>dist</classifier>
                <appendAssemblyId>true</appendAssemblyId>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
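With this in place, mvn clean install produces both spark-pca-1.0-SNAPSHOT.jar and the fat spark-pca-1.0-SNAPSHOT-jar-with-dependencies.jar under target/, as seen in the directory listing below; the fat jar is the one to submit.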
Submitting the program
[hdfs@hqc-test-hdp3 ~]$ ls
mysql_kafka_hive.conf spark-pca-1.0-SNAPSHOT.jar spark-pca-1.0-SNAPSHOT-jar-with-dependencies.jar spark-pca.jar
[hdfs@hqc-test-hdp3 ~]$ spark-submit --class OnlineDataPCA --master yarn --deploy-mode cluster --driver-memory 1g --executor-memory 500m --executor-cores 1 --queue default spark-pca-1.0-SNAPSHOT-jar-with-dependencies.jar
[hdfs@hqc-test-hdp3 ~]$ spark-submit --class OnlineDataPCA --master yarn --deploy-mode cluster --driver-memory 1g --executor-memory 500m --executor-cores 1 --queue default spark-pca.jar
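If the thin spark-pca.jar is submitted instead (second command above), keep in mind that spark-streaming-kafka-0-10 is not part of the Spark distribution, so its classes have to be supplied some other way, e.g. via --jars or --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.0.0.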
Stopping a YARN-based Spark Streaming application
[root@hqc-test-hdp3 ~]# yarn application --list
19/11/08 09:47:03 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/11/08 09:47:03 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1:8050
19/11/08 09:47:04 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2:10200
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1564035532438_0044 OnlineDataPCA SPARK hdfs default RUNNING UNDEFINED 10% http://xx:28929
[root@hqc-test-hdp3 ~]# yarn application -kill application_1564035532438_0044
19/11/08 09:47:21 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/11/08 09:47:21 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1:8050
19/11/08 09:47:21 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2:10200
Killing application application_1564035532438_0044
19/11/08 09:47:22 INFO impl.YarnClientImpl: Killed application application_1564035532438_0044
[root@hqc-test-hdp3 ~]# yarn application --list
19/11/08 09:47:29 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/11/08 09:47:30 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1:8050
19/11/08 09:47:30 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2:10200
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):0
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
[root@hqc-test-hdp3 ~]#
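yarn application -kill terminates the driver abruptly, potentially in the middle of a batch. If the in-flight batch should be allowed to finish first, one option is Spark's graceful-shutdown flag; a minimal sketch (the flag must be set before the StreamingContext is created, and a hard kill may still pre-empt it):

val conf = new SparkConf()
  .setAppName("DirectKafkaTest")
  // Stop the StreamingContext gracefully on JVM shutdown (e.g. SIGTERM),
  // letting the batch currently being processed run to completion
  .set("spark.streaming.stopGracefullyOnShutdown", "true")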
The complete pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.xx</groupId>
    <artifactId>spark-pca</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <!-- <scala.version>2.11.8</scala.version>-->
        <spark.version>2.0.0</spark.version>
    </properties>

    <!-- <repositories>
        <repository>
            <id>nexus-aliyun</id>
            <name>Nexus aliyun</name>
            <url>http://maven.aliyun.com/nexus/content/groups/public</url>
        </repository>
    </repositories>-->

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.11</artifactId>
            <version>0.10.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.3</version>
                <configuration>
                    <classifier>dist</classifier>
                    <appendAssemblyId>true</appendAssemblyId>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>