A Small Example of Spark Streaming 2.0 Reading from Kafka 0.10

Environment versions: Scala 2.11.8; Kafka 0.10.0.1; Spark 2.0.0

If you need a version for Scala 2.10.5, Spark 1.6.0, and Kafka 0.10.0.1, see this post instead: Flume + Kafka + Spark Streaming Integration.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Print the messages consumed from Kafka
 * Created by drguo on 2019/11/8 11:16.
 */
object Test {

  private val brokers = "hqc-test-hdp1:6667,hqc-test-hdp2:6667,hqc-test-hdp3:6667"

  def main(args: Array[String]): Unit = {
    // Either approach works: build a SparkConf directly, or reuse the SparkContext from getSparkSession() below
//    val sparkSession = getSparkSession()
//    val sc = sparkSession.sparkContext
    
    val sc = new SparkConf()
      // Local mode; "*" runs as many worker threads as there are CPU cores
      .setMaster("local[*]")
      .setAppName("DirectKafkaTest")
    val ssc = new StreamingContext(sc, Seconds(6))
    val topics = Array("test")
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> brokers,
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "hqc",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )
    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topics, kafkaParams)
    )
    val lines = messages.map(_.value)
    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }

  def getSparkSession(): SparkSession = {
    val sparkSession = SparkSession
      .builder()
      .appName("test")
      .master("local[*]")
      .getOrCreate()
    sparkSession
  }
}
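Because enable.auto.commit is set to false, the example above never commits offsets back to Kafka, so the consumer group's progress is not persisted anywhere. If you want to track progress via Kafka itself, the 0-10 integration exposes each batch's offset ranges. Below is a minimal, hypothetical sketch (it would replace the lines.print() call inside main, reusing the messages stream from the example) that processes a batch and then commits its offsets asynchronously:

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

    messages.foreachRDD { rdd =>
      // the RDD returned by createDirectStream carries its Kafka offset ranges
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // stand-in for real processing: print each record value (goes to the executor logs)
      rdd.foreach(record => println(record.value))
      // commit the offsets for this batch back to Kafka for group "hqc"
      messages.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }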

When run locally, the job got stuck at
19/11/08 11:31:30 INFO AbstractCoordinator: Discovered coordinator hqc-test-hdp2:6667 (id: 2147482646 rack: null) for group spark-executor-hqc.
I then packaged it with maven install, submitted it to the cluster, and it ran without problems.
Oddly enough, when I tested locally again afterwards, the problem was gone as well. The likely cause was that not all dependency jars had been downloaded; after adding the plugin below to pom.xml, install downloaded and packaged all of the dependencies.

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.3</version>
                <configuration>
                    <classifier>dist</classifier>
                    <appendAssemblyId>true</appendAssemblyId>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
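With this plugin in place, mvn install (or mvn package) should produce both the plain artifact and a *-jar-with-dependencies.jar fat jar under target/, which is what the two jars in the listing below correspond to.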

Submitting the program

[hdfs@hqc-test-hdp3 ~]$ ls
mysql_kafka_hive.conf  spark-pca-1.0-SNAPSHOT.jar  spark-pca-1.0-SNAPSHOT-jar-with-dependencies.jar  spark-pca.jar
[hdfs@hqc-test-hdp3 ~]$ spark-submit --class OnlineDataPCA --master yarn --deploy-mode cluster --driver-memory 1g --executor-memory 500m --executor-cores 1 --queue default spark-pca-1.0-SNAPSHOT-jar-with-dependencies.jar
[hdfs@hqc-test-hdp3 ~]$ spark-submit --class OnlineDataPCA --master yarn --deploy-mode cluster --driver-memory 1g --executor-memory 500m --executor-cores 1 --queue default spark-pca.jar
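Two variants are shown above: the first submits the jar-with-dependencies fat jar produced by the assembly plugin, so all non-Spark dependencies travel with the application; the second submits spark-pca.jar, which must already contain, or find on the cluster's classpath, everything the application needs.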

Stopping a Spark Streaming program running on YARN

[root@hqc-test-hdp3 ~]# yarn application --list
19/11/08 09:47:03 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/11/08 09:47:03 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1:8050
19/11/08 09:47:04 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2:10200
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
                Application-Id	    Application-Name	    Application-Type	      User	     Queue	             State	       Final-State	       Progress	                       Tracking-URL
application_1564035532438_0044	       OnlineDataPCA	               SPARK	      hdfs	   default	           RUNNING	         UNDEFINED	            10%	            http://xx:28929
[root@hqc-test-hdp3 ~]# yarn application -kill application_1564035532438_0044
19/11/08 09:47:21 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/11/08 09:47:21 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1:8050
19/11/08 09:47:21 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2:10200
Killing application application_1564035532438_0044
19/11/08 09:47:22 INFO impl.YarnClientImpl: Killed application application_1564035532438_0044
[root@hqc-test-hdp3 ~]# yarn application --list
19/11/08 09:47:29 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
19/11/08 09:47:30 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1:8050
19/11/08 09:47:30 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2:10200
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):0
                Application-Id	    Application-Name	    Application-Type	      User	     Queue	             State	       Final-State	       Progress	                       Tracking-URL
[root@hqc-test-hdp3 ~]# 
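yarn application -kill stops the job immediately, possibly in the middle of a batch. If you would rather have the driver try to finish in-flight batches when its JVM is asked to shut down, Spark Streaming has a spark.streaming.stopGracefullyOnShutdown setting; the sketch below simply adds it to the SparkConf from the example above (whether the shutdown hook actually gets to run still depends on how YARN terminates the containers):

    val sc = new SparkConf()
      .setMaster("local[*]")
      .setAppName("DirectKafkaTest")
      // register a shutdown hook that stops the StreamingContext gracefully,
      // letting already-received batches finish before the process exits
      .set("spark.streaming.stopGracefullyOnShutdown", "true")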

The complete pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.xx</groupId>
    <artifactId>spark-pca</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
<!--        <scala.version>2.11.8</scala.version>-->
        <spark.version>2.0.0</spark.version>
    </properties>

<!--    <repositories>
        <repository>
            <id>nexus-aliyun</id>
            <name>Nexus aliyun</name>
            <url>http://maven.aliyun.com/nexus/content/groups/public</url>
        </repository>
    </repositories>-->

    <dependencies>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.11</artifactId>
            <version>0.10.0.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.3</version>
                <configuration>
                    <classifier>dist</classifier>
                    <appendAssemblyId>true</appendAssemblyId>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>
