Running the Spark Streaming WordCount Program

I. Overview:

   First, install Spark and start the Spark service. Then start the nc service; the nc side will sit waiting for input. Next, start the program; once started it runs continuously, with the batch interval configurable in the code. Finally, type data on the nc side, and the results are displayed on the program side.

II. Detailed steps:

1. Start Spark:

          [root@hadoop11 sbin]# ./start-all.sh 

2. Check the status of the master and workers:

          Visit: http://hadoop11:8080/

[Screenshot 1: the Spark master web UI]

3. Start the nc service:

         [root@hadoop11 ~]# nc -lk 9999

Note: 9999 is the port number set in the program.
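The port comes straight from this call in the program code (the full listing appears at the end of this post):

val lines = ssc.socketTextStream("localhost", 9999)  // must match the port nc listens on (`nc -lk 9999`)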

4. Start the program:

        [root@hadoop11 bin]# ./run-example streaming.NetworkWordCount localhost 9999

The program's startup log follows. Each time a timestamp appears, the program has executed one batch:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/11/07 10:37:09 INFO StreamingExamples: Setting log level to [WARN] for streaming example. To override add a custom log4j.properties to the classpath.
18/11/07 10:37:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-------------------------------------------                                     
Time: 1541558250000 ms  (one batch has executed; these timestamps keep appearing at the configured interval)

-------------------------------------------

-------------------------------------------
Time: 1541558251000 ms
-------------------------------------------

 

5. Enter data on the nc side:

Note: after the command is run, nc stays waiting for input, so just type the data directly on the line. The program splits words on spaces (see the sketch after the transcript below).

[root@hadoop11 ~]# nc -lk 9999

yang xiao hai big data

yang xiao hai big data

yang xiao hai big data
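To make the split-and-count semantics concrete, here is a minimal plain-Scala sketch (no Spark involved; names are illustrative) of what the program does with the three lines above:

// Illustrative sketch only: the same split-on-space counting, outside Spark
val batch = Seq(
  "yang xiao hai big data",
  "yang xiao hai big data",
  "yang xiao hai big data"
)
val counts = batch
  .flatMap(_.split(" "))                        // same splitter as the program
  .groupBy(identity)
  .map { case (word, ws) => (word, ws.length) }
// counts: Map(big -> 3, data -> 3, hai -> 3, xiao -> 3, yang -> 3)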

6. Output on the program side:

[root@hadoop11 bin]# ./run-example streaming.NetworkWordCount localhost 9999
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/11/07 10:37:09 INFO StreamingExamples: Setting log level to [WARN] for streaming example. To override add a custom log4j.properties to the classpath.
18/11/07 10:37:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-------------------------------------------                                     
Time: 1541558250000 ms  (one batch has executed; these timestamps keep appearing at the configured interval)

-------------------------------------------

....

-------------------------------------------
Time: 1541559650000 ms
-------------------------------------------

(the results: each word appeared three times in the input)

(big,3)
(data,3)
(hai,3)
(xiao,3)
(yang,3)

-------------------------------------------
Time: 1541559651000 ms
-------------------------------------------

7. The program ran successfully; this can also be verified on the Spark web UI.

8. Note: if you run the program several times, you may hit a problem:

You type data into nc, but no results appear on the program side; instead the program keeps logging: Initial job has not accepted any resources.

The cause: the previous run is still alive. On a standalone cluster an application claims all available cores by default, so the newly submitted job gets no executors, stays in the WAITING state, and therefore never receives any data.

[root@hadoop11 bin]# ./run-example streaming.NetworkWordCount localhost 9999
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/11/07 11:09:26 INFO StreamingExamples: Setting log level to [WARN] for streaming example. To override add a custom log4j.properties to the classpath.
18/11/07 11:09:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/11/07 11:09:28 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
[Stage 0:>                                                         (0 + 0) / 50]18/11/07 11:09:48 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

(the warning keeps repeating)
18/11/07 11:10:03 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
18/11/07 11:10:18 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

The Spark web UI confirms it: the previous application is still RUNNING and the new one is stuck in WAITING. (The earlier "SparkUI could not bind on port 4040" warning is another clue that a previous application is still up.)

[Screenshot 2: Spark web UI showing the old application RUNNING and the new one WAITING]

Solution: kill the program that is still running, then resubmit the job:

[root@hadoop11 bin]# jps
2433 QuorumPeerMain
13604 Jps
13512 SparkSubmit
8217 Master
13417 CoarseGrainedExecutorBackend
8299 Worker
2476 Kafka
13356 SparkSubmit

[root@hadoop11 bin]# kill -9 13512 13356 

Rerun the program:

[root@hadoop11 bin]# ./run-example streaming.NetworkWordCount localhost 9999
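Beyond killing the stale run, two standard Spark settings can prevent the situation in the first place. A minimal sketch, drop-in for the conf line in the program below (adding these is my suggestion, not part of the original code):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("NetWorkWorkCount")
  // Cap the cores this application may claim on a standalone cluster,
  // so a second submission is not starved of resources.
  .set("spark.cores.max", "2")
  // On JVM shutdown (e.g. Ctrl+C), stop the streaming context gracefully
  // instead of leaving a half-dead application holding its cores.
  .set("spark.streaming.stopGracefullyOnShutdown", "true")

With spark.cores.max set, two applications can run side by side; with graceful shutdown enabled, stopping the old run cleanly releases its cores.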

Program code:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Created by YangXiaohai
  */
object NetWorkWorkCount {

  def main(args: Array[String]): Unit = {
    // local[2]: one thread runs the socket receiver, the other processes batches
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetWorkWorkCount")
    // Batch interval: a new micro-batch is formed every 2 seconds
    val ssc = new StreamingContext(conf, Seconds(2))
    // Connect to the server started with `nc -lk 9999`
    val lines = ssc.socketTextStream("localhost", 9999)
    // Split each line on spaces and count the words within the batch
    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))
    val wordCount = pairs.reduceByKey(_ + _)
    wordCount.print()
    ssc.start()            // start receiving and processing
    ssc.awaitTermination() // block until the job is stopped
  }

}
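One detail worth noting: step 4 runs the shipped example streaming.NetworkWordCount, which reads the hostname and port from the command line, while the code above hard-codes localhost and 9999. A minimal sketch of the same parameterization (this variant is illustrative, not the author's original):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative variant: take host and port from the command line,
// mirroring how streaming.NetworkWordCount is invoked in step 4.
object NetWorkWorkCountArgs {

  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: NetWorkWorkCountArgs <hostname> <port>")
      sys.exit(1)
    }
    val host = args(0)
    val port = args(1).toInt

    val conf = new SparkConf().setMaster("local[2]").setAppName("NetWorkWorkCountArgs")
    val ssc = new StreamingContext(conf, Seconds(2))

    ssc.socketTextStream(host, port)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }

}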

 
