Part 1: Testing Spark Streaming integration with Flume locally
(1) On the cluster master, add the configuration file flume-push-streaming.conf under apache-flume/conf:
# Name the components on this agent
simple-agent.sources = netcat-source
simple-agent.sinks = avro-sink
simple-agent.channels = memory-channel

simple-agent.sources.netcat-source.type = netcat
# Address the netcat source listens on; send it data with: telnet localhost 44444
simple-agent.sources.netcat-source.bind = localhost
simple-agent.sources.netcat-source.port = 44444

simple-agent.sinks.avro-sink.type = avro
# Hostname of the local IDEA client machine (where the Spark receiver runs);
# in the IDEA code, bind the receiver to 0.0.0.0
simple-agent.sinks.avro-sink.hostname = 192.168.1.125
simple-agent.sinks.avro-sink.port = 41414

simple-agent.channels.memory-channel.type = memory
simple-agent.sources.netcat-source.channels = memory-channel
simple-agent.sinks.avro-sink.channel = memory-channel
(2) Add the dependency to pom.xml:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-flume_2.11</artifactId>
    <version>2.0.2</version>
</dependency>
(3) The local IDEA program, FlumePushWordCount.scala:
package com.streaming.flume

import org.apache.spark.SparkConf
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Spark Streaming integration with Flume, approach one (push-based)
 */
object FlumePushWordCount {

  def main(args: Array[String]): Unit = {

    // To use command-line arguments instead, configure them in IDEA's run configuration
    // if (args.length != 2) {
    //   System.err.println("Usage: FlumePushWordCount <hostname> <port>")
    //   System.exit(1)
    // }
    //
    // val Array(hostname, port) = args

    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("FlumePushWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Create a push-based Flume stream; bind to 0.0.0.0 so the remote avro sink can connect
    val flumeStream = FlumeUtils.createStream(ssc, "0.0.0.0", 41414)

    flumeStream.map(x => new String(x.event.getBody.array()).trim)
      .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
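The flatMap/map/reduceByKey chain above is a standard word count. As a quick sanity check of the counting logic outside Spark, an equivalent pipeline with plain shell tools over a sample line:

```shell
# Count word occurrences in one sample line, mirroring
# flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
echo "hello world hello" | tr ' ' '\n' | sort | uniq -c | sort -rn
# prints (whitespace-padded): "2 hello" then "1 world"
```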
(4) Test locally: start the IDEA program first. In the push model the Flume avro sink connects to the Spark receiver, so the receiver must already be running before the agent starts.
(5) Start Flume:
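Before starting the agent, it can help to confirm that something is actually listening on the receiver port. A minimal probe sketch, assuming bash and the port 41414 from the config above:

```shell
# Probe the receiver port with bash's /dev/tcp pseudo-device;
# succeeds only if a listener (the Spark receiver) is up
(exec 3<>/dev/tcp/localhost/41414) 2>/dev/null \
  && echo "receiver is up" \
  || echo "receiver not up yet"
```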
bin/flume-ng agent --conf conf --conf-file ./conf/flume-push-streaming.conf --name simple-agent -Dflume.root.logger=INFO,console
(6) On the cluster master, open a telnet session, type some lines, and watch the counts appear in the local IDEA console:
telnet localhost 44444
Part 2: Testing Spark Streaming integration with Flume on the cluster
(1) Package the local program. On Windows, cd into the project directory and run:
mvn clean package -DskipTests
package com.streaming.flume

import org.apache.spark.SparkConf
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Spark Streaming integration with Flume, approach one (push-based) -- cluster version
 */
object FlumePushWordCount {

  def main(args: Array[String]): Unit = {

    if (args.length != 2) {
      System.err.println("Usage: FlumePushWordCount <hostname> <port>")
      System.exit(1)
    }

    val Array(hostname, port) = args

    // Master and app name are supplied by spark-submit, not hard-coded
    val sparkConf = new SparkConf() //.setMaster("local[2]").setAppName("FlumePushWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    val flumeStream = FlumeUtils.createStream(ssc, hostname, port.toInt)

    flumeStream.map(x => new String(x.event.getBody.array()).trim)
      .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
(2) Submit the jar to the cluster. Because the packaged jar contains only the application code, Spark reports the spark-streaming-flume classes as missing at runtime, so the dependency must be supplied in one of two ways.
Option 1: pull the dependency from a Maven repository with --packages:
spark-submit \
--class com.streaming.flume.FlumePushWordCount \
--master local[2] \
--packages org.apache.spark:spark-streaming-flume_2.11:2.0.2 \
/home/wl/miooc/streaming/flume/SparkStreaingTest-1.0.jar \
localhost 41414
Option 2: download the spark-streaming-flume assembly jar beforehand and pass it with --jars:
spark-submit \
--class com.streaming.flume.FlumePushWordCount \
--master local[2] \
--jars /usr/local/src/spark-hadoop2.0.2/jars/spark-streaming-flume-assembly_2.11-2.2.0.jar \
/home/wl/miooc/streaming/flume/SparkStreaingTest-1.0.jar \
localhost 41414
(3) Start Flume on master. The config is the same as before, except the avro sink's hostname is now localhost, since the Spark job runs on the same machine:
bin/flume-ng agent --conf conf --conf-file ./conf/flume-push-streaming.conf --name simple-agent -Dflume.root.logger=INFO,console
# Name the components on this agent
simple-agent.sources = netcat-source
simple-agent.sinks = avro-sink
simple-agent.channels = memory-channel
simple-agent.sources.netcat-source.type = netcat
simple-agent.sources.netcat-source.bind = localhost
simple-agent.sources.netcat-source.port = 44444
simple-agent.sinks.avro-sink.type = avro
simple-agent.sinks.avro-sink.hostname = localhost
simple-agent.sinks.avro-sink.port = 41414
simple-agent.channels.memory-channel.type = memory
simple-agent.sources.netcat-source.channels = memory-channel
simple-agent.sinks.avro-sink.channel = memory-channel
(4) On master, open telnet and send some data:
telnet localhost 44444