Spark Streaming Integration with Flume (Part 1): The Push Approach

Part 1: Testing the Streaming/Flume integration locally
1) On the cluster's master server, add the configuration file flume-push-streaming.conf under apache-flume/conf:

# Name the components on this agent
simple-agent.sources = netcat-source
simple-agent.sinks = avro-sink
simple-agent.channels = memory-channel

simple-agent.sources.netcat-source.type = netcat
# This is the address you send data to: run telnet localhost 44444 and type your data there
simple-agent.sources.netcat-source.bind = localhost
simple-agent.sources.netcat-source.port = 44444

simple-agent.sinks.avro-sink.type = avro
# This hostname is the address of the machine running your local IDEA client;
# in the IDEA code itself, bind to 0.0.0.0 (i.e. all local interfaces)
simple-agent.sinks.avro-sink.hostname = 192.168.1.125
simple-agent.sinks.avro-sink.port = 41414

simple-agent.channels.memory-channel.type = memory

simple-agent.sources.netcat-source.channels = memory-channel
simple-agent.sinks.avro-sink.channel = memory-channel
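
The memory channel above runs with Flume's defaults. If the channel overflows under heavier traffic, its standard buffering properties can be raised; a minimal sketch with illustrative values (not part of the original setup):

# Optional memory-channel tuning (values are illustrative)
simple-agent.channels.memory-channel.capacity = 10000
simple-agent.channels.memory-channel.transactionCapacity = 100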

2) Add the dependency to pom.xml:



<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-flume_2.11</artifactId>
    <version>2.0.2</version>
</dependency>

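If you build with sbt instead of Maven (the original uses Maven; this is just the equivalent), and scalaVersion is set to 2.11.x:

libraryDependencies += "org.apache.spark" %% "spark-streaming-flume" % "2.0.2"
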
3) The program in the local IDEA project, FlumePushWordCount.scala:

package com.streaming.flume

import org.apache.spark.SparkConf
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Spark Streaming integration with Flume, approach 1: push
  */
object FlumePushWordCount {

  def main(args: Array[String]): Unit = {
    // To enable this argument check, configure the program arguments in IDEA
//    if (args.length != 2) {
//      System.err.println("Usage: FlumePushWordCount <hostname> <port>")
//      System.exit(1)
//    }
//
//    val Array(hostname, port) = args

    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("FlumePushWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Push-based integration: Flume's avro sink pushes events to this receiver
    val flumeStream = FlumeUtils.createStream(ssc, "0.0.0.0", 41414)

    flumeStream.map(x=> new String(x.event.getBody.array()).trim)
      .flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
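
FlumeUtils.createStream also has an overload that takes an explicit storage level for received blocks; the default is MEMORY_AND_DISK_SER_2. A sketch of the variant, in case replicated on-disk spill is not what you want (the storage level chosen here is illustrative, not from the original code), replacing the corresponding line above:

import org.apache.spark.storage.StorageLevel

val flumeStream = FlumeUtils.createStream(ssc, "0.0.0.0", 41414,
  StorageLevel.MEMORY_ONLY_SER)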

4) For the local test, run the IDEA program first.

5) Start Flume:

bin/flume-ng agent --conf conf --conf-file ./conf/flume-push-streaming.conf --name simple-agent -Dflume.root.logger=INFO,console

6) On the cluster's master machine, connect with telnet and type some data, then watch the output in the local IDEA program:

telnet localhost 44444
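
For instance, typing hello world hello into the telnet session should produce output in IDEA along these lines within one 5-second batch (the timestamp is illustrative):

-------------------------------------------
Time: 1497837765000 ms
-------------------------------------------
(hello,2)
(world,1)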

Part 2: Testing the Streaming/Flume integration on the cluster

(1) Package the local program. On Windows, cd into the project directory and run:

mvn clean package -DskipTests

For cluster submission, enable the argument check and remove the hard-coded master, both of which are now supplied through spark-submit:
package com.streaming.flume

import org.apache.spark.SparkConf
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Spark Streaming integration with Flume, approach 1: push
  */
object FlumePushWordCount {

  def main(args: Array[String]): Unit = {

    if(args.length != 2) {
      System.err.println("Usage: FlumePushWordCount <hostname> <port>")
      System.exit(1)
    }

    val Array(hostname, port) = args

    val sparkConf = new SparkConf()//.setMaster("local[2]").setAppName("FlumePushWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Push-based integration: listen on the host/port passed in as arguments
    val flumeStream = FlumeUtils.createStream(ssc, hostname, port.toInt)

    flumeStream.map(x=> new String(x.event.getBody.array()).trim)
      .flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).print()

    ssc.start()
    ssc.awaitTermination()
  }
}

(2) Submit the local jar to the cluster. Because the packaged jar contains only the application code, the spark-streaming-flume classes are reported missing at runtime, so there are two ways to supply them.

Option 1: let spark-submit resolve the dependency with --packages:

spark-submit \
--class com.streaming.flume.FlumePushWordCount \
--master local[2] \
--packages org.apache.spark:spark-streaming-flume_2.11:2.0.2 \
/home/wl/miooc/streaming/flume/SparkStreaingTest-1.0.jar \
localhost 41414

Option 2: download the spark-streaming-flume assembly jar locally beforehand and pass it with --jars:

spark-submit \
--class com.streaming.flume.FlumePushWordCount \
--master local[2] \
--jars /usr/local/src/spark-hadoop2.0.2/jars/spark-streaming-flume-assembly_2.11-2.2.0.jar \
/home/wl/miooc/streaming/flume/SparkStreaingTest-1.0.jar \
localhost 41414
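
Before submitting, it can be worth confirming that the class actually made it into the jar; a quick sanity check (not part of the original steps):

jar tf /home/wl/miooc/streaming/flume/SparkStreaingTest-1.0.jar | grep FlumePushWordCount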

(3) Start Flume on master. For this test, the avro sink in flume-push-streaming.conf points at localhost, since the Spark application now runs on the same machine:

bin/flume-ng agent --conf conf --conf-file ./conf/flume-push-streaming.conf --name simple-agent -Dflume.root.logger=INFO,console

The updated flume-push-streaming.conf:
# Name the components on this agent
simple-agent.sources = netcat-source
simple-agent.sinks = avro-sink
simple-agent.channels = memory-channel

simple-agent.sources.netcat-source.type = netcat
simple-agent.sources.netcat-source.bind = localhost
simple-agent.sources.netcat-source.port = 44444

simple-agent.sinks.avro-sink.type = avro
simple-agent.sinks.avro-sink.hostname = localhost
simple-agent.sinks.avro-sink.port = 41414

simple-agent.channels.memory-channel.type = memory

simple-agent.sources.netcat-source.channels = memory-channel
simple-agent.sinks.avro-sink.channel = memory-channel

(4) On master, connect and type data:
telnet localhost 44444
