A ten-minute Spark Streaming quick-start test (2018-04-19)

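      • Install the IDEA editor
      • Download the Spark and Scala installation packages
      • Add environment variables
      • Create a new project in IDEA
      • Start nc -lk
      • Start the streaming program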

Install the IDEA editor

Reference notes: http://www.aboutyun.com/thread-22320-1-1.html

I have prepared a resource pack for everyone.
Download link:
https://pan.baidu.com/s/1auAjP4npWDD0oGfNjXlySQ

Download the Spark and Scala installation packages

http://mirrors.hust.edu.cn/apache/spark/spark-1.6.3/

https://www.scala-lang.org/download/all.html

Add environment variables

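My Spark directory is D:\linghit\lingghit_soft\spark-2.2.0-bin-hadoop2.6\spark-2.2.0-bin-hadoop2.6 (note that this is Spark 2.2.0, whereas the download link above and the pom.xml below use Spark 1.6.3).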

My Scala directory is D:\linghit\lingghit_soft\scala-2.11.8\scala-2.11.8

Finally, the Path variable needs entries for them as well.
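
To confirm the variables are picked up, a small check can be run from a newly opened console or IDE (windows that were already open do not see variables added afterwards). This is a minimal sketch that assumes the variables are named SPARK_HOME and SCALA_HOME and that Path contains their bin subdirectories; CheckEnv and onPath are hypothetical names.

```scala
import java.io.File

// Minimal sketch: report whether SPARK_HOME / SCALA_HOME are set and whether
// their bin directories appear on PATH. The variable names are assumptions.
object CheckEnv {
  val homes = Seq("SPARK_HOME", "SCALA_HOME")

  // True if `dir` is one of the entries of a PATH-style string.
  // Entries are compared case-insensitively and without a trailing separator.
  def onPath(path: String, dir: String): Boolean = {
    def norm(s: String): String = s.trim.stripSuffix(File.separator).toLowerCase
    path.split(File.pathSeparator).map(norm).contains(norm(dir))
  }

  def main(args: Array[String]): Unit = {
    // On Windows, System.getenv(name) typically ignores case, so "Path" is found as "PATH"
    val path = Option(System.getenv("PATH")).getOrElse("")
    for (name <- homes) {
      Option(System.getenv(name)) match {
        case None => println(s"$name is not set")
        case Some(home) =>
          val bin = new File(home, "bin").getPath
          println(s"$name = $home (bin on PATH: ${onPath(path, bin)})")
      }
    }
  }
}
```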

Create a new project in IDEA

Add the following to pom.xml: the two version properties go inside <properties>, and the two dependencies go inside <dependencies>:

<spark.version>1.6.3</spark.version>
<hadoop.version>2.6.4</hadoop.version>


<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>${spark.version}</version>
</dependency>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.11</artifactId>
  <version>${spark.version}</version>
</dependency>

The final pom.xml looks like this:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.linghit.sparklearning</groupId>
  <artifactId>MySpark</artifactId>
  <version>1.1-SNAPSHOT</version>
  <inceptionYear>2008</inceptionYear>
  <properties>
    <scala.version>2.11.8</scala.version>
    <spark.version>1.6.3</spark.version>
    <hadoop.version>2.6.4</hadoop.version>
  </properties>

  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.4</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs</groupId>
      <artifactId>specs</artifactId>
      <version>1.2.5</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>

  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
          <args>
            <arg>-target:jvm-1.5</arg>
          </args>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-eclipse-plugin</artifactId>
        <configuration>
          <downloadSources>true</downloadSources>
          <buildcommands>
            <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
          </buildcommands>
          <additionalProjectnatures>
            <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
          </additionalProjectnatures>
          <classpathContainers>
            <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
            <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
          </classpathContainers>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <reporting>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
        </configuration>
      </plugin>
    </plugins>
  </reporting>
</project>
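
The _2.11 suffix in spark-core_2.11 and spark-streaming_2.11 is the Scala binary version those artifacts were built for, so scala.version has to stay on 2.11.x. A minimal sketch that prints the Scala version actually on the classpath and compares it with that suffix; CheckScalaBinary and binaryVersion are hypothetical names.

```scala
import scala.util.Properties

// Minimal sketch: compare the running Scala version with the _2.11 suffix
// used by the Spark artifacts in the pom.
object CheckScalaBinary {
  // "2.11.8" -> "2.11"
  def binaryVersion(full: String): String = full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    val full = Properties.versionNumberString // full version of the scala-library on the classpath
    val expected = "2.11"                     // suffix of spark-core_2.11 / spark-streaming_2.11
    val actual = binaryVersion(full)
    if (actual == expected) println(s"Scala $full matches the _$expected Spark artifacts")
    else println(s"Scala $full does not match _$expected; align scala.version with the artifact suffix")
  }
}
```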

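Start nc -lk

On the host the program connects to (120.27.251.175 in this example), run nc -lk 9991: -l makes nc listen on port 9991, and -k keeps it listening after a client disconnects. Every line typed into nc is sent to the streaming job.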
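
If nc is not at hand (Windows does not ship it), a small Scala program can play the same role for local testing. This is a minimal sketch, not a full nc replacement: it assumes port 9991 as in the program, sends UTF-8 text, and serves one client at a time; LineServer and pump are hypothetical names.

```scala
import java.io.{OutputStreamWriter, PrintWriter, Writer}
import java.net.ServerSocket
import java.nio.charset.StandardCharsets
import scala.io.StdIn

// Rough stand-in for `nc -lk 9991`: listen on a port and forward lines typed
// on stdin to the connected client (here, the Spark socket receiver).
object LineServer {
  // Writes lines to one client until input ends or a write error is seen.
  // Returns false if a write error showed the client has gone away,
  // true once input is exhausted.
  def pump(lines: Iterator[String], sink: Writer): Boolean = {
    val out = new PrintWriter(sink, true) // autoflush on every println
    var clientOk = true
    while (clientOk && lines.hasNext) {
      out.println(lines.next())
      // a dropped client usually surfaces as an error only a write or two later
      clientOk = !out.checkError()
    }
    clientOk
  }

  def main(args: Array[String]): Unit = {
    val port = if (args.nonEmpty) args(0).toInt else 9991
    val stdin = Iterator.continually(StdIn.readLine()).takeWhile(_ != null)
    val server = new ServerSocket(port)
    println(s"Listening on port $port; type lines to send them")
    var done = false
    while (!done) { // like -k: accept a new client after the previous one leaves
      val client = server.accept()
      done = pump(stdin, new OutputStreamWriter(client.getOutputStream, StandardCharsets.UTF_8))
      client.close()
    }
    server.close()
  }
}
```

To use it, start LineServer, change the host in socketTextStream to "localhost", and type lines into its console. A line or two may be lost when the receiver disconnects, because a write to a closed socket is not always reported immediately.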

The streaming word-count program that reads from this socket:

import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Created by Administrator on 2018/4/19.
  */
object TestNetworkWordCount {
  def main(args: Array[String]): Unit = {
    // Only show Spark errors so the printed counts are not buried in INFO logs
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
    // local[2]: one thread runs the socket receiver, at least one more processes the batches
    val conf = new SparkConf().setMaster("local[2]").setAppName("wordcountonline")
    // Cut the stream into 5-second batches
    val ssc = new StreamingContext(conf, Seconds(5))
    // Connect to the nc listener
    val lines = ssc.socketTextStream("120.27.251.175", 9991)
    // Split each line on spaces
    val words = lines.flatMap { line => line.split(" ") }
    // Turn each word into a (word, 1) tuple
    val wordCount = words.map { word => (word, 1) }
    /**
      * (key1,1) (key1,1)
      * Add up the values of tuples with the same key.
      * reduceByKey returns a new DStream, so its result must be kept and printed;
      * printing wordCount itself would only show the (word, 1) pairs.
      */
    val wordCounts = wordCount.reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    /**
      * Block until the streaming job is stopped
      */
    ssc.awaitTermination()
    ssc.stop(true)
  }

}
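
To see what each 5-second batch computes, the same flatMap, map and reduceByKey steps can be run on ordinary Scala collections. A minimal sketch without Spark: groupBy plus a per-key sum stands in for reduceByKey (which in Spark additionally pre-aggregates within each partition before shuffling); BatchWordCount and countWords are hypothetical names.

```scala
// Plain-collections sketch of one batch of the word count; no Spark needed.
object BatchWordCount {
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))  // same split on spaces as the streaming job
      .map(word => (word, 1)) // (word, 1) pairs
      .groupBy(_._1)          // gather pairs with the same key
      .map { case (word, pairs) => (word, pairs.map(_._2).reduce(_ + _)) } // add up per key

  def main(args: Array[String]): Unit = {
    val batch = Seq("hello spark", "hello streaming")
    countWords(batch).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Running main prints (hello,2), (spark,1) and (streaming,1), one per line, sorted by word.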

Start the streaming program
