Setting up a Spark environment in IntelliJ IDEA with Maven and Java

Environment: IntelliJ IDEA as the development tool, plus a cluster of hosts hadoop01 through hadoop04 (optional if you skip the cluster test). Spark is installed at /opt/moudles/spark-1.6.1/.

Preparation

First, upload an a.txt file to HDFS on the cluster; the project will later run a word count over it.
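The exact contents of a.txt do not matter; any whitespace-separated text works. As a sanity check, the counts the Spark job should eventually produce can be computed locally with plain Java. The sample text below is hypothetical; substitute your own file's contents:

```java
import java.util.Map;
import java.util.TreeMap;

public class ExpectedCounts {
    // Count whitespace-separated words; a TreeMap keeps the keys sorted
    // so the printed result is stable.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String word : text.split(" ")) {
            Integer old = counts.get(word);
            counts.put(word, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical contents of a.txt.
        System.out.println(countWords("hello spark hello hadoop"));
        // prints {hadoop=1, hello=2, spark=1}
    }
}
```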

Creating the Maven project

Click File->New->Project…
Click Next. The GroupId and ArtifactId can be named anything you like.
Click Next.
Click Finish to create the project.

Writing the WordCount code

Append the following configuration after the version tag in pom.xml:

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-graphx_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.2-beta-5</version>
    </dependency>
    <dependency>
        <groupId>commons-lang</groupId>
        <artifactId>commons-lang</artifactId>
        <version>2.3</version>
    </dependency>
</dependencies>
<build>
    <sourceDirectory>src/main/java</sourceDirectory>
    <testSourceDirectory>src/test/java</testSourceDirectory>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <mainClass></mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>1.3.1</version>
            <executions>
                <execution>
                    <goals>
                        <goal>exec</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <executable>java</executable>
                <includeProjectDependencies>false</includeProjectDependencies>
                <classpathScope>compile</classpathScope>
                <mainClass>com.dt.spark.SparkApps.App</mainClass>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.6</source>
                <target>1.6</target>
            </configuration>
        </plugin>
    </plugins>
</build>

Click Import Changes in the lower-right corner to download the dependencies.
Click File->Project Structure…->Modules and mark both the src and main folders as Sources.
Create a SparkWordCount Java class under the java folder.
Its code is:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

import java.util.Arrays;

/**
 * Created by hadoop on 17-4-4.
 */
public class SparkWordCount {
    public static void main(String[] args) {
        // Step 1: configure the application.
        SparkConf conf = new SparkConf()
                .setAppName("WordCountCluster");
        // Step 2: create the context and load the input file from HDFS.
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> lines = sc.textFile("hdfs://hadoop01:9000/a.txt");

        // Split each line into words.
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Iterable<String> call(String line) throws Exception {
                return Arrays.asList(line.split(" "));
            }
        });

        // Map each word to a (word, 1) pair.
        JavaPairRDD<String, Integer> pairs = words.mapToPair(
                new PairFunction<String, String, Integer>() {
                    private static final long serialVersionUID = 1L;

                    @Override
                    public Tuple2<String, Integer> call(String word) throws Exception {
                        return new Tuple2<String, Integer>(word, 1);
                    }
                }
        );

        // Sum the counts for each word.
        JavaPairRDD<String, Integer> wordCounts = pairs.reduceByKey(
                new Function2<Integer, Integer, Integer>() {
                    @Override
                    public Integer call(Integer v1, Integer v2) throws Exception {
                        return v1 + v2;
                    }
                }
        );

        // Print each (word, count) pair.
        wordCounts.foreach(new VoidFunction<Tuple2<String, Integer>>() {
            @Override
            public void call(Tuple2<String, Integer> wordCount) throws Exception {
                System.out.println(wordCount._1 + " : " + wordCount._2);
            }
        });

        sc.close();
    }
}
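The Spark pipeline above (flatMap -> mapToPair -> reduceByKey) has a direct analogue in plain Java 8 streams, which can help clarify what each stage does. This is only an illustrative sketch running on local strings instead of an RDD, and it assumes a newer JDK than the 1.6 compiler target in the pom:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class LocalWordCount {
    static Map<String, Long> count(String... lines) {
        return Arrays.stream(lines)
                // flatMap: split each line into words, like the FlatMapFunction above.
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // mapToPair + reduceByKey: group identical words and sum their
                // occurrences; TreeMap keeps the output sorted.
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count("hello spark", "hello hadoop"));
        // prints {hadoop=1, hello=2, spark=1}
    }
}
```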

Building the jar

Click File->Project Structure…->Artifacts and click the + button.
Select the Main Class.
Click OK.
Because the cluster already provides the Spark jars, remove those dependency jars from the artifact.
Click Apply, then OK. Next, click Build->Build Artifacts…->Build in the menu bar; the jar is generated in the out directory.

Uploading the jar to the cluster and running it

This article uses scp to upload the jar to the cluster; on Windows you can use FileZilla or Xftp instead.
Run the following command on the cluster:

/opt/moudles/spark-1.6.1/bin/spark-submit --class SparkWordCount --master spark://192.168.20.171:7077 sparkStudy.jar

The job then prints each word and its count.
