Configuring a standalone Spark environment in Eclipse on Windows [Spark + Hadoop + Maven + Eclipse]

Step 1: (1) My machine runs 64-bit Windows 10; download the Eclipse build that matches your own system from the Eclipse website.

(2) Search for Eclipse (I used Baidu). The screenshots below show the version I downloaded; it simply unpacks and installs itself.

[Screenshots: the Eclipse download page and the downloaded package]

Step 2: Download the Spark and Hadoop files.

(1) Download and extract the Hadoop files; the extracted contents are shown below.

Hadoop download link: https://download.csdn.net/download/qq_30993409/10561014

[Screenshot: the extracted Hadoop directory]
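Spark on Windows expects the Hadoop helper binaries (notably winutils.exe) under %HADOOP_HOME%\bin, and this download is assumed to provide them. A quick sanity check from cmd, assuming the archive was extracted to C:\hadoop-2.6.0 (the path and layout are assumptions; adjust to your own extract location):

REM Confirm the Windows helper binary is present (assumed path).
dir C:\hadoop-2.6.0\bin\winutils.exe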

(2) Download and extract the Spark files (I chose Spark 1.6). I picked 1.6 mainly because it makes it easy to pull in the single assembly jar shown below (spark-assembly-1.6.0-hadoop2.6.0.jar, the same jar that appears later in the run log).

Spark download link: https://pan.baidu.com/s/1WBrp-_boqwlPLNJST9yD_Q (access code: obo3)

[Screenshot: the extracted Spark directory]
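The assembly jar mentioned above sits in the lib\ folder of the extracted distribution. A quick way to confirm it is there from cmd, assuming Spark was extracted to C:\spark-1.6.0-bin-hadoop2.6 (the folder name is an assumption; use your own path):

REM The jar that will later be added to the Eclipse project (assumed extract location).
dir C:\spark-1.6.0-bin-hadoop2.6\lib\spark-assembly-1.6.0-hadoop2.6.0.jar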

Step 3: Configure the corresponding environment variables, as shown below.

[Screenshots: environment variable configuration]
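A minimal sketch of the variables involved, run from an administrator cmd prompt. The install paths below are assumptions; point them at the folders you actually extracted Hadoop and Spark into, and make sure JAVA_HOME already points to a JDK:

REM Example values only; adjust the paths to your own machine.
setx HADOOP_HOME "C:\hadoop-2.6.0"
setx SPARK_HOME "C:\spark-1.6.0-bin-hadoop2.6"
REM Then append %HADOOP_HOME%\bin and %SPARK_HOME%\bin to the Path variable
REM (easiest through System Properties > Environment Variables) and reopen cmd.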

Step 4: Test from the Windows command prompt (cmd).

(1) SparkPi test

[Screenshot: SparkPi output in cmd]
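One common way to run this test is sketched below. The examples jar name assumes the Spark 1.6.0 package pre-built for Hadoop 2.6; other builds ship a differently named jar under lib\:

cd /d %SPARK_HOME%
bin\run-example SparkPi 10
REM Equivalent spark-submit form:
bin\spark-submit --class org.apache.spark.examples.SparkPi --master local[*] lib\spark-examples-1.6.0-hadoop2.6.0.jar 10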

(2) spark-shell test

[Screenshot: spark-shell starting in cmd]
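Launching the interactive shell locally is a quick end-to-end check; a sketch of the command and a throwaway computation to try inside it:

%SPARK_HOME%\bin\spark-shell --master local[*]
REM Inside the shell, something like the following should return 5050.0:
REM   scala> sc.parallelize(1 to 100).sum()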

Step 5: Write and run a WordCount program in Eclipse.

[Screenshots: creating the Maven project in Eclipse and adding the spark-assembly jar to the build path]

Edit the pom.xml file:


<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>cn.spark</groupId>
  <artifactId>spark-study-java</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>spark-study-java</name>
  <packaging>jar</packaging>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-launcher_2.10</artifactId>
      <version>1.6.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>1.6.0</version>
    </dependency>
  </dependencies>
</project>

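If Eclipse reports unresolved dependencies after saving, refreshing the project (right-click the project > Maven > Update Project) usually fixes it. The pom can also be checked from cmd with a plain Maven build, assuming mvn is on the Path:

REM Run from the project folder that contains pom.xml.
mvn clean compile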

Java code:

package cn.spark.study.java;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

public class WordCountLocal {

	public static void main(String[] args) {
		// Run Spark locally inside the IDE; no cluster is needed.
		SparkConf conf = new SparkConf()
				.setAppName("wordCountLocal")
				.setMaster("local");
		JavaSparkContext sc = new JavaSparkContext(conf);
		// Read the input text file from the desktop, one RDD element per line.
		JavaRDD<String> lines = sc.textFile("C://Users//Yuan//Desktop//spark.txt");
		// Split each line into words.
		JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
			private static final long serialVersionUID = 1L;
			public Iterable<String> call(String line) throws Exception {
				return Arrays.asList(line.split(" "));
			}
		});
		// Map each word to a (word, 1) pair.
		JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
			private static final long serialVersionUID = 1L;
			public Tuple2<String, Integer> call(String word) throws Exception {
				return new Tuple2<String, Integer>(word, 1);
			}
		});
		// Sum the counts for each word.
		JavaPairRDD<String, Integer> wordcount = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
			private static final long serialVersionUID = 1L;
			public Integer call(Integer valA, Integer valB) throws Exception {
				return valA + valB;
			}
		});
		// Print each (word, count) pair.
		wordcount.foreach(new VoidFunction<Tuple2<String, Integer>>() {
			private static final long serialVersionUID = 1L;
			public void call(Tuple2<String, Integer> wordCount) throws Exception {
				System.out.println("[" + wordCount._1 + "," + wordCount._2 + "]");
			}
		});
		sc.close();
	}
}
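The job expects a plain text file at C://Users//Yuan//Desktop//spark.txt. Any whitespace-separated text will do; one hypothetical way to create a small test file from cmd (the contents here are just an illustration, not the file that produced the output below):

echo spark hadoop hive spark sql hbase> %USERPROFILE%\Desktop\spark.txt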

 

Running the program produces the following output:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/Yuan/Desktop/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/Yuan/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/07/24 14:53:06 INFO SparkContext: Running Spark version 1.6.0
18/07/24 14:53:07 INFO SecurityManager: Changing view acls to: Yuan
18/07/24 14:53:07 INFO SecurityManager: Changing modify acls to: Yuan
18/07/24 14:53:07 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Yuan); users with modify permissions: Set(Yuan)
18/07/24 14:53:09 INFO Utils: Successfully started service 'sparkDriver' on port 54107.
18/07/24 14:53:11 INFO Slf4jLogger: Slf4jLogger started
18/07/24 14:53:12 INFO Remoting: Starting remoting
18/07/24 14:53:12 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:54120]
18/07/24 14:53:12 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 54120.
18/07/24 14:53:12 INFO SparkEnv: Registering MapOutputTracker
18/07/24 14:53:12 INFO SparkEnv: Registering BlockManagerMaster
18/07/24 14:53:12 INFO DiskBlockManager: Created local directory at C:\Users\Yuan\AppData\Local\Temp\blockmgr-93c9430a-27aa-418c-a75c-19d9a946866f
18/07/24 14:53:13 INFO MemoryStore: MemoryStore started with capacity 444.4 MB
18/07/24 14:53:13 INFO SparkEnv: Registering OutputCommitCoordinator
18/07/24 14:53:14 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/07/24 14:53:14 INFO SparkUI: Started SparkUI at http://169.254.210.125:4040
18/07/24 14:53:14 INFO Executor: Starting executor ID driver on host localhost
18/07/24 14:53:14 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 54127.
18/07/24 14:53:14 INFO NettyBlockTransferService: Server created on 54127
18/07/24 14:53:14 INFO BlockManagerMaster: Trying to register BlockManager
18/07/24 14:53:14 INFO BlockManagerMasterEndpoint: Registering block manager localhost:54127 with 444.4 MB RAM, BlockManagerId(driver, localhost, 54127)
18/07/24 14:53:14 INFO BlockManagerMaster: Registered BlockManager
18/07/24 14:53:17 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 127.4 KB, free 127.4 KB)
18/07/24 14:53:17 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 141.3 KB)
18/07/24 14:53:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:54127 (size: 13.9 KB, free: 444.4 MB)
18/07/24 14:53:17 INFO SparkContext: Created broadcast 0 from textFile at WordCountLocal.java:23
18/07/24 14:53:19 WARN : Your hostname, DESKTOP-359QINH resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:a9fe:82e%net10, but we couldn't find any external IP address!
18/07/24 14:53:21 INFO FileInputFormat: Total input paths to process : 1
18/07/24 14:53:21 INFO SparkContext: Starting job: foreach at WordCountLocal.java:42
18/07/24 14:53:21 INFO DAGScheduler: Registering RDD 3 (mapToPair at WordCountLocal.java:30)
18/07/24 14:53:21 INFO DAGScheduler: Got job 0 (foreach at WordCountLocal.java:42) with 1 output partitions
18/07/24 14:53:21 INFO DAGScheduler: Final stage: ResultStage 1 (foreach at WordCountLocal.java:42)
18/07/24 14:53:21 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
18/07/24 14:53:21 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
18/07/24 14:53:21 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at mapToPair at WordCountLocal.java:30), which has no missing parents
18/07/24 14:53:22 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.8 KB, free 146.1 KB)
18/07/24 14:53:22 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.6 KB, free 148.7 KB)
18/07/24 14:53:22 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:54127 (size: 2.6 KB, free: 444.4 MB)
18/07/24 14:53:22 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
18/07/24 14:53:22 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at mapToPair at WordCountLocal.java:30)
18/07/24 14:53:22 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
18/07/24 14:53:22 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2128 bytes)
18/07/24 14:53:22 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
18/07/24 14:53:22 INFO HadoopRDD: Input split: file:/C:/Users/Yuan/Desktop/spark.txt:0+171
18/07/24 14:53:22 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
18/07/24 14:53:22 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
18/07/24 14:53:22 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
18/07/24 14:53:22 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
18/07/24 14:53:22 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
18/07/24 14:53:23 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2253 bytes result sent to driver
18/07/24 14:53:23 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 993 ms on localhost (1/1)
18/07/24 14:53:23 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
18/07/24 14:53:23 INFO DAGScheduler: ShuffleMapStage 0 (mapToPair at WordCountLocal.java:30) finished in 1.105 s
18/07/24 14:53:23 INFO DAGScheduler: looking for newly runnable stages
18/07/24 14:53:23 INFO DAGScheduler: running: Set()
18/07/24 14:53:23 INFO DAGScheduler: waiting: Set(ResultStage 1)
18/07/24 14:53:23 INFO DAGScheduler: failed: Set()
18/07/24 14:53:23 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCountLocal.java:36), which has no missing parents
18/07/24 14:53:23 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.9 KB, free 151.7 KB)
18/07/24 14:53:23 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1785.0 B, free 153.4 KB)
18/07/24 14:53:23 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:54127 (size: 1785.0 B, free: 444.4 MB)
18/07/24 14:53:23 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
18/07/24 14:53:23 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCountLocal.java:36)
18/07/24 14:53:23 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
18/07/24 14:53:23 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,NODE_LOCAL, 1894 bytes)
18/07/24 14:53:23 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
18/07/24 14:53:23 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
18/07/24 14:53:23 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 4 ms
[spark,7]
[hive,5]
[hadoop,6]
[core,2]
[streaming,1]
[sql,3]
[hbase,4]
18/07/24 14:53:23 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1165 bytes result sent to driver
18/07/24 14:53:23 INFO DAGScheduler: ResultStage 1 (foreach at WordCountLocal.java:42) finished in 0.404 s
18/07/24 14:53:23 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 404 ms on localhost (1/1)
18/07/24 14:53:23 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
18/07/24 14:53:23 INFO DAGScheduler: Job 0 finished: foreach at WordCountLocal.java:42, took 2.197408 s
18/07/24 14:53:24 INFO SparkUI: Stopped Spark web UI at http://169.254.210.125:4040
18/07/24 14:53:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/07/24 14:53:24 INFO MemoryStore: MemoryStore cleared
18/07/24 14:53:24 INFO BlockManager: BlockManager stopped
18/07/24 14:53:24 INFO BlockManagerMaster: BlockManagerMaster stopped
18/07/24 14:53:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/07/24 14:53:24 INFO SparkContext: Successfully stopped SparkContext
18/07/24 14:53:24 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
18/07/24 14:53:24 INFO ShutdownHookManager: Shutdown hook called
18/07/24 14:53:24 INFO ShutdownHookManager: Deleting directory C:\Users\Yuan\AppData\Local\Temp\spark-0540df48-4e2a-44b9-adf6-be22f4e361f4
18/07/24 14:53:24 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

In this output, the following lines are the word-frequency results:

[spark,7]
[hive,5]
[hadoop,6]
[core,2]
[streaming,1]
[sql,3]
[hbase,4]

