<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
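<modelVersion>4.0.0</modelVersion>
<groupId>CHANGE THIS (your group ID, copy it from your original pom)</groupId>
<artifactId>CHANGE THIS (your artifact ID, copy it from your original pom)</artifactId>
<version>CHANGE THIS (your version, copy it from your original pom)</version>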
<repositories>
<repository>
<id>Akka repository</id>
<url>http://repo.akka.io/releases</url>
</repository>
</repositories>
<build>
<sourceDirectory>src/main/scala/</sourceDirectory>
<testSourceDirectory>src/test/scala/</testSourceDirectory>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<scalaVersion>2.11.4</scalaVersion>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
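<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>reference.conf</resource>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.7.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>2.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>redis.clients</groupId>
<artifactId>jedis</artifactId>
<version>2.2.1</version>
<type>jar</type>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>0.8.2.2</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.37</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.11</artifactId>
<version>0.8.2.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.2.1</version>
</dependency>
</dependencies>
</project>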
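Every Spark and Kafka artifact in the pom carries a _2.11 suffix. That suffix is the Scala binary version the artifact was compiled against, and it has to agree with the scalaVersion 2.11.4 configured for maven-scala-plugin. A minimal sketch of a sanity check, assuming Scala 2.x version strings of the form major.minor.patch; ScalaBinaryVersionCheck, binaryVersion and suffixMatches are hypothetical names, while scala.util.Properties.versionNumberString is the standard library's report of the running Scala version:

```scala
// Sketch: compare the "_2.xx" suffix of Scala artifacts with a Scala version string.
// ScalaBinaryVersionCheck, binaryVersion and suffixMatches are hypothetical names.
object ScalaBinaryVersionCheck {
  // For Scala 2.x the binary version is major.minor, e.g. "2.11.4" -> "2.11"
  def binaryVersion(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  // True when an artifactId such as "spark-core_2.11" targets the given Scala version
  def suffixMatches(artifactId: String, scalaVersion: String): Boolean =
    artifactId.endsWith("_" + binaryVersion(scalaVersion))

  def main(args: Array[String]): Unit = {
    // Version of the scala-library actually on the classpath, e.g. "2.11.12"
    val running = scala.util.Properties.versionNumberString
    val artifacts = Seq("spark-core_2.11", "spark-sql_2.11", "kafka_2.11")
    artifacts.foreach { a =>
      val status = if (suffixMatches(a, running)) "OK" else "MISMATCH"
      println(s"$a vs Scala $running: $status")
    }
  }
}
```

Running it inside the project (for example from the IDE) prints one line per artifact; any MISMATCH means the Scala version and the artifact suffixes have drifted apart.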
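Create a new settings.xml (Maven reads the user settings from ~/.m2/settings.xml by default) and copy the code below into it. The localRepository line needs to be changed.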
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
<pluginGroups />
<proxies />
<servers />
<!-- Jars that Maven downloads automatically are stored in this directory -->
<localRepository>CHANGE THIS (where do you want the downloaded packages to be stored?)</localRepository>
<mirrors>
<mirror>
<id>alimaven</id>
<mirrorOf>central</mirrorOf>
<name>aliyun maven</name>
<url>http://maven.aliyun.com/nexus/content/repositories/central/</url>
</mirror>
<mirror>
<id>alimaven</id>
<name>aliyun maven</name>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
<mirrorOf>central</mirrorOf>
</mirror>
<mirror>
<id>central</id>
<name>Maven Repository Switchboard</name>
<url>http://repo1.maven.org/maven2/</url>
<mirrorOf>central</mirrorOf>
</mirror>
<mirror>
<id>repo2</id>
<mirrorOf>central</mirrorOf>
<name>Human Readable Name for this Mirror.</name>
<url>http://repo2.maven.org/maven2/</url>
</mirror>
<mirror>
<id>ibiblio</id>
<mirrorOf>central</mirrorOf>
<name>Human Readable Name for this Mirror.</name>
<url>http://mirrors.ibiblio.org/pub/mirrors/maven2/</url>
</mirror>
<mirror>
<id>jboss-public-repository-group</id>
<mirrorOf>central</mirrorOf>
<name>JBoss Public Repository Group</name>
<url>http://repository.jboss.org/nexus/content/groups/public</url>
</mirror>
<mirror>
<id>google-maven-central</id>
<name>Google Maven Central</name>
<url>https://maven-central.storage.googleapis.com</url>
<mirrorOf>central</mirrorOf>
</mirror>
<!-- Mirror of the central repository in China -->
<mirror>
<id>maven.net.cn</id>
<name>one of the central mirrors in China</name>
<url>http://maven.net.cn/content/groups/public/</url>
<mirrorOf>central</mirrorOf>
</mirror>
</mirrors>
</settings>
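All of these mirrors declare mirrorOf central. As far as I understand Maven's mirror selection, it picks the first mirror whose mirrorOf exactly matches the repository id and does not fall back to the later entries, so in practice only the first Aliyun mirror is used and the rest are ignored. The sketch below is a simplified model of that rule (exact id match first, then only the * wildcard; real Maven also understands patterns such as external:* and !repoId). Mirror and select are hypothetical names, not Maven API:

```scala
// Simplified model of how Maven chooses a mirror for a repository id.
// Mirror and select are hypothetical names; this is not Maven's own API.
object MirrorSelection {
  // Stand-in for one <mirror> entry in settings.xml
  final case class Mirror(id: String, mirrorOf: String, url: String)

  // Exact mirrorOf match wins (first in declaration order), otherwise the "*" wildcard
  def select(repoId: String, mirrors: Seq[Mirror]): Option[Mirror] =
    mirrors.find(_.mirrorOf == repoId)
      .orElse(mirrors.find(_.mirrorOf == "*"))

  def main(args: Array[String]): Unit = {
    val mirrors = Seq(
      Mirror("alimaven", "central", "http://maven.aliyun.com/nexus/content/repositories/central/"),
      Mirror("alimaven", "central", "http://maven.aliyun.com/nexus/content/groups/public/"),
      Mirror("central", "central", "http://repo1.maven.org/maven2/")
    )
    // Only the first mirror of "central" is consulted; the others are not fallbacks
    println(select("central", mirrors).map(_.url))
  }
}
```

In this model the Akka repository declared in the pom is not matched by any of these central mirrors, so it would still be fetched from its own URL.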
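import java.text.SimpleDateFormat
import java.util.Date
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

object wordCount {
  def main(args: Array[String]): Unit = {
    // Set the log level so that Spark's INFO messages are hidden
    Logger.getLogger("org").setLevel(Level.WARN)
    // Configure the Spark context (the RDD environment); the current time is used as the app name
    val dateNow = new SimpleDateFormat("yyyy-MM-dd-HH:mm").format(new Date)
    val sparkconf = new SparkConf().setAppName(dateNow).setMaster("local[*]")
    val sparkcontext = new SparkContext(sparkconf)
    // Read the file (a different path can be passed as the first program argument)
    val filePath =
      if (args.nonEmpty) args(0)
      else "/Users/apple/IdeaProjects/SparkInBigData/src/main/scala/wordCount.txt"
    val rdd1 = sparkcontext.textFile(filePath)
    val counts = rdd1.flatMap(t => t.split("\\s+")) // split each line into words on whitespace
      .filter(_.nonEmpty)                            // drop empty tokens
      .map(word => (word, 1))                        // pair each word with a count of 1
      .reduceByKey(_ + _)                            // sum the 1s for each word
      .sortBy(_._2, ascending = false)               // sort by the second element (the count), descending
      .collect()                                     // collect the results to the driver
    counts.foreach(println)                          // loop over the results and print each (word, count)
    sparkcontext.stop()
  }
}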
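The RDD pipeline above (flatMap, map, reduceByKey, sortBy) can be mirrored with plain Scala collections, which is handy for checking the logic without starting Spark. A minimal sketch, assuming whitespace tokenization; LocalWordCount and countWords are hypothetical names, groupBy plus a sum stands in for reduceByKey, and ties are broken alphabetically here, whereas Spark's sortBy gives no particular order for words with equal counts:

```scala
// Plain-collections sketch of the word count pipeline (no Spark needed).
// LocalWordCount and countWords are hypothetical names.
object LocalWordCount {
  def countWords(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.split("\\s+").toSeq)                          // split each line into words
      .filter(_.nonEmpty)                                      // drop empty tokens (e.g. leading spaces)
      .map(word => (word, 1))                                  // pair each word with a count of 1
      .groupBy(_._1)                                           // group the pairs by word
      .map { case (word, ones) => (word, ones.map(_._2).sum) } // sum the 1s per word
      .toSeq
      .sortBy { case (word, n) => (-n, word) }                 // count descending, then word ascending

  def main(args: Array[String]): Unit =
    countWords(Seq("to be or not to be", "not yet")).foreach(println)
}
```

For example, countWords(Seq("to be or not to be", "not yet")) returns (be,2), (not,2), (to,2), (or,1), (yet,1).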
Data source
Everyone has their own dreams I am the same But my
dream is not a lawyer not a doctor not actors not
even an industry Perhaps my dream big people will
find it ridiculous but this has been my pursuit
My dream is to want to have a folk life I want it
to become a beautiful painting it is not only sharp
colors but also the colors are bleak I do not rule
out the painting is part of the black but I will
treasure these bleak colors Not yet how about a
colorful painting if not bleak add color how can
it more prominent American Life is like painting
painting the bright red color represents life beautiful
happy moments Painting a bleak color represents life
difficult unpleasant time You may find a flat with
a beautiful road is not very good yet but I do not
think it will If a person lives flat then what is
the point Life is only a short few decades I want
it to go Finally Each memory is a solid
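The word count job does not normalize case, so in this data "Painting" and "painting", or "Life" and "life", end up as separate entries in the result. A small plain-Scala sketch on three lines copied from the data above illustrates the difference; CaseSensitivityDemo and countOf are hypothetical names. If case-insensitive counts are wanted, adding .map(_.toLowerCase) before the (word, 1) step of the Spark job would be one way to merge them.

```scala
// Sketch: case-sensitive vs case-insensitive counting of one word.
// CaseSensitivityDemo and countOf are hypothetical names.
object CaseSensitivityDemo {
  // Count whitespace-separated tokens equal to `target`, optionally ignoring case
  def countOf(lines: Seq[String], target: String, ignoreCase: Boolean): Int =
    lines.flatMap(_.split("\\s+").toSeq)
      .count(w => if (ignoreCase) w.equalsIgnoreCase(target) else w == target)

  def main(args: Array[String]): Unit = {
    // Three lines taken from the data source
    val excerpt = Seq(
      "it more prominent American Life is like painting",
      "painting the bright red color represents life beautiful",
      "happy moments Painting a bleak color represents life"
    )
    println(countOf(excerpt, "painting", ignoreCase = false)) // 2
    println(countOf(excerpt, "painting", ignoreCase = true))  // 3
  }
}
```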
Result