[root@node01 sparkdata]# jps
1954 QuorumPeerMain
6388 Jps
2981 NameNode
4424 SparkSubmit
3273 SecondaryNameNode
3865 Master
3116 DataNode
[root@node02 spark-2.2.0-bin-hadoop2.7]# jps
1958 QuorumPeerMain
2376 DataNode
3289 Jps
2623 Worker
[root@node03 ~]# jps
2581 DataNode
2828 Worker
1948 QuorumPeerMain
3406 Jps
Start the ZooKeeper cluster and the HDFS cluster first, then start Spark:
node01: cd /export/servers/spark-2.2.0-bin-hadoop2.7/sbin
node01: ./start-all.sh
Then jps shows the three processes: one Master (node01) and a Worker on each of node02 and node03.
Spark Master web UI: http://node01:8080/
Stop the Spark cluster:
node01: cd /export/servers/spark-2.2.0-bin-hadoop2.7/sbin
node01: ./stop-all.sh
【Start ZooKeeper on all three servers】Run the following commands on all three machines to start ZooKeeper:
cd /export/servers/zookeeper-3.4.5-cdh5.14.0
bin/zkServer.sh start
The ZooKeeper process appears in jps as QuorumPeerMain.
【Start the Hadoop cluster】
Run the following commands on node01 (the first machine):
cd /export/servers/hadoop-2.6.0-cdh5.14.0/
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
# With multiple masters (high-availability setup)
cd /export/servers/spark-2.2.0-bin-hadoop2.7
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://node01:7077,node02:7077,node03:7077 \
--executor-memory 1G \
--total-executor-cores 2 \
examples/jars/spark-examples_2.11-2.2.0.jar \
10
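For orientation, here is a minimal Scala sketch of what a Monte Carlo Pi job like the bundled SparkPi example does. This is an illustrative rewrite, not the actual spark-examples source; the object name SparkPiSketch and the sample count are made up. The master URL is not hard-coded: spark-submit injects it, and listing all three masters lets the driver register with whichever one is currently ALIVE.

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative sketch only -- not the actual spark-examples source.
object SparkPiSketch {
  def main(args: Array[String]): Unit = {
    // spark-submit supplies --master spark://node01:7077,node02:7077,node03:7077
    val conf = new SparkConf().setAppName("SparkPiSketch")
    val sc = new SparkContext(conf)

    val slices = if (args.length > 0) args(0).toInt else 10 // the trailing "10" argument
    val samples = 100000 * slices                           // made-up sample count

    // Throw random darts at the unit square; the fraction landing inside
    // the unit circle approximates pi/4.
    val hits = sc.parallelize(1 to samples, slices).map { _ =>
      val x = math.random * 2 - 1
      val y = math.random * 2 - 1
      if (x * x + y * y <= 1) 1 else 0
    }.reduce(_ + _)

    println(s"Pi is roughly ${4.0 * hits / samples}")
    sc.stop()
  }
}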
You can start an interactive shell with ./bin/spark-shell, though it is usually launched as spark-shell --master local[2] (local mode with 2 worker threads). jps then shows the shell as a SparkSubmit process:
5289 SparkSubmit
scala> sc.textFile("file:///export/servers/sparkdata/word.txt").flatMap(x=>x.split(" ")).map(x=>(x,1)).reduceByKey((x,y)=>x+y).collect
res1: Array[(String, Int)] = Array((p,1), (hive,1), (s,1), (spark,1), (hadoop,1), (a,1), (hbase,1))
scala> sc.textFile("hdfs://node01.hadoop.com:8020/sparkdata/word.txt").flatMap(x=>x.split(" ")).map(x=>(x,1)).reduceByKey((x,y)=>x+y).collect
res7: Array[(String, Int)] = Array((p,1), (hive,1), (s,1), (spark,1), (hadoop,1), (a,1), (hbase,1))
This works because the Hadoop configuration was wired into this Spark setup (which is also why the bare /sparkdata/... paths below resolve against HDFS).
Note: the output folder passed to saveAsTextFile must not already exist (see the sketch after the part-file listing below).
[root@node01 sparkdata]# hdfs dfs -put word.txt /sparkdata/input
[root@node01 sparkdata]# hdfs dfs -cat /sparkdata/input/word.txt
spark hadoop hbase
hadoop spark
hello
# Without specifying an output folder (collect returns the results to the driver)
scala> sc.textFile("/sparkdata/input/word.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect
res8: Array[(String, Int)] = Array((hello,1), (spark,2), (hadoop,2), (hbase,1))
# With an output folder specified
scala> sc.textFile("/sparkdata/input/word.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).saveAsTextFile("/sparkdata/output")
[root@node01 sparkdata]# hdfs dfs -ls /sparkdata/output
Found 3 items
-rw-r--r-- 3 root supergroup 0 2000-00-00 09:38 /sparkdata/output/_SUCCESS
-rw-r--r-- 3 root supergroup 10 2000-00-00 09:38 /sparkdata/output/part-00000
-rw-r--r-- 3 root supergroup 31 2000-00-00 09:38 /sparkdata/output/part-00001
[root@node01 sparkdata]# hdfs dfs -cat /sparkdata/output/part-00000
(hello,1)
[root@node01 sparkdata]# hdfs dfs -cat /sparkdata/output/part-00001
(spark,2)
(hadoop,2)
(hbase,1)
# The results are split across the two part files (one file per partition)
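saveAsTextFile writes one part-XXXXX file per partition, which is why the result landed in two files, and it refuses to write into a directory that already exists. Here is a small spark-shell sketch that handles both, assuming the same /sparkdata paths as above; deleting via the Hadoop FileSystem API is a common workaround, not something Spark does for you.

import org.apache.hadoop.fs.{FileSystem, Path}

// saveAsTextFile fails if /sparkdata/output already exists, so remove a
// previous run's output first (recursive delete -- use with care).
val fs = FileSystem.get(sc.hadoopConfiguration)
fs.delete(new Path("/sparkdata/output"), true)

// coalesce(1) merges the RDD into a single partition, so the result is
// written as one part-00000 file instead of two.
sc.textFile("/sparkdata/input/word.txt")
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)
  .coalesce(1)
  .saveAsTextFile("/sparkdata/output")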
Walking in Your Own Time Zone
New York is 3 hours ahead of California,
but that does not make California slow.
Someone graduated at the age of 22,
but waited 5 years before securing a good job!
Someone became a CEO at 25,
and died at 50.
While another became a CEO at 50,
and lived to 90 years.
Someone is still single,
while someone else got married.
Obama retired at 55,
while Trump started at 70.
Absolutely everyone in this world works based on their own Time Zone.
People around you might seem to be ahead of you,
and some might seem to be behind you.
But everyone is running their own RACE, in their own TIME.
Don't envy them or mock them.
They are in their TIME ZONE, and you are in yours!
So, never give up.
You're not LATE.
You're not EARLY.
You are very much ON TIME, in the TIME ZONE Destiny set up for you.
No matter how good things are, keep fighting and stay alert;
No matter how hard things get, keep dreaming and carry on;
No matter how busy you are, keep reading and exercising;
Life is a long run.