Deploying Spark 1.0 on a Hadoop 2.2.0 Cluster
July 2014
Table of Contents
Introduction
1 Cluster network environment and quick deployment
2 Passwordless SSH configuration
2.1 Configure passwordless SSH between all nodes
3 JDK installation and Java environment variables
3.1 Install JDK 1.7.0_21
3.2 Configure the Java environment variables
4 Hadoop cluster configuration
(1) Edit the Hadoop configuration files
(2) Copy the configured files to all data nodes
5 Starting the Hadoop cluster
6 Testing Hadoop
7 Calling the Hadoop cluster through the YARN client
8 Configuring the Spark 1.0 cluster
8.1 Configure the environment variables
8.2 Distribute the program to every node
8.3 Start the cluster
8.4 Run a test program
This document summarizes the configuration of Hadoop 2.2.0 on CentOS 6.4 x64, in a distributed environment built with VMware 10.0 on a single server. It is recommended to use host names (rather than IP addresses) in all Hadoop configuration files, to open the required ports in the firewall, to set the sshd service to start on boot, and to configure the Java environment variables in /etc/profile.
For convenience, all of the required packages have been bundled and uploaded to a cloud drive; see http://yun.baidu.com/s/1eQeQ7DK
The cluster contains five nodes: 1 namenode and 4 datanodes, connected over a LAN and able to ping each other.
All nodes run CentOS 6.4 64-bit, with the firewall disabled and the sshd service enabled and set to start on boot.
a) Install one CentOS 6.4 virtual machine in VMware and create the hadoop user. Assume the virtual machine is named NameNode.
b) Shut the virtual machine down, copy the NameNode folder four times, and name the copies DataNode1, ..., DataNode4.
c) Open each DataNode in VMware and set its virtual machine name.
d) Boot each guest; when the dialog appears, choose "I copied it".
e) In each virtual machine, check the IP address:
ifconfig
The IP addresses are planned as follows:
IP address    | Host name
192.168.1.150 | namenode
192.168.1.151 | datanode1
192.168.1.152 | datanode2
192.168.1.153 | datanode3
192.168.1.154 | datanode4
f) On every virtual machine, permanently disable the firewall (very important, be sure to verify this) and disable SELinux:
chkconfig iptables off (permanent, takes effect from the next boot)
service iptables stop (takes effect immediately)
vim /etc/selinux/config
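Inside /etc/selinux/config the relevant line is SELINUX=...; setting it to disabled turns SELinux off permanently after a reboot. A one-line way to make the same edit as the vim session above:
# set SELINUX=disabled in /etc/selinux/config (takes full effect after a reboot)
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config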
[root@DataNode1 local]# chkconfig iptables off
[root@DataNode1 local]# service iptables stop
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
[root@DataNode1 local]#
g) Configure the NameNode
Step 1: check the host name
#hostname
If it is wrong, log in as root and fix it:
# vim /etc/sysconfig/network
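After the edit, /etc/sysconfig/network typically contains NETWORKING=yes plus a HOSTNAME line for that node. A sketch of the same change from the command line (host name NameNode shown as an example; use the matching name on each node):
# persist the host name and apply it to the running system
sed -i 's/^HOSTNAME=.*/HOSTNAME=NameNode/' /etc/sysconfig/network
hostname NameNode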
Handle each node in turn; after the changes, reboot the system with #reboot.
h) Edit /etc/hosts
As root:
vim /etc/hosts
(1) Edit /etc/hosts on the namenode.
Write the names and IP addresses of all nodes into the file, comment out the 127.0.0.1 line, and make sure the content is exactly as follows (double-check the IP addresses for duplicates or mistakes):
192.168.1.150 namenode
192.168.1.151 datanode1
192.168.1.152 datanode2
192.168.1.153 datanode3
192.168.1.154 datanode4
# 127.0.0.1 centos63 localhost.localdomain localhost
(2) Copy the /etc/hosts file from the namenode to all data nodes:
Log in to the namenode as root;
run the commands:
scp /etc/hosts [email protected]:/etc/hosts
scp /etc/hosts [email protected]:/etc/hosts
scp /etc/hosts [email protected]:/etc/hosts
scp /etc/hosts [email protected]:/etc/hosts
i) Plan the directory layout
Keep the installation directory and the data directories separate, and keep the data directories out of the hadoop user's program directories. If HDFS ever needs to be re-formatted, the data directories can then simply be deleted and recreated.
If the data directories were mixed in with the installation or user directories, operating on them would risk deleting programs or user files by mistake.
Full path                           | Description
/opt/hadoop-2.2.0                   | Main Hadoop installation directory
/home/hadoop/hd_space/tmp           | Temporary directory
/home/hadoop/hd_space/hdfs/name     | HDFS namespace metadata on the namenode
/home/hadoop/hd_space/hdfs/data     | Physical storage of HDFS data blocks on the datanodes
/home/hadoop/hd_space/mapred/local  | Local directory used on each compute node while running MapReduce jobs
/home/hadoop/hd_space/mapred/system | Directory in HDFS holding shared files for running MapReduce jobs
Create the directories.
On the NameNode, as root:
rm -rf /home/hadoop/hd_space
mkdir -p /home/hadoop/hd_space/tmp
mkdir -p /home/hadoop/hd_space/hdfs/name
mkdir -p /home/hadoop/hd_space/hdfs/data
mkdir -p /home/hadoop/hd_space/mapred/local
mkdir -p /home/hadoop/hd_space/mapred/system
chown -R hadoop:hadoop /home/hadoop/hd_space/
Also change the owner of /home/hadoop (Hadoop works under this directory, so the hadoop user must have rwx permission on it):
chown -R hadoop:hadoop /home/hadoop
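The same directory layout is referenced by the configuration on every datanode as well. Hadoop can usually create missing data directories itself at start-up, but creating them up front avoids permission surprises; one way to do it from the NameNode, as a sketch run as root (passwordless SSH is only set up in the next section, so this will prompt for each node's root password):
for target in datanode1 datanode2 datanode3 datanode4
do
    ssh root@$target 'mkdir -p /home/hadoop/hd_space/{tmp,hdfs/name,hdfs/data,mapred/local,mapred/system} && chown -R hadoop:hadoop /home/hadoop/hd_space'
done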
With the base directories in place, the next step is to set up passwordless SSH so that Hadoop can manage the cluster conveniently.
Hadoop relies on SSH: the namenode uses SSH to start the namenode and datanode processes. The datanodes may also use SSH when sending heartbeats to the namenode (this is my assumption, I have not looked into it in depth), and the datanodes may need SSH between themselves as well. If so, every node must be able to SSH to every other node without a password.
(0) How it works
For node A to connect to node B with passwordless public-key authentication, A is the client and B is the server. A key pair (a public key and a private key) is generated on client A, and the public key is copied to server B. When A connects to B over ssh, B generates a random number, encrypts it with A's public key, and sends it to A. A decrypts it with its private key and sends the result back; once B confirms the value is correct, it allows A to connect. That is one public-key authentication exchange, and no password has to be typed by hand. The essential step is copying client A's public key to B.
Therefore, to get passwordless public-key authentication between all nodes, every node's public key must be copied to every node.
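As a sketch of that workflow for a single pair of nodes, OpenSSH's ssh-copy-id wraps the "copy the public key to the server" step (assuming the openssh-clients package is installed; the manual scp/cat procedure below achieves the same thing for the whole cluster at once):
# on the client node, as the hadoop user
ssh-keygen -t rsa                 # generate the key pair (accept the defaults)
ssh-copy-id hadoop@DataNode1      # asks for the password once, appends the key to authorized_keys on DataNode1
ssh hadoop@DataNode1 hostname     # should now work without a password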
(1) Generate a key pair on every machine
(a) Log in to every node as the hadoop user and run the following command to generate an RSA key pair:
ssh-keygen -t rsa
This creates a private key id_rsa and a public key id_rsa.pub under /home/hadoop/.ssh/.
# su hadoop
ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):   (press Enter to accept the default path)
Enter passphrase (empty for no passphrase):   (press Enter for an empty passphrase)
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
(b) Send the public key id_rsa.pub of every datanode to the namenode.
On DataNode1, run:
scp id_rsa.pub hadoop@NameNode:/home/hadoop/.ssh/id_rsa.pub.datanode1
......
On DataNodeN, run:
scp id_rsa.pub hadoop@NameNode:/home/hadoop/.ssh/id_rsa.pub.datanoden
Check that they have all arrived.
The public keys of all data nodes have now been received.
(c) On the namenode, merge all of the public keys (including its own) and send the result to every node.
[hadoop@NameNode .ssh]$ cat id_rsa.pub >> authorized_keys            # the namenode's own public key
[hadoop@NameNode .ssh]$ cat id_rsa.pub.datanode1 >> authorized_keys
[hadoop@NameNode .ssh]$ cat id_rsa.pub.datanode2 >> authorized_keys
[hadoop@NameNode .ssh]$ cat id_rsa.pub.datanode3 >> authorized_keys
[hadoop@NameNode .ssh]$ cat id_rsa.pub.datanode4 >> authorized_keys
chmod 644 ~/.ssh/authorized_keys
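One related check that sshd is strict about, not shown in the original steps: the .ssh directory itself must not be group- or world-writable, otherwise public-key login silently falls back to password prompts. A quick precaution on every node:
chmod 700 ~/.ssh
chmod 644 ~/.ssh/authorized_keys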
Copy the merged authorized_keys from the namenode to the .ssh directory of every DataNode over SSH:
scp authorized_keys <datanode host name or IP>:/home/hadoop/.ssh
scp ~/.ssh/authorized_keys hadoop@DataNode1:/home/hadoop/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@DataNode2:/home/hadoop/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@DataNode3:/home/hadoop/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@DataNode4:/home/hadoop/.ssh/authorized_keys
As you can see, once /etc/hosts is configured you can address every machine by name instead of memorizing its IP address, which really pays off when the cluster is large and the IPs are not contiguous.
After authorized_keys has been distributed to every node, ssh logins no longer ask for a password.
With this in place the namenode can log in to every datanode without a password, which can be verified with
"ssh DataNode1" (and 2, 3, 4).
Once the configuration is done, run "ssh NameNode" and ssh to every data node from the namenode, because ssh will not ask again after the first connection. Do the same from each DataNode ("ssh NameNode" and ssh to all data nodes).
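A quick way to run that first-connection round from any node is a small loop (a sketch, using the host names configured above; answer "yes" to the host-key prompt on the first pass, and no password should ever be requested):
for h in NameNode DataNode1 DataNode2 DataNode3 DataNode4
do
    ssh hadoop@$h hostname
done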
At this point all of the nodes can reach each other; the next step is to configure the JDK.
1. Download the JDK.
Choose the Linux x64 build; the downloaded file is jdk-7u21-linux-x64.tar.gz.
2. Unpack it into /opt:
mv jdk-7u21-linux-x64.tar.gz /opt/
cd /opt && tar xf jdk-7u21-linux-x64.tar.gz
Log in as root, run "vim /etc/profile", and add the following lines to configure the environment variables (note that /etc/profile is important; the Hadoop configuration later also uses it).
# set java environment
export JAVA_HOME=/opt/jdk1.7.0_21
export JRE_HOME=/opt/jdk1.7.0_21/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Save and exit, then run the following so the settings take effect:
chmod +x /etc/profile
source /etc/profile
Once configured, "java -version" on the command line shows whether it worked. Testing "java -version" as the hadoop user succeeds as well.
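A quick sanity check that both root and the hadoop user pick up the new variables (a sketch; expect the 1.7.0_21 version string and the /opt paths):
source /etc/profile
java -version
echo $JAVA_HOME $CLASSPATH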
On the namenode:
Log in as the hadoop user.
Download hadoop-2.2.0 (a pre-built 64-bit Hadoop 2.2 can be fetched from the cloud drive at
http://pan.baidu.com/s/1sjz2ORN) and unpack it into /opt.
(a) Configure /etc/profile
# set hadoop
export HADOOP_HOME=/opt/hadoop-2.2.0
export HADOOP_CONF_DIR=/opt/hadoop-2.2.0/etc/hadoop
export YARN_CONF_DIR=/opt/hadoop-2.2.0/etc/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
(b) Configure hadoop-env.sh
$ vim $HADOOP_CONF_DIR/hadoop-env.sh
export JAVA_HOME=/opt/jdk1.7.0_21
(c) Configure yarn-env.sh
vim $HADOOP_CONF_DIR/yarn-env.sh
export JAVA_HOME=/opt/jdk1.7.0_21
(d) Configure core-site.xml
vim $HADOOP_CONF_DIR/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://NameNode:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hd_space/tmp</value>
</property>
</configuration>
(e) Configure hdfs-site.xml
vim $HADOOP_CONF_DIR/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/hd_space/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hd_space/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
(f) Configure mapred-site.xml
vim $HADOOP_CONF_DIR/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>/home/hadoop/hd_space/mapred/local</value>
</property>
<property>
<name>mapreduce.cluster.system.dir</name>
<value>/home/hadoop/hd_space/mapred/system</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>NameNode:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>NameNode:19888</value>
</property>
</configuration>
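The steps above do not show a yarn-site.xml, but in practice a minimal one is usually needed so that the NodeManagers on the datanodes can reach the ResourceManager and so that MapReduce jobs can shuffle. The sketch below writes such a file under that assumption (property names as in Hadoop 2.2.0, addresses pointing at the NameNode host; adjust to your cluster):
cat > $HADOOP_CONF_DIR/yarn-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- where NodeManagers and clients reach the ResourceManager -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>NameNode:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>NameNode:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>NameNode:8031</value>
  </property>
  <!-- auxiliary service required to run MapReduce jobs on YARN -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF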
(g) Edit the masters file and replace localhost with the namenode's host name:
NameNode
(h) Edit the slaves file: delete localhost and add the host names of all datanodes, one per line:
DataNode1
DataNode2
DataNode3
DataNode4
On the NameNode, run the following to push the configuration to every data node:
for target in DataNode1 DataNode2 DataNode3 DataNode4
do
scp -r /opt/hadoop-2.2.0/etc/hadoop $target:/opt/hadoop-2.2.0/etc
done
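If the datanodes were cloned before Hadoop and the JDK were installed, they still need /opt/hadoop-2.2.0, /opt/jdk1.7.0_21 and the /etc/profile additions before the loop above is useful. A sketch of one way to push everything across (run as root on the NameNode; it prompts for passwords unless root keys are configured):
for target in DataNode1 DataNode2 DataNode3 DataNode4
do
    scp -r /opt/hadoop-2.2.0 /opt/jdk1.7.0_21 root@$target:/opt/
    scp /etc/profile root@$target:/etc/profile
done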
hadoop namenode -format
(Because the environment variables are configured, there is no need to type the full path to the hadoop command here.)
The output should contain "dfs/name has been successfully formatted"; otherwise the format failed.
Start Hadoop:
start-dfs.sh
start-yarn.sh
After a successful start, run jps on the namenode and on each datanode. On the namenode you should see NameNode, SecondaryNameNode and ResourceManager:
[hadoop@NameNode hadoop]$ jps
9097 Jps
8662 SecondaryNameNode
8836 ResourceManager
8459 NameNode
[hadoop@NameNode hadoop]$
On each datanode you should see DataNode and NodeManager; otherwise startup failed and the configuration needs to be checked.
[root@DataNode1 .ssh]# jps
4885 Jps
4623 DataNode
4736 NodeManager
[root@DataNode1 .ssh]#
On datanode1, DataNode and NodeManager are indeed running.
Check the cluster status:
hdfs dfsadmin -report
Stop Hadoop:
./sbin/stop-dfs.sh
./sbin/stop-yarn.sh
View HDFS: http://192.168.1.150:50070/dfshealth.jsp
View the ResourceManager web UI (default port 8088): http://192.168.1.150:8088/
[hadoop@NameNode hadoop-2.2.0]$ hdfs dfs -mkdir /tmp
[hadoop@NameNode hadoop-2.2.0]$ hdfs dfs -ls /
14/07/08 15:31:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwx------   - hadoop supergroup          0 2014-07-08 15:29 /tmp
[hadoop@NameNode hadoop-2.2.0]$ hdfs dfs -copyFromLocal /opt/hadoop-2.2.0/test.txt hdfs://namenode:9000/tmp/test.txt
[hadoop@NameNode hadoop-2.2.0]$ hdfs dfs -ls /tmp
14/07/08 15:34:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwx------   - hadoop supergroup          0 2014-07-08 15:29 /tmp/hadoop-yarn
-rw-r--r--   3 hadoop supergroup       2044 2014-07-08 15:34 /tmp/test.txt
Run the example job:
[hadoop@NameNode hadoop-2.2.0]$
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /tmp/test.txt /tmp-output
[hadoop@NameNode hadoop-2.2.0]$ hdfs dfs -ls /tmp-output
14/07/08 16:07:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2014-07-08 15:35 /tmp-output/_SUCCESS
-rw-r--r--   3 hadoop supergroup       1045 2014-07-08 15:35 /tmp-output/part-r-00000
[hadoop@NameNode hadoop-2.2.0]$
View the result:
[hadoop@NameNode hadoop-2.2.0]$ hdfs dfs -cat /tmp-output/part-r-00000
BAD_ID=0 1
Bytes 2
CONNECTION=0 1
CPU 1
Combine 2
hdfs dfs -mkdir /jar
hdfs dfs -mkdir /jar/spark
hdfs dfs -copyFromLocal /opt/spark-1.0.0-bin-2.2.0/lib/spark-assembly-1.0.0-hadoop2.2.0.jar hdfs://namenode:9000/jar/spark/spark-assembly-1.0.0-hadoop2.2.0.jar
Copy the unpacked Spark package to any one machine of the YARN cluster. A single node is enough; there is no need to deploy it on every node unless you want several client nodes able to submit Spark jobs.
Here we do not build a standalone Spark cluster; instead the YARN client uses the Hadoop cluster's compute resources.
mv /opt/spark-1.0.0-bin-2.2.0/conf/spark-env.sh.template /opt/spark-1.0.0-bin-2.2.0/conf/spark-env.sh
Edit spark-env.sh:
export HADOOP_HOME=/opt/hadoop-2.2.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_EXECUTOR_INSTANCES=4
SPARK_EXECUTOR_CORES=1
SPARK_EXECUTOR_MEMORY=1G
SPARK_DRIVER_MEMORY=2G
SPARK_YARN_APP_NAME="Spark 1.0.0"
This is my configuration; it differs slightly from the previous few versions but is largely the same.
Now use the YARN client to run the classic MapReduce example, the Spark version of word count.
Note that SparkContext has changed: the first argument used in the word-count examples of earlier versions must be dropped.
For convenience I copied SPARK_HOME/lib/spark-assembly-1.0.0-hadoop2.2.0.jar into HDFS and reference it from there (referencing it on the local disk also works).
SPARK_JAR="hdfs://NameNode:9000/jar/spark/spark-assembly-1.0.0-hadoop2.2.0.jar" \
./bin/spark-class org.apache.spark.deploy.yarn.Client \
--jar ./lib/spark-examples-1.0.0-hadoop2.2.0.jar \
--class org.apache.spark.examples.JavaWordCount \
--arg hdfs://NameNode:9000/tmp/test.txt \
--num-executors 50 \
--executor-cores 1 \
--driver-memory 2048M \
--executor-memory 1000M \
--name "word count on spark"
The output appears in the application's stdout log.
The speed is acceptable: counting a 5.1 GB file on 4 nodes / 64 cores took 221 seconds.
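The same job can also be submitted through the spark-submit script that ships with Spark 1.0. A sketch of the equivalent call in yarn-cluster mode (same jar, class and input as above; the resource numbers are illustrative):
SPARK_JAR="hdfs://NameNode:9000/jar/spark/spark-assembly-1.0.0-hadoop2.2.0.jar" \
./bin/spark-submit \
  --master yarn-cluster \
  --class org.apache.spark.examples.JavaWordCount \
  --num-executors 4 \
  --executor-cores 1 \
  --driver-memory 2g \
  --executor-memory 1g \
  ./lib/spark-examples-1.0.0-hadoop2.2.0.jar \
  hdfs://NameNode:9000/tmp/test.txt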
Add the compute nodes:
vi /opt/spark-1.0.0-bin-2.2.0/conf/slaves
DataNode1
DataNode2
DataNode3
DataNode4
Edit spark-env.sh:
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
Add the following:
export SCALA_HOME=/opt/scala-2.10.3
export JAVA_HOME=/opt/jdk1.7.0_21
export SPARK_MASTER_IP=192.168.1.150
export SPARK_WORKER_MEMORY=10G
# JVM memory settings
# Set SPARK_MEM if it isn't already set since we also use it for this process
SPARK_MEM=${SPARK_MEM:-10g}
export SPARK_MEM
# Set JAVA_OPTS to be able to load native libraries and to set heap size
JAVA_OPTS="$OUR_JAVA_OPTS"
JAVA_OPTS="$JAVA_OPTS -Xms$SPARK_MEM -Xmx$SPARK_MEM"
JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"
SPARK_WORKER_MEMORY is the maximum amount of memory Spark may use on each node; increasing it lets more data be cached in memory, but be sure to leave enough memory for the operating system and other services on each slave.
http://stackoverflow.com/questions/21138751/spark-java-lang-outofmemoryerror-java-heap-space
Below is the reference information from Stack Overflow:
Have a look at the start up scripts, a Java heap size is set there; it looks like you're not setting this before running Spark worker.
# Set SPARK_MEM if it isn't already set since we also use it for this process
SPARK_MEM=${SPARK_MEM:-512m}
export SPARK_MEM
# Set JAVA_OPTS to be able to load native libraries and to set heap size
JAVA_OPTS="$OUR_JAVA_OPTS"
JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"
JAVA_OPTS="$JAVA_OPTS -Xms$SPARK_MEM -Xmx$SPARK_MEM"
for target in DataNode1 DataNode2 DataNode3 DataNode4
do
scp -r /opt/spark-1.0.0-bin-2.2.0 $target:/opt
done
cd /opt/spark-1.0.0-bin-2.2.0/sbin
./start-all.sh
[hadoop@NameNode sbin]$ ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark-1.0.0-bin-2.2.0/sbin/../logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-NameNode.out
DataNode2: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-1.0.0-bin-2.2.0/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-DataNode2.out
DataNode3: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-1.0.0-bin-2.2.0/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-DataNode3.out
DataNode1: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-1.0.0-bin-2.2.0/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-DataNode1.out
DataNode4: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-1.0.0-bin-2.2.0/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-DataNode4.out
[hadoop@NameNode sbin]$
View it in a browser (the Spark master web UI, by default at http://NameNode:8080).
[hadoop@NameNode spark-1.0.0-bin-2.2.0]$ bin/spark-shell --executor-memory 2g --driver-memory 1g --master spark://NameNode:7077
14/07/08 19:18:09 INFO spark.SecurityManager: Changing view acls to: hadoop
14/07/08 19:18:09 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop)
14/07/08 19:18:09 INFO spark.HttpServer: Starting HTTP Server
14/07/08 19:18:09 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/08 19:18:09 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:57198
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.0.0
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
Type in expressionsto have them evaluated.
Type :help for moreinformation.
14/07/08 19:18:13INFO spark.SecurityManager: Changing view acls to: hadoop
14/07/08 19:18:13INFO spark.SecurityManager: SecurityManager: authentication disabled; ui aclsdisabled; users with view permissions: Set(hadoop)
14/07/08 19:18:13INFO slf4j.Slf4jLogger: Slf4jLogger started
14/07/08 19:18:13INFO Remoting: Starting remoting
14/07/08 19:18:14INFO Remoting: Remoting started; listening on addresses:[akka.tcp://spark@NameNode:51486]
14/07/08 19:18:14INFO Remoting: Remoting now listens on addresses:[akka.tcp://spark@NameNode:51486]
14/07/08 19:18:14INFO spark.SparkEnv: Registering MapOutputTracker
14/07/08 19:18:14INFO spark.SparkEnv: Registering BlockManagerMaster
14/07/08 19:18:14INFO storage.DiskBlockManager: Created local directory at/tmp/spark-local-20140708191814-fe19
14/07/08 19:18:14INFO storage.MemoryStore: MemoryStore started with capacity 5.8 GB.
14/07/08 19:18:14INFO network.ConnectionManager: Bound socket to port 47219 with id =ConnectionManagerId(NameNode,47219)
14/07/08 19:18:14INFO storage.BlockManagerMaster: Trying to register BlockManager
14/07/08 19:18:14INFO storage.BlockManagerInfo: Registering block manager NameNode:47219 with5.8 GB RAM
14/07/08 19:18:14INFO storage.BlockManagerMaster: Registered BlockManager
14/07/08 19:18:14INFO spark.HttpServer: Starting HTTP Server
14/07/08 19:18:14INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/08 19:18:14 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:35560
14/07/08 19:18:14INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.1.150:35560
14/07/08 19:18:14INFO spark.HttpFileServer: HTTP File server directory is/tmp/spark-201155bc-731d-4eea-b637-88982e32ee14
14/07/08 19:18:14INFO spark.HttpServer: Starting HTTP Server
14/07/08 19:18:14INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/08 19:18:14 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:53311
14/07/08 19:18:14 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/08 19:18:14 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/07/08 19:18:14INFO ui.SparkUI: Started SparkUI at http://NameNode:4040
14/07/08 19:18:15 WARNutil.NativeCodeLoader: Unable to load native-hadoop library for yourplatform... using builtin-java classes where applicable
14/07/08 19:18:15INFO client.AppClient$ClientActor: Connecting to masterspark://NameNode:7077...
14/07/08 19:18:15INFO repl.SparkILoop: Created spark context..
14/07/08 19:18:15INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with appID app-20140708191815-0001
14/07/08 19:18:15INFO client.AppClient$ClientActor: Executor added: app-20140708191815-0001/0 onworker-20140708190701-DataNode4-48388 (DataNode4:48388) with 16 cores
14/07/08 19:18:15INFO cluster.SparkDeploySchedulerBackend: Granted executor IDapp-20140708191815-0001/0 on hostPort DataNode4:48388 with 16 cores, 2.0 GB RAM
14/07/08 19:18:15INFO client.AppClient$ClientActor: Executor added: app-20140708191815-0001/1 onworker-20140708190659-DataNode3-44272 (DataNode3:44272) with 16 cores
14/07/08 19:18:15INFO cluster.SparkDeploySchedulerBackend: Granted executor IDapp-20140708191815-0001/1 on hostPort DataNode3:44272 with 16 cores, 2.0 GB RAM
14/07/08 19:18:15INFO client.AppClient$ClientActor: Executor added: app-20140708191815-0001/2 onworker-20140708190700-DataNode2-57378 (DataNode2:57378) with 16 cores
14/07/08 19:18:15INFO cluster.SparkDeploySchedulerBackend: Granted executor IDapp-20140708191815-0001/2 on hostPort DataNode2:57378 with 16 cores, 2.0 GB RAM
14/07/08 19:18:15INFO client.AppClient$ClientActor: Executor added: app-20140708191815-0001/3 onworker-20140708190700-DataNode1-55222 (DataNode1:55222) with 16 cores
14/07/08 19:18:15INFO cluster.SparkDeploySchedulerBackend: Granted executor IDapp-20140708191815-0001/3 on hostPort DataNode1:55222 with 16 cores, 2.0 GB RAM
14/07/08 19:18:15INFO client.AppClient$ClientActor: Executor updated: app-20140708191815-0001/3is now RUNNING
14/07/08 19:18:15INFO client.AppClient$ClientActor: Executor updated: app-20140708191815-0001/2is now RUNNING
14/07/08 19:18:15INFO client.AppClient$ClientActor: Executor updated: app-20140708191815-0001/0is now RUNNING
14/07/08 19:18:15INFO client.AppClient$ClientActor: Executor updated: app-20140708191815-0001/1is now RUNNING
Spark context available as sc.
scala> 14/07/0819:18:18 INFO cluster.SparkDeploySchedulerBackend: Registered executor:Actor[akka.tcp://sparkExecutor@DataNode4:40761/user/Executor#807513222] with ID0
14/07/08 19:18:18INFO cluster.SparkDeploySchedulerBackend: Registered executor:Actor[akka.tcp://sparkExecutor@DataNode1:57590/user/Executor#-2071278347] withID 3
14/07/08 19:18:18INFO cluster.SparkDeploySchedulerBackend: Registered executor:Actor[akka.tcp://sparkExecutor@DataNode2:43335/user/Executor#-723681055] withID 2
14/07/08 19:18:18INFO cluster.SparkDeploySchedulerBackend: Registered executor:Actor[akka.tcp://sparkExecutor@DataNode3:43008/user/Executor#-1215215976] withID 1
14/07/08 19:18:18INFO storage.BlockManagerInfo: Registering block manager DataNode4:44391 with1177.6 MB RAM
14/07/08 19:18:18INFO storage.BlockManagerInfo: Registering block manager DataNode1:40306 with1177.6 MB RAM
14/07/08 19:18:18INFO storage.BlockManagerInfo: Registering block manager DataNode2:35755 with1177.6 MB RAM
14/07/08 19:18:18INFO storage.BlockManagerInfo: Registering block manager DataNode3:42366 with1177.6 MB RAM
scala> val rdd = sc.textFile("hdfs://NameNode:9000/tmp/test.txt")
14/07/08 19:18:39INFO storage.MemoryStore: ensureFreeSpace(141503) called with curMem=0,maxMem=6174041702
14/07/08 19:18:39INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimatedsize 138.2 KB, free 5.7 GB)
rdd: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12
scala> rdd.cache()
res0: rdd.type = MappedRDD[1] at textFile at <console>:12
scala> val wordcount = rdd.flatMap(_.split(" ")).map(x=>(x,1)).reduceByKey(_+_)
14/07/08 19:19:04INFO mapred.FileInputFormat: Total input paths to process : 1
wordcount: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[6] at reduceByKey at <console>:14
scala> wordcount.take(10)
14/07/08 19:19:11INFO spark.SparkContext: Starting job: take at <console>:17
14/07/08 19:19:11INFO scheduler.DAGScheduler: Registering RDD 4 (reduceByKey at<console>:14)
14/07/08 19:19:11INFO scheduler.DAGScheduler: Got job 0 (take at <console>:17) with 1output partitions (allowLocal=true)
14/07/08 19:19:11INFO scheduler.DAGScheduler: Final stage: Stage 0(take at <console>:17)
14/07/08 19:19:11INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)
14/07/08 19:19:11INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)
14/07/08 19:19:11INFO scheduler.DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[4] atreduceByKey at <console>:14), which has no missing parents
14/07/08 19:19:11INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MapPartitionsRDD[4]at reduceByKey at <console>:14)
14/07/08 19:19:11INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/07/08 19:19:11INFO scheduler.TaskSetManager: Starting task 1.0:0 as TID 0 on executor 2:DataNode2 (NODE_LOCAL)
14/07/08 19:19:11INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 2079 bytes in 6 ms
14/07/08 19:19:11INFO scheduler.TaskSetManager: Starting task 1.0:1 as TID 1 on executor 1:DataNode3 (NODE_LOCAL)
14/07/08 19:19:11INFO scheduler.TaskSetManager: Serialized task 1.0:1 as 2079 bytes in 1 ms
14/07/08 19:19:12INFO storage.BlockManagerInfo: Added rdd_1_1 in memory on DataNode3:42366(size: 3.2 KB, free: 1177.6 MB)
14/07/08 19:19:12INFO storage.BlockManagerInfo: Added rdd_1_0 in memory on DataNode2:35755 (size:3.1 KB, free: 1177.6 MB)
14/07/08 19:19:13INFO scheduler.TaskSetManager: Finished TID 0 in 1830 ms on DataNode2(progress: 1/2)
14/07/08 19:19:13INFO scheduler.DAGScheduler: Completed ShuffleMapTask(1, 0)
14/07/08 19:19:13INFO scheduler.TaskSetManager: Finished TID 1 in 1821 ms on DataNode3(progress: 2/2)
14/07/08 19:19:13INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have allcompleted, from pool
14/07/08 19:19:13INFO scheduler.DAGScheduler: Completed ShuffleMapTask(1, 1)
14/07/08 19:19:13INFO scheduler.DAGScheduler: Stage 1 (reduceByKey at <console>:14)finished in 1.853 s
14/07/08 19:19:13INFO scheduler.DAGScheduler: looking for newly runnable stages
14/07/08 19:19:13INFO scheduler.DAGScheduler: running: Set()
14/07/08 19:19:13INFO scheduler.DAGScheduler: waiting: Set(Stage 0)
14/07/08 19:19:13INFO scheduler.DAGScheduler: failed: Set()
14/07/08 19:19:13INFO scheduler.DAGScheduler: Missing parents for Stage 0: List()
14/07/08 19:19:13INFO scheduler.DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[6] atreduceByKey at <console>:14), which is now runnable
14/07/08 19:19:13INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 0(MapPartitionsRDD[6] at reduceByKey at <console>:14)
14/07/08 19:19:13INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/07/08 19:19:13INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 2 on executor 2:DataNode2 (PROCESS_LOCAL)
14/07/08 19:19:13INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 1972 bytes in 1 ms
14/07/08 19:19:13INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations forshuffle 0 to spark@DataNode2:36057
14/07/08 19:19:13INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 146bytes
14/07/08 19:19:13INFO scheduler.DAGScheduler: Completed ResultTask(0, 0)
14/07/08 19:19:13INFO scheduler.TaskSetManager: Finished TID 2 in 404 ms on DataNode2 (progress:1/1)
14/07/08 19:19:13INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have allcompleted, from pool
14/07/08 19:19:13INFO scheduler.DAGScheduler: Stage 0 (take at <console>:17) finished in0.407 s
14/07/08 19:19:13INFO spark.SparkContext: Job finished: take at <console>:17, took2.437269965 s
res1: Array[(String, Int)] = Array((BAD_ID=0,1), (committed,1), (Written=196192,1), (tasks=1,3), (Framework,1), (outputs=1,1), (groups=18040,1), (map,2), (Reduce,4), (ystem,1))
scala> val wordsort = wordcount.map(x=>(x._2,x._1)).sortByKey(false).map(x=>(x._2,x._1))
14/07/08 19:19:23 INFOspark.SparkContext: Starting job: sortByKey at <console>:16
14/07/08 19:19:23INFO scheduler.DAGScheduler: Got job 1 (sortByKey at <console>:16) with 2output partitions (allowLocal=false)
14/07/08 19:19:23INFO scheduler.DAGScheduler: Final stage: Stage 2(sortByKey at<console>:16)
14/07/08 19:19:23INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 3)
14/07/08 19:19:23INFO scheduler.DAGScheduler: Missing parents: List()
14/07/08 19:19:23INFO scheduler.DAGScheduler: Submitting Stage 2 (MappedRDD[7] at map at<console>:16), which has no missing parents
14/07/08 19:19:23INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 2(MappedRDD[7] at map at <console>:16)
14/07/08 19:19:23INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
14/07/08 19:19:23INFO scheduler.TaskSetManager: Starting task 2.0:0 as TID 3 on executor 2:DataNode2 (PROCESS_LOCAL)
14/07/08 19:19:23INFO scheduler.TaskSetManager: Serialized task 2.0:0 as 1970 bytes in 0 ms
14/07/08 19:19:23INFO scheduler.TaskSetManager: Starting task 2.0:1 as TID 4 on executor 1:DataNode3 (PROCESS_LOCAL)
14/07/08 19:19:23INFO scheduler.TaskSetManager: Serialized task 2.0:1 as 1970 bytes in 0 ms
14/07/08 19:19:23INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations forshuffle 0 to spark@DataNode3:59586
14/07/08 19:19:23INFO scheduler.DAGScheduler: Completed ResultTask(2, 0)
14/07/08 19:19:23INFO scheduler.TaskSetManager: Finished TID 3 in 117 ms on DataNode2 (progress:1/2)
14/07/08 19:19:23INFO scheduler.DAGScheduler: Completed ResultTask(2, 1)
14/07/08 19:19:23INFO scheduler.TaskSetManager: Finished TID 4 in 168 ms on DataNode3 (progress:2/2)
14/07/08 19:19:23INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have allcompleted, from pool
14/07/08 19:19:23INFO scheduler.DAGScheduler: Stage 2 (sortByKey at <console>:16) finishedin 0.172 s
14/07/08 19:19:23INFO spark.SparkContext: Job finished: sortByKey at <console>:16, took0.19438825 s
14/07/08 19:19:23INFO spark.SparkContext: Starting job: sortByKey at <console>:16
14/07/08 19:19:23INFO scheduler.DAGScheduler: Got job 2 (sortByKey at <console>:16) with 2output partitions (allowLocal=false)
14/07/08 19:19:23INFO scheduler.DAGScheduler: Final stage: Stage 4(sortByKey at<console>:16)
14/07/08 19:19:23INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 5)
14/07/08 19:19:23INFO scheduler.DAGScheduler: Missing parents: List()
14/07/08 19:19:23INFO scheduler.DAGScheduler: Submitting Stage 4 (MappedRDD[9] at sortByKey at<console>:16), which has no missing parents
14/07/08 19:19:23INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 4(MappedRDD[9] at sortByKey at <console>:16)
14/07/08 19:19:23INFO scheduler.TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
14/07/08 19:19:23INFO scheduler.TaskSetManager: Starting task 4.0:0 as TID 5 on executor 2:DataNode2 (PROCESS_LOCAL)
14/07/08 19:19:23INFO scheduler.TaskSetManager: Serialized task 4.0:0 as 2454 bytes in 0 ms
14/07/08 19:19:23 INFOscheduler.TaskSetManager: Starting task 4.0:1 as TID 6 on executor 0: DataNode4(PROCESS_LOCAL)
14/07/08 19:19:23INFO scheduler.TaskSetManager: Serialized task 4.0:1 as 2454 bytes in 0 ms
14/07/08 19:19:24INFO scheduler.DAGScheduler: Completed ResultTask(4, 0)
14/07/08 19:19:24INFO scheduler.TaskSetManager: Finished TID 5 in 104 ms on DataNode2 (progress:1/2)
14/07/08 19:19:24INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations forshuffle 0 to spark@DataNode4:45983
14/07/08 19:19:24INFO scheduler.DAGScheduler: Completed ResultTask(4, 1)
14/07/08 19:19:24INFO scheduler.TaskSetManager: Finished TID 6 in 908 ms on DataNode4 (progress:2/2)
14/07/08 19:19:24INFO scheduler.TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have allcompleted, from pool
14/07/08 19:19:24INFO scheduler.DAGScheduler: Stage 4 (sortByKey at <console>:16) finishedin 0.912 s
14/07/08 19:19:24INFO spark.SparkContext: Job finished: sortByKey at <console>:16, took0.947661867 s
wordsort: org.apache.spark.rdd.RDD[(String, Int)] = MappedRDD[12] at map at <console>:16
scala> wordsort.take(10)
14/07/08 19:19:31INFO spark.SparkContext: Starting job: take at <console>:19
14/07/08 19:19:31INFO scheduler.DAGScheduler: Registering RDD 7 (map at <console>:16)
14/07/08 19:19:31INFO scheduler.DAGScheduler: Got job 3 (take at <console>:19) with 1output partitions (allowLocal=true)
14/07/08 19:19:31INFO scheduler.DAGScheduler: Final stage: Stage 6(take at <console>:19)
14/07/08 19:19:31INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 7)
14/07/08 19:19:31INFO scheduler.DAGScheduler: Missing parents: List(Stage 7)
14/07/08 19:19:31INFO scheduler.DAGScheduler: Submitting Stage 7 (MappedRDD[7] at map at<console>:16), which has no missing parents
14/07/08 19:19:31INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 7(MappedRDD[7] at map at <console>:16)
14/07/08 19:19:31INFO scheduler.TaskSchedulerImpl: Adding task set 7.0 with 2 tasks
14/07/08 19:19:31INFO scheduler.TaskSetManager: Starting task 7.0:0 as TID 7 on executor 0:DataNode4 (PROCESS_LOCAL)
14/07/08 19:19:31INFO scheduler.TaskSetManager: Serialized task 7.0:0 as 2102 bytes in 1 ms
14/07/08 19:19:31INFO scheduler.TaskSetManager: Starting task 7.0:1 as TID 8 on executor 3:DataNode1 (PROCESS_LOCAL)
14/07/08 19:19:31INFO scheduler.TaskSetManager: Serialized task 7.0:1 as 2102 bytes in 0 ms
14/07/08 19:19:32INFO scheduler.TaskSetManager: Finished TID 7 in 93 ms on DataNode4 (progress:1/2)
14/07/08 19:19:32INFO scheduler.DAGScheduler: Completed ShuffleMapTask(7, 0)
14/07/08 19:19:32INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations forshuffle 0 to spark@DataNode1:46772
14/07/08 19:19:32INFO scheduler.TaskSetManager: Finished TID 8 in 820 ms on DataNode1 (progress:2/2)
14/07/08 19:19:32INFO scheduler.DAGScheduler: Completed ShuffleMapTask(7, 1)
14/07/08 19:19:32INFO scheduler.TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have allcompleted, from pool
14/07/08 19:19:32INFO scheduler.DAGScheduler: Stage 7 (map at <console>:16) finished in0.822 s
14/07/08 19:19:32INFO scheduler.DAGScheduler: looking for newly runnable stages
14/07/08 19:19:32INFO scheduler.DAGScheduler: running: Set()
14/07/08 19:19:32INFO scheduler.DAGScheduler: waiting: Set(Stage 6)
14/07/08 19:19:32INFO scheduler.DAGScheduler: failed: Set()
14/07/08 19:19:32INFO scheduler.DAGScheduler: Missing parents for Stage 6: List()
14/07/08 19:19:32INFO scheduler.DAGScheduler: Submitting Stage 6 (MappedRDD[12] at map at<console>:16), which is now runnable
14/07/08 19:19:32INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 6(MappedRDD[12] at map at <console>:16)
14/07/08 19:19:32INFO scheduler.TaskSchedulerImpl: Adding task set 6.0 with 1 tasks
14/07/08 19:19:32INFO scheduler.TaskSetManager: Starting task 6.0:0 as TID 9 on executor 2:DataNode2 (PROCESS_LOCAL)
14/07/08 19:19:32INFO scheduler.TaskSetManager: Serialized task 6.0:0 as 2381 bytes in 0 ms
14/07/08 19:19:32INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations forshuffle 1 to spark@DataNode2:36057
14/07/08 19:19:32INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 149bytes
14/07/08 19:19:32INFO scheduler.DAGScheduler: Completed ResultTask(6, 0)
14/07/08 19:19:32INFO scheduler.TaskSetManager: Finished TID 9 in 119 ms on DataNode2 (progress:1/1)
14/07/08 19:19:32INFO scheduler.TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have allcompleted, from pool
14/07/08 19:19:32INFO scheduler.DAGScheduler: Stage 6 (take at <console>:19) finished in0.122 s
14/07/08 19:19:32INFO spark.SparkContext: Job finished: take at <console>:19, took0.978011069 s
res2: Array[(String, Int)] = Array(("",724), (Number,10), (of,10), (Map,5), (FILE:,5), (HDFS:,5), (output,5), (Reduce,4), (input,4), (time,4))
scala>
bin/spark-submit --master spark://NameNode:7077 --class org.apache.spark.examples.SparkPi --executor-memory 2g lib/spark-examples-1.0.0-hadoop2.2.0.jar 1000
Partial output:
14/07/08 19:37:12 INFO scheduler.TaskSetManager: Finished TID 994 in 610 ms on DataNode3 (progress: 998/1000)
14/07/08 19:37:12 INFOscheduler.DAGScheduler: Completed ResultTask(0, 994)
14/07/08 19:37:12 INFOscheduler.TaskSetManager: Finished TID 997 in 620 ms on DataNode3 (progress:999/1000)
14/07/08 19:37:12 INFOscheduler.DAGScheduler: Completed ResultTask(0, 997)
14/07/08 19:37:12 INFOscheduler.TaskSetManager: Finished TID 993 in 625 ms on DataNode3 (progress:1000/1000)
14/07/08 19:37:12 INFOscheduler.DAGScheduler: Completed ResultTask(0, 993)
14/07/08 19:37:12 INFOscheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 25.020s
14/07/08 19:37:12 INFOscheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have allcompleted, from pool
14/07/08 19:37:12 INFO spark.SparkContext:Job finished: reduce at SparkPi.scala:35, took 25.502195433 s
Pi is roughly 3.14185688
14/07/08 19:37:12 INFOhandler.ContextHandler: stoppedo.e.j.s.ServletContextHandler{/metrics/json,null}
14/07/08 19:37:12 INFO handler.ContextHandler:stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stoppedo.e.j.s.ServletContextHandler{/executors/json,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
14/07/08 19:37:12 INFO handler.ContextHandler:stopped o.e.j.s.ServletContextHandler{/environment/json,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stoppedo.e.j.s.ServletContextHandler{/environment,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stoppedo.e.j.s.ServletContextHandler{/storage/rdd,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stoppedo.e.j.s.ServletContextHandler{/storage/json,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stoppedo.e.j.s.ServletContextHandler{/stages/pool/json,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stoppedo.e.j.s.ServletContextHandler{/stages/stage/json,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stoppedo.e.j.s.ServletContextHandler{/stages/json,null}
14/07/08 19:37:12 INFOhandler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
14/07/08 19:37:12 INFO ui.SparkUI: StoppedSpark web UI at http://NameNode:4040
14/07/08 19:37:12 INFOscheduler.DAGScheduler: Stopping DAGScheduler
14/07/08 19:37:12 INFOcluster.SparkDeploySchedulerBackend: Shutting down all executors
14/07/08 19:37:12 INFOcluster.SparkDeploySchedulerBackend: Asking each executor to shut down
14/07/08 19:37:13 INFOspark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/07/08 19:37:13 INFOnetwork.ConnectionManager: Selector thread was interrupted!
14/07/08 19:37:13 INFOnetwork.ConnectionManager: ConnectionManager stopped
14/07/08 19:37:13 INFO storage.MemoryStore:MemoryStore cleared
14/07/08 19:37:13 INFO storage.BlockManager:BlockManager stopped
14/07/08 19:37:13 INFOstorage.BlockManagerMasterActor: Stopping BlockManagerMaster
14/07/08 19:37:13 INFOstorage.BlockManagerMaster: BlockManagerMaster stopped
14/07/08 19:37:13 INFO spark.SparkContext:Successfully stopped SparkContext
14/07/08 19:37:13 INFOremote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/07/08 19:37:13 INFOremote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down;proceeding with flushing remote transports.