Deploying Spark 1.0 on Hadoop 2.2.0

July 2014


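Contents

Introduction

1 Cluster network environment and quick deployment

2 Passwordless SSH configuration

2.1 Configuring passwordless SSH between all nodes

3 JDK installation and Java environment variables

3.1 Installing JDK 1.7

3.2 Configuring Java environment variables

4 Hadoop cluster configuration

(1) Editing the Hadoop configuration files

(2) Copying the configured files to all data nodes

5 Starting the Hadoop cluster

6 Testing Hadoop

7 Calling the Hadoop cluster with the YARN client

8 Configuring a Spark 1.0 cluster

8.1 Configuring environment variables

8.2 Distributing the program to every node

8.3 Starting the cluster

8.4 Running a test program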

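Introduction

This document summarizes how Hadoop 2.2.0 was configured on CentOS 6.4 x64 in a distributed environment built with VMware 10.0 on a single server. It is recommended to use hostnames in all Hadoop configuration files. The required ports should be opened in each machine's firewall, the sshd service should be set to start at boot, and the Java environment variables can be configured in /etc/profile.

For convenience, all the required programs have been packaged and uploaded to a cloud drive; see http://yun.baidu.com/s/1eQeQ7DK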


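1 Cluster network environment and quick deployment

The cluster contains five nodes: one namenode and four datanodes. The nodes are connected over a LAN and can ping one another.

All nodes run 64-bit CentOS 6.4, with the firewall disabled and the sshd service enabled and set to start at boot.

a) First install one CentOS 6.4 machine in VMware and create the hadoop user. Assume the virtual machine is named NameNode.

b) Shut down the virtual machine, make four copies of the NameNode folder, and name them DataNode1, ..., DataNode4.

c) Open each DataNode in VMware and set its virtual machine name.

d) Start the operating system; when the dialog box appears, choose "I copied it".

e) Start each virtual machine and check its IP address:

ifconfig

The IP addresses are planned as follows:

IP address       Hostname
192.168.1.150    namenode
192.168.1.151    datanode1
192.168.1.152    datanode2
192.168.1.153    datanode3
192.168.1.154    datanode4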

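f) On every virtual machine, permanently disable the firewall (very important; be sure to confirm it) and disable SELinux.

chkconfig iptables off     (takes effect permanently, from the next boot)
service iptables stop      (takes effect immediately, until the next boot)

To disable SELinux, edit its configuration file and set SELINUX=disabled:

vim /etc/selinux/config

[root@DataNode1 local]# chkconfig iptables off
[root@DataNode1 local]# service iptables stop
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Unloading modules:                               [  OK  ]
[root@DataNode1 local]#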

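g) Configure the NameNode

Step 1: check the hostname.

# hostname

If it is wrong, log in as root and change it as follows:

# vim /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=NameNode

Do this on each node in turn (using that node's own name), then reboot the system:

# reboot

h) Edit /etc/hosts

As root:

vim /etc/hosts

(1) Edit /etc/hosts on the namenode

Write the names and IP addresses of all nodes into the file and comment out the 127.0.0.1 line, so that the content is as follows. (Double-check the IP addresses for duplicates or mistakes.)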

192.168.1.150 namenode

192.168.1.151 datanode1

192.168.1.152 datanode2

192.168.1.153 datanode3

192.168.1.154 datanode4

# 127.0.0.1     centos63 localhost.localdomain localhost
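The hosts entries above follow directly from the IP plan, so they can be generated and checked rather than typed by hand. The following is a minimal sketch, assuming the addresses are consecutive and start at 192.168.1.150. The function names gen_hosts and dup_ips are hypothetical helpers, not part of any tool; both only read or print, so nothing is written to /etc/hosts.

```shell
#!/bin/sh
# Hypothetical helper: print hosts entries for a list of node names.
# Assumption: addresses are consecutive, starting at <prefix>.<first>.
gen_hosts() {
    prefix=$1      # network prefix, e.g. 192.168.1
    first=$2       # last octet of the first node, e.g. 150
    shift 2
    i=0
    for name in "$@"; do
        printf '%s.%d %s\n' "$prefix" $((first + i)) "$name"
        i=$((i + 1))
    done
}

# Hypothetical helper: print every IP address that appears more than once
# in a hosts-style file (blank lines and comment lines are ignored).
dup_ips() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1" | sort | uniq -d
}

gen_hosts 192.168.1 150 namenode datanode1 datanode2 datanode3 datanode4
```

Running dup_ips /etc/hosts after editing should print nothing; any address it prints is listed twice.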

 

(2) Copy /etc/hosts from the namenode to all data nodes, as follows:

Log in to the namenode as root and run:

 

scp /etc/hosts root@192.168.1.151:/etc/hosts
scp /etc/hosts root@192.168.1.152:/etc/hosts
scp /etc/hosts root@192.168.1.153:/etc/hosts
scp /etc/hosts root@192.168.1.154:/etc/hosts

 

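i) Plan the system directories

Keep the installation directory separate from the data directories, and the data directories separate from the hadoop user's own files. Then, if HDFS needs to be reformatted, all data directories can simply be deleted and recreated.

If the data directories are placed together with the installation directory or the user's files, operations on the data directories risk deleting programs or user files by mistake.

Full path                               Description
/opt/hadoop-2.2.0                       Hadoop installation directory
/home/hadoop/hd_space/tmp               Temporary directory
/home/hadoop/hd_space/hdfs/name         HDFS namespace metadata, stored on the namenode
/home/hadoop/hd_space/hdfs/data         Physical storage location of data blocks on the datanodes
/home/hadoop/hd_space/mapred/local      Local directory used on the worker nodes when running MapReduce programs
/home/hadoop/hd_space/mapred/system     A directory in HDFS that stores files shared while running MapReduce programs

Create the directories:

On the NameNode, as root:

rm -rf /home/hadoop/hd_space
mkdir -p /home/hadoop/hd_space/tmp
mkdir -p /home/hadoop/hd_space/hdfs/name
mkdir -p /home/hadoop/hd_space/hdfs/data
mkdir -p /home/hadoop/hd_space/mapred/local
mkdir -p /home/hadoop/hd_space/mapred/system
chown -R hadoop:hadoop /home/hadoop/hd_space/

Change the owner of /home/hadoop (Hadoop uses this directory, so the hadoop user must have rwx permission on it):

chown -R hadoop:hadoop /home/hadoop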
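The directory layout can be tried out in a scratch location before it is created for real. The following is a minimal sketch, assuming the hdfs/name and hdfs/data paths that hdfs-site.xml refers to later in this document. The function name make_hd_space is hypothetical, and the root directory is a parameter so that nothing under /home/hadoop is touched by the dry run.

```shell
#!/bin/sh
# Hypothetical helper: create the data-directory layout under a given root.
make_hd_space() {
    root=$1
    for sub in tmp hdfs/name hdfs/data mapred/local mapred/system; do
        mkdir -p "$root/$sub" || return 1
    done
}

# Dry run in a temporary directory. On the real nodes the root would be
# /home/hadoop/hd_space, followed by "chown -R hadoop:hadoop" as root.
scratch=$(mktemp -d)
make_hd_space "$scratch/hd_space"
find "$scratch/hd_space" -type d | sort
```

Keeping the list of subdirectories in one place makes it less likely that the paths created here drift away from the paths named in the configuration files.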

 

 

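With the base directories created, the next step is to set up passwordless SSH so that Hadoop can manage the cluster.

2 Passwordless SSH configuration

Hadoop's start and stop scripts rely on SSH: the namenode logs in to the other nodes over SSH to start the namenode and datanode processes. Heartbeats from the datanodes to the namenode, and traffic between datanodes, use Hadoop's own RPC and data-transfer protocols rather than SSH, so strictly speaking only the node that runs the scripts needs passwordless access to the others. This guide nevertheless configures passwordless SSH logins between all nodes, which keeps administration simple.

2.1 Configuring passwordless SSH between all nodes

(0) How it works

When node A connects to node B using passwordless public-key authentication, A is the client and B is the server. A key pair (a public key and a private key) is generated on client A, and the public key is copied to server B. In simplified terms, when A connects to B over ssh, B generates a random number, encrypts it with A's public key, and sends it to A. A decrypts it with its private key and returns the result to B; once B confirms that the result is correct, it allows A to connect. (The current SSH-2 protocol has the client sign session data with its private key instead, but the setup is the same.) No password has to be typed at any point. The key step is copying client A's public key to B.

Therefore, to enable passwordless public-key authentication between all nodes, the public keys of all nodes must be copied to every node.

(1) Generate a key pair on every machine

(a) Log in to every node as the hadoop user and run the following command to generate an RSA key pair:

ssh-keygen -t rsa

This creates a private key id_rsa and a public key id_rsa.pub in /home/hadoop/.ssh/.

# su hadoop

$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):     (press Enter to accept the default path)
Enter passphrase (empty for no passphrase):     (press Enter for an empty passphrase)
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.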

 

 

(b) Send the public key id_rsa.pub of every datanode to the namenode:

On DataNode1 run:

scp ~/.ssh/id_rsa.pub hadoop@NameNode:/home/hadoop/.ssh/id_rsa.pub.datanode1

......

On DataNodeN run:

scp ~/.ssh/id_rsa.pub hadoop@NameNode:/home/hadoop/.ssh/id_rsa.pub.datanoden

Check on the namenode that the public keys of all data nodes have arrived.

(c) On the namenode, combine all public keys (including its own) and send them to all nodes

cd ~/.ssh
cat id_rsa.pub >> authorized_keys               (the namenode's own public key)
cat id_rsa.pub.datanode1 >> authorized_keys
cat id_rsa.pub.datanode2 >> authorized_keys
cat id_rsa.pub.datanode3 >> authorized_keys
cat id_rsa.pub.datanode4 >> authorized_keys

chmod 644 ~/.ssh/authorized_keys
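Appending keys with cat by hand adds duplicate entries if a step is repeated. The following is a minimal sketch of the same merge as a loop that skips keys already present. The function name merge_keys is hypothetical, the key strings in the demonstration are placeholders rather than real keys, and mode 600 is an assumption chosen here as a stricter alternative to 644.

```shell
#!/bin/sh
# Hypothetical helper: append every public key file in a directory
# (id_rsa.pub and id_rsa.pub.*) to authorized_keys, skipping duplicates.
merge_keys() {
    dir=$1
    auth="$dir/authorized_keys"
    touch "$auth"
    for pub in "$dir"/id_rsa.pub "$dir"/id_rsa.pub.*; do
        [ -f "$pub" ] || continue          # unmatched glob or missing file
        while IFS= read -r key || [ -n "$key" ]; do
            [ -n "$key" ] || continue      # ignore empty lines
            # -q quiet, -x whole line, -F fixed string (no regex)
            grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
        done < "$pub"
    done
    chmod 600 "$auth"
}

# Demonstration in a scratch directory with placeholder keys.
demo=$(mktemp -d)
printf 'ssh-rsa PLACEHOLDER-KEY-1 hadoop@NameNode\n'  > "$demo/id_rsa.pub"
printf 'ssh-rsa PLACEHOLDER-KEY-2 hadoop@DataNode1\n' > "$demo/id_rsa.pub.datanode1"
merge_keys "$demo"
merge_keys "$demo"     # a second run adds nothing
cat "$demo/authorized_keys"
```

On the namenode the call would be merge_keys ~/.ssh, run as the hadoop user after all datanode keys have arrived.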

 

 

 

 

 

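Use scp to copy the namenode's authorized_keys file to the .ssh directory of every DataNode. The general form is:

scp authorized_keys <datanode IP address>:/home/hadoop/.ssh

scp ~/.ssh/authorized_keys hadoop@DataNode1:/home/hadoop/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@DataNode2:/home/hadoop/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@DataNode3:/home/hadoop/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@DataNode4:/home/hadoop/.ssh/authorized_keys

As this shows, once /etc/hosts is configured, every machine can be reached by hostname, with no need to remember individual IP addresses. This pays off when the cluster has many machines with non-contiguous IP addresses.

After authorized_keys has been distributed to every node, ssh logins no longer ask for a password.

With this configuration, the namenode can log in to all datanodes without a password. Verify this with "ssh DataNode1" (and likewise for DataNode2, 3, and 4).

Then, on the namenode, run ssh once to NameNode and to every data node, because ssh asks to confirm a host's key only on the first connection. Do the same on every DataNode.

At this point all nodes can reach one another. The next step is to configure the JDK.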
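Checking every node by hand becomes tedious as the cluster grows. The following is a minimal sketch of a loop that tries a non-interactive login to each host. The function name check_passwordless and the SSH_CMD override are hypothetical; BatchMode and ConnectTimeout are standard OpenSSH client options. In batch mode ssh fails instead of prompting, so a host whose key has not yet been accepted is also reported as failing.

```shell
#!/bin/sh
# Hypothetical helper: report which hosts accept a login without any prompt.
# SSH_CMD can be overridden, which makes the loop easy to try without a cluster.
check_passwordless() {
    failed=0
    for host in "$@"; do
        # "true" is run on the remote side; only the exit status matters.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null 2>/dev/null; then
            echo "ok:   $host"
        else
            echo "FAIL: $host"
            failed=1
        fi
    done
    return $failed
}

# Usage, as the hadoop user on each node:
#   check_passwordless NameNode DataNode1 DataNode2 DataNode3 DataNode4
```

The function returns a non-zero status if any host fails, so it can gate later steps in a script.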

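3 JDK installation and Java environment variables

3.1 Installing JDK 1.7

1. Download the JDK.

Choose the Linux version; the downloaded file is jdk-7u21-linux-x64.tar.gz.

2. Extract it into /opt:

mv jdk-7u21-linux-x64.tar.gz /opt/
cd /opt
tar xf jdk-7u21-linux-x64.tar.gz

3.2 Configuring Java environment variables

Log in as root, run "vim /etc/profile", and add the following lines to configure the environment variables. (/etc/profile is an important file; the Hadoop configuration below uses it as well.)

# set java environment
export JAVA_HOME=/opt/jdk1.7.0_21
export JRE_HOME=/opt/jdk1.7.0_21/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Save and exit, then run the following commands to apply the configuration:

chmod +x /etc/profile
source /etc/profile

Once this is done, run "java -version" on the command line to check that it worked. The same test, run as the hadoop user, should also succeed.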
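Beyond running java -version by hand, the JAVA_HOME setting itself can be checked, which helps when the same profile is copied to several nodes. The following is a minimal sketch; the function name check_java_home is hypothetical, and it only tests that the directory contains an executable bin/java.

```shell
#!/bin/sh
# Hypothetical helper: check that a JAVA_HOME candidate looks usable.
check_java_home() {
    home=$1
    if [ -z "$home" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$home/bin/java" ]; then
        echo "no executable $home/bin/java" >&2
        return 1
    fi
    echo "JAVA_HOME looks usable: $home"
}

# Check the current shell's setting.
check_java_home "${JAVA_HOME:-}" || echo "fix /etc/profile, then run: source /etc/profile"
```

A failure here usually means the profile was not sourced in the current shell or the JDK was extracted to a different directory.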

4 Hadoop cluster configuration

On the namenode:

Log in as the hadoop user.

Download hadoop-2.2.0 (a prebuilt 64-bit build of Hadoop 2.2 is available from the cloud drive at http://pan.baidu.com/s/1sjz2ORN) and extract it into /opt.

(1) Editing the Hadoop configuration files

(a) /etc/profile

# set hadoop
export HADOOP_HOME=/opt/hadoop-2.2.0
export HADOOP_CONF_DIR=/opt/hadoop-2.2.0/etc/hadoop
export YARN_CONF_DIR=/opt/hadoop-2.2.0/etc/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

(b) hadoop-env.sh

$ vim $HADOOP_CONF_DIR/hadoop-env.sh

export JAVA_HOME=/opt/jdk1.7.0_21

(c) yarn-env.sh

vim $HADOOP_CONF_DIR/yarn-env.sh

export JAVA_HOME=/opt/jdk1.7.0_21

(d) core-site.xml

vim $HADOOP_CONF_DIR/core-site.xml

Add the following properties inside the <configuration> element:

<property>
        <name>fs.defaultFS</name>
        <value>hdfs://NameNode:9000</value>
</property>
<property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hd_space/tmp</value>
</property>
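The two properties above are shown as a fragment; in the file itself they sit inside a <configuration> root element. The following is a minimal sketch that writes a complete file from a namenode hostname and a temporary directory. The function name write_core_site is hypothetical, and the demonstration writes to a temporary file rather than to $HADOOP_CONF_DIR.

```shell
#!/bin/sh
# Hypothetical helper: write a complete core-site.xml with the two
# properties from this section wrapped in <configuration>.
write_core_site() {
    out=$1        # file to write
    namenode=$2   # namenode hostname
    tmpdir=$3     # value for hadoop.tmp.dir
    cat > "$out" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$namenode:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$tmpdir</value>
  </property>
</configuration>
EOF
}

# Write a sample to a scratch file and show it.
sample=$(mktemp)
write_core_site "$sample" NameNode /home/hadoop/hd_space/tmp
cat "$sample"
```

Generating the file from parameters keeps the hostname and paths in one place if the same layout is reused for another cluster.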

 

(e) hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>dfs.name.dir</name>
                <value>/home/hadoop/hd_space/hdfs/name</value>
        </property>
        <property>
                <name>dfs.data.dir</name>
                <value>/home/hadoop/hd_space/hdfs/data</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
</configuration>

 

(f) mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>mapreduce.cluster.local.dir</name>
                <value>/home/hadoop/hd_space/mapred/local</value>
        </property>
        <property>
                <name>mapreduce.cluster.system.dir</name>
                <value>/home/hadoop/hd_space/mapred/system</value>
        </property>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>NameNode:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>NameNode:19888</value>
        </property>
</configuration>

 

(g) masters file: replace localhost with the namenode's hostname

NameNode

(h) slaves file: remove localhost and add the hostnames of all datanodes

DataNode1
DataNode2
DataNode3
DataNode4

(2) Copying the configured files to all data nodes

On the NameNode, run the following script:

for target in DataNode1 DataNode2 DataNode3 DataNode4
do
    scp -r /opt/hadoop-2.2.0/etc/hadoop $target:/opt/hadoop-2.2.0/etc
done

 

5 Starting the Hadoop cluster

First format HDFS on the namenode:

hadoop namenode -format

Because the environment variables are configured, the full path to the hadoop command (/opt/hadoop-2.2.0/bin/hadoop) is not needed here.

The output should report that the name directory "has been successfully formatted"; otherwise the format failed.

Start Hadoop:

start-dfs.sh
start-yarn.sh

After a successful start, run jps on the namenode and on the datanodes. The namenode machine should show NameNode, SecondaryNameNode, and ResourceManager:

[hadoop@NameNode hadoop]$ jps
9097 Jps
8662 SecondaryNameNode
8836 ResourceManager
8459 NameNode
[hadoop@NameNode hadoop]$

The datanode1 machine should show DataNode and NodeManager. If it does not, startup failed; check the configuration for problems.

[root@DataNode1 .ssh]# jps
4885 Jps
4623 DataNode
4736 NodeManager
[root@DataNode1 .ssh]#
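Reading jps output by eye on five machines is easy to get wrong. The following is a minimal sketch of a filter that reads jps output on standard input and lists any expected daemon that is missing. The function name missing_daemons is hypothetical; the demonstration feeds it the namenode output shown above rather than calling jps.

```shell
#!/bin/sh
# Hypothetical helper: read "PID Name" lines (jps output) on stdin and
# report which of the expected daemon names, given as arguments, are absent.
missing_daemons() {
    running=$(awk '{ print $2 }')
    status=0
    for d in "$@"; do
        # -x matches the whole line, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$running" | grep -qx -- "$d"; then
            echo "missing: $d"
            status=1
        fi
    done
    return $status
}

# On the namenode:  jps | missing_daemons NameNode SecondaryNameNode ResourceManager
# On a datanode:    jps | missing_daemons DataNode NodeManager
printf '9097 Jps\n8662 SecondaryNameNode\n8836 ResourceManager\n8459 NameNode\n' |
    missing_daemons NameNode SecondaryNameNode ResourceManager && echo "all daemons running"
```

Because the function returns a non-zero status when something is missing, it can be combined with ssh in a loop over the slaves file.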

Check the cluster status:

hdfs dfsadmin -report

Stop Hadoop (from /opt/hadoop-2.2.0):

./sbin/stop-dfs.sh
./sbin/stop-yarn.sh

HDFS web UI: http://192.168.1.150:50070/dfshealth.jsp

View the ResourceManager (RM):

6 Testing Hadoop

[hadoop@NameNode hadoop-2.2.0]$ hdfs dfs -mkdir /tmp
[hadoop@NameNode hadoop-2.2.0]$ hdfs dfs -ls /
14/07/08 15:31:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwx------  - hadoop supergroup          0 2014-07-08 15:29 /tmp

[hadoop@NameNode hadoop-2.2.0]$ hdfs dfs -copyFromLocal /opt/hadoop-2.2.0/test.txt hdfs://namenode:9000/tmp/test.txt

[hadoop@NameNode hadoop-2.2.0]$ hdfs dfs -ls /tmp
14/07/08 15:34:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwx------  - hadoop supergroup          0 2014-07-08 15:29 /tmp/hadoop-yarn
-rw-r--r--  3 hadoop supergroup       2044 2014-07-08 15:34 /tmp/test.txt

Run the word count example:

[hadoop@NameNode hadoop-2.2.0]$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /tmp/test.txt /tmp-output

[hadoop@NameNode hadoop-2.2.0]$ hdfs dfs -ls /tmp-output
14/07/08 16:07:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--  3 hadoop supergroup          0 2014-07-08 15:35 /tmp-output/_SUCCESS
-rw-r--r--  3 hadoop supergroup       1045 2014-07-08 15:35 /tmp-output/part-r-00000
[hadoop@NameNode hadoop-2.2.0]$

View the result:

[hadoop@NameNode hadoop-2.2.0]$ hdfs dfs -cat /tmp-output/part-r-00000
BAD_ID=0        1
Bytes  2
CONNECTION=0    1
CPU    1
Combine 2
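For a small input file such as test.txt, the MapReduce result can be cross-checked with a local count. The following is a minimal sketch that counts whitespace-separated words on standard input and prints them as word, tab, count, sorted in byte order. The function name local_wordcount is hypothetical, and the approach only suits files that fit comfortably on one machine.

```shell
#!/bin/sh
# Hypothetical helper: count whitespace-separated words read from stdin and
# print "word<TAB>count", sorted by word in byte order.
local_wordcount() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) printf "%s\t%d\n", w, count[w] }' | LC_ALL=C sort
}

# Small demonstration on inline text.
printf 'Bytes CPU Bytes\nCombine CPU Bytes\n' | local_wordcount
```

On the cluster, the output of "hdfs dfs -cat /tmp/test.txt | local_wordcount" can then be compared with "hdfs dfs -cat /tmp-output/part-r-00000".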

 

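7 Calling the Hadoop cluster with the YARN client

Upload the Spark assembly jar to HDFS:

hdfs dfs -mkdir /jar
hdfs dfs -mkdir /jar/spark
hdfs dfs -copyFromLocal /opt/spark-1.0.0-bin-2.2.0/lib/spark-assembly-1.0.0-hadoop2.2.0.jar hdfs://namenode:9000/jar/spark/spark-assembly-1.0.0-hadoop2.2.0.jar

The extracted Spark package only needs to be copied to one machine in the YARN cluster. One node is enough; there is no need to deploy it on every node unless several client nodes have to submit Spark jobs.

No standalone Spark cluster is needed here; the YARN client uses the compute resources of the Hadoop cluster.

mv <extracted directory>/conf/spark-env.sh.template <extracted directory>/conf/spark-env.sh

Edit spark-env.sh:

export HADOOP_HOME=/opt/hadoop-2.2.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_EXECUTOR_INSTANCES=4
SPARK_EXECUTOR_CORES=1
SPARK_EXECUTOR_MEMORY=1G
SPARK_DRIVER_MEMORY=2G
SPARK_YARN_APP_NAME="Spark 1.0.0"

This is my configuration. It differs slightly from that of earlier versions, but not by much.

Now use the YARN client to run the Spark version of the classic MapReduce example, word count.

Note in particular that SparkContext has changed: the first argument that the word count example took in earlier versions must be removed.

For convenience, I copied $SPARK_HOME/lib/spark-assembly-1.0.0-hadoop2.2.0.jar into HDFS and reference it from there. (Referencing the copy on the local disk also works.)

SPARK_JAR="hdfs://NameNode:9000/jar/spark/spark-assembly-1.0.0-hadoop2.2.0.jar" \
./bin/spark-class org.apache.spark.deploy.yarn.Client \
--jar ./lib/spark-examples-1.0.0-hadoop2.2.0.jar \
--class org.apache.spark.examples.JavaWordCount \
--arg hdfs://NameNode:9000/tmp/test.txt \
--num-executors 50 \
--executor-cores 1 \
--driver-memory 2048M \
--executor-memory 1000M \
--name "word count on spark"

The result can be viewed in stdout.

The speed is acceptable: processing a 5.1 GB file on 4 nodes with 64 cores took 221 seconds.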

 

8 Configuring a Spark 1.0 cluster

8.1 Configuring environment variables

Add the compute nodes:

vi /opt/spark-1.0.0-bin-2.2.0/conf/slaves

DataNode1
DataNode2
DataNode3
DataNode4

Edit spark-env.sh:

cp spark-env.sh.template spark-env.sh
vi spark-env.sh

Add the following:

export SCALA_HOME=/opt/scala-2.10.3
export JAVA_HOME=/opt/jdk1.7.0_21
export SPARK_MASTER_IP=192.168.1.150
export SPARK_WORKER_MEMORY=10G

# JVM memory settings
# Set SPARK_MEM if it isn't already set since we also use it for this process
SPARK_MEM=${SPARK_MEM:-10g}
export SPARK_MEM

# Set JAVA_OPTS to be able to load native libraries and to set heap size
JAVA_OPTS="$OUR_JAVA_OPTS"
JAVA_OPTS="$JAVA_OPTS -Xms$SPARK_MEM -Xmx$SPARK_MEM"
JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"

SPARK_WORKER_MEMORY is the maximum amount of memory Spark may use on each node. Increasing it lets more data be cached in memory, but remember to leave enough memory for the slave's operating system and other services.
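The advice to leave headroom can be turned into simple arithmetic: worker memory equals total memory minus a reserve. The following is a minimal sketch; the function name suggest_worker_memory is hypothetical, and the figures of 12 GB total and 2 GB reserved are assumptions for illustration, not measurements from this cluster.

```shell
#!/bin/sh
# Hypothetical helper: suggest SPARK_WORKER_MEMORY as total memory minus a
# reserve for the operating system and other services (both in whole GB).
suggest_worker_memory() {
    total_gb=$1
    reserve_gb=$2
    if [ "$total_gb" -le "$reserve_gb" ]; then
        echo "not enough memory to reserve ${reserve_gb}G" >&2
        return 1
    fi
    echo "$((total_gb - reserve_gb))G"
}

# Example with assumed figures: 12 GB of RAM, 2 GB kept back.
echo "export SPARK_WORKER_MEMORY=$(suggest_worker_memory 12 2)"
```

How much to reserve depends on what else runs on the node; a datanode and a NodeManager on the same machine need memory of their own.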

 

 

The following is taken from Stack Overflow:

http://stackoverflow.com/questions/21138751/spark-java-lang-outofmemoryerror-java-heap-space

Have a look at the start up scripts a Java heap size is set there, it looks like you're not setting this before running Spark worker.

# Set SPARK_MEM if it isn't already set since we also use it for this process
SPARK_MEM=${SPARK_MEM:-512m}
export SPARK_MEM
 
# Set JAVA_OPTS to be able to load native libraries and to set heap size
JAVA_OPTS="$OUR_JAVA_OPTS"
JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"
JAVA_OPTS="$JAVA_OPTS -Xms$SPARK_MEM -Xmx$SPARK_MEM"

 

8.2 Distributing the program to every node

for target in DataNode1 DataNode2 DataNode3 DataNode4
do
    scp -r /opt/spark-1.0.0-bin-2.2.0 $target:/opt
done

8.3 Starting the cluster

cd /opt/spark-1.0.0-bin-2.2.0/sbin
./start-all.sh

[hadoop@NameNode sbin]$ ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark-1.0.0-bin-2.2.0/sbin/../logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-NameNode.out
DataNode2: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-1.0.0-bin-2.2.0/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-DataNode2.out
DataNode3: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-1.0.0-bin-2.2.0/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-DataNode3.out
DataNode1: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-1.0.0-bin-2.2.0/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-DataNode1.out
DataNode4: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark-1.0.0-bin-2.2.0/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-DataNode4.out
[hadoop@NameNode sbin]$

The running cluster can then be checked in a browser.

8.4 Running a test program

From /opt/spark-1.0.0-bin-2.2.0 on the NameNode, as the hadoop user:

$ bin/spark-shell --executor-memory 2g --driver-memory 1g --master spark://NameNode:7077

14/07/08 19:18:09 INFO spark.SecurityManager: Changing view acls to: hadoop
14/07/08 19:18:09 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop)
14/07/08 19:18:09 INFO spark.HttpServer: Starting HTTP Server
14/07/08 19:18:09 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/08 19:18:09 INFO server.AbstractConnector: Started [email protected]:57198
Welcome to
      ____              __
     / __/__ ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.0.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
Type in expressions to have them evaluated.
Type :help for more information.
14/07/08 19:18:13 INFO spark.SecurityManager: Changing view acls to: hadoop
14/07/08 19:18:13 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop)
14/07/08 19:18:13 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/07/08 19:18:13 INFO Remoting: Starting remoting
14/07/08 19:18:14 INFO Remoting: Remoting started; listening on addresses:[akka.tcp://spark@NameNode:51486]
14/07/08 19:18:14 INFO Remoting: Remoting now listens on addresses:[akka.tcp://spark@NameNode:51486]
14/07/08 19:18:14 INFO spark.SparkEnv: Registering MapOutputTracker
14/07/08 19:18:14 INFO spark.SparkEnv: Registering BlockManagerMaster
14/07/08 19:18:14 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140708191814-fe19
14/07/08 19:18:14 INFO storage.MemoryStore: MemoryStore started with capacity 5.8 GB.
14/07/08 19:18:14 INFO network.ConnectionManager: Bound socket to port 47219 with id = ConnectionManagerId(NameNode,47219)
14/07/08 19:18:14 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/07/08 19:18:14 INFO storage.BlockManagerInfo: Registering block manager NameNode:47219 with 5.8 GB RAM
14/07/08 19:18:14 INFO storage.BlockManagerMaster: Registered BlockManager
14/07/08 19:18:14 INFO spark.HttpServer: Starting HTTP Server
14/07/08 19:18:14 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/08 19:18:14 INFO server.AbstractConnector: Started [email protected]:35560
14/07/08 19:18:14 INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.1.150:35560
14/07/08 19:18:14 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-201155bc-731d-4eea-b637-88982e32ee14
14/07/08 19:18:14 INFO spark.HttpServer: Starting HTTP Server
14/07/08 19:18:14 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/08 19:18:14 INFO server.AbstractConnector: Started [email protected]:53311
14/07/08 19:18:14 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/08 19:18:14 INFO server.AbstractConnector: Started [email protected]:4040
14/07/08 19:18:14 INFO ui.SparkUI: Started SparkUI at http://NameNode:4040
14/07/08 19:18:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/08 19:18:15 INFO client.AppClient$ClientActor: Connecting to master spark://NameNode:7077...
14/07/08 19:18:15 INFO repl.SparkILoop: Created spark context..
14/07/08 19:18:15 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140708191815-0001
14/07/08 19:18:15 INFO client.AppClient$ClientActor: Executor added: app-20140708191815-0001/0 on worker-20140708190701-DataNode4-48388 (DataNode4:48388) with 16 cores
14/07/08 19:18:15 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20140708191815-0001/0 on hostPort DataNode4:48388 with 16 cores, 2.0 GB RAM
14/07/08 19:18:15 INFO client.AppClient$ClientActor: Executor added: app-20140708191815-0001/1 on worker-20140708190659-DataNode3-44272 (DataNode3:44272) with 16 cores
14/07/08 19:18:15 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20140708191815-0001/1 on hostPort DataNode3:44272 with 16 cores, 2.0 GB RAM
14/07/08 19:18:15 INFO client.AppClient$ClientActor: Executor added: app-20140708191815-0001/2 on worker-20140708190700-DataNode2-57378 (DataNode2:57378) with 16 cores
14/07/08 19:18:15 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20140708191815-0001/2 on hostPort DataNode2:57378 with 16 cores, 2.0 GB RAM
14/07/08 19:18:15 INFO client.AppClient$ClientActor: Executor added: app-20140708191815-0001/3 on worker-20140708190700-DataNode1-55222 (DataNode1:55222) with 16 cores
14/07/08 19:18:15 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20140708191815-0001/3 on hostPort DataNode1:55222 with 16 cores, 2.0 GB RAM
14/07/08 19:18:15 INFO client.AppClient$ClientActor: Executor updated: app-20140708191815-0001/3 is now RUNNING
14/07/08 19:18:15 INFO client.AppClient$ClientActor: Executor updated: app-20140708191815-0001/2 is now RUNNING
14/07/08 19:18:15 INFO client.AppClient$ClientActor: Executor updated: app-20140708191815-0001/0 is now RUNNING
14/07/08 19:18:15 INFO client.AppClient$ClientActor: Executor updated: app-20140708191815-0001/1 is now RUNNING
Spark context available as sc.

scala> 14/07/08 19:18:18 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@DataNode4:40761/user/Executor#807513222] with ID 0
14/07/08 19:18:18 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@DataNode1:57590/user/Executor#-2071278347] with ID 3
14/07/08 19:18:18 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@DataNode2:43335/user/Executor#-723681055] with ID 2
14/07/08 19:18:18 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@DataNode3:43008/user/Executor#-1215215976] with ID 1
14/07/08 19:18:18 INFO storage.BlockManagerInfo: Registering block manager DataNode4:44391 with 1177.6 MB RAM
14/07/08 19:18:18 INFO storage.BlockManagerInfo: Registering block manager DataNode1:40306 with 1177.6 MB RAM
14/07/08 19:18:18 INFO storage.BlockManagerInfo: Registering block manager DataNode2:35755 with 1177.6 MB RAM
14/07/08 19:18:18 INFO storage.BlockManagerInfo: Registering block manager DataNode3:42366 with 1177.6 MB RAM

 

 

scala> val rdd=sc.textFile("hdfs://NameNode:9000/tmp/test.txt")
14/07/08 19:18:39 INFO storage.MemoryStore: ensureFreeSpace(141503) called with curMem=0, maxMem=6174041702
14/07/08 19:18:39 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 138.2 KB, free 5.7 GB)
rdd: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> rdd.cache()
res0: rdd.type = MappedRDD[1] at textFile at <console>:12

scala> val wordcount=rdd.flatMap(_.split(" ")).map(x=>(x,1)).reduceByKey(_+_)
14/07/08 19:19:04 INFO mapred.FileInputFormat: Total input paths to process : 1
wordcount: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[6] at reduceByKey at <console>:14

 

scala> wordcount.take(10)

14/07/08 19:19:11 INFO spark.SparkContext: Starting job: take at <console>:17
14/07/08 19:19:11 INFO scheduler.DAGScheduler: Registering RDD 4 (reduceByKey at <console>:14)
14/07/08 19:19:11 INFO scheduler.DAGScheduler: Got job 0 (take at <console>:17) with 1 output partitions (allowLocal=true)
14/07/08 19:19:11 INFO scheduler.DAGScheduler: Final stage: Stage 0(take at <console>:17)
14/07/08 19:19:11 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)
14/07/08 19:19:11 INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)
14/07/08 19:19:11 INFO scheduler.DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[4] at reduceByKey at <console>:14), which has no missing parents
14/07/08 19:19:11 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MapPartitionsRDD[4] at reduceByKey at <console>:14)
14/07/08 19:19:11 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/07/08 19:19:11 INFO scheduler.TaskSetManager: Starting task 1.0:0 as TID 0 on executor 2: DataNode2 (NODE_LOCAL)
14/07/08 19:19:11 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 2079 bytes in 6 ms
14/07/08 19:19:11 INFO scheduler.TaskSetManager: Starting task 1.0:1 as TID 1 on executor 1: DataNode3 (NODE_LOCAL)
14/07/08 19:19:11 INFO scheduler.TaskSetManager: Serialized task 1.0:1 as 2079 bytes in 1 ms
14/07/08 19:19:12 INFO storage.BlockManagerInfo: Added rdd_1_1 in memory on DataNode3:42366 (size: 3.2 KB, free: 1177.6 MB)
14/07/08 19:19:12 INFO storage.BlockManagerInfo: Added rdd_1_0 in memory on DataNode2:35755 (size: 3.1 KB, free: 1177.6 MB)
14/07/08 19:19:13 INFO scheduler.TaskSetManager: Finished TID 0 in 1830 ms on DataNode2 (progress: 1/2)
14/07/08 19:19:13 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(1, 0)
14/07/08 19:19:13 INFO scheduler.TaskSetManager: Finished TID 1 in 1821 ms on DataNode3 (progress: 2/2)
14/07/08 19:19:13 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/07/08 19:19:13 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(1, 1)
14/07/08 19:19:13 INFO scheduler.DAGScheduler: Stage 1 (reduceByKey at <console>:14) finished in 1.853 s
14/07/08 19:19:13 INFO scheduler.DAGScheduler: looking for newly runnable stages
14/07/08 19:19:13 INFO scheduler.DAGScheduler: running: Set()
14/07/08 19:19:13 INFO scheduler.DAGScheduler: waiting: Set(Stage 0)
14/07/08 19:19:13 INFO scheduler.DAGScheduler: failed: Set()
14/07/08 19:19:13 INFO scheduler.DAGScheduler: Missing parents for Stage 0: List()
14/07/08 19:19:13 INFO scheduler.DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[6] at reduceByKey at <console>:14), which is now runnable
14/07/08 19:19:13 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 0 (MapPartitionsRDD[6] at reduceByKey at <console>:14)
14/07/08 19:19:13 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/07/08 19:19:13 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 2 on executor 2: DataNode2 (PROCESS_LOCAL)
14/07/08 19:19:13 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 1972 bytes in 1 ms
14/07/08 19:19:13 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to spark@DataNode2:36057
14/07/08 19:19:13 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 146 bytes
14/07/08 19:19:13 INFO scheduler.DAGScheduler: Completed ResultTask(0, 0)
14/07/08 19:19:13 INFO scheduler.TaskSetManager: Finished TID 2 in 404 ms on DataNode2 (progress: 1/1)
14/07/08 19:19:13 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/07/08 19:19:13 INFO scheduler.DAGScheduler: Stage 0 (take at <console>:17) finished in 0.407 s
14/07/08 19:19:13 INFO spark.SparkContext: Job finished: take at <console>:17, took 2.437269965 s
res1: Array[(String, Int)] = Array((BAD_ID=0,1), (committed,1), (Written=196192,1), (tasks=1,3), (Framework,1), (outputs=1,1), (groups=18040,1), (map,2), (Reduce,4), (ystem,1))

 

scala> val wordsort=wordcount.map(x=>(x._2,x._1)).sortByKey(false).map(x=>(x._2,x._1))
14/07/08 19:19:23 INFO spark.SparkContext: Starting job: sortByKey at <console>:16
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Got job 1 (sortByKey at <console>:16) with 2 output partitions (allowLocal=false)
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Final stage: Stage 2(sortByKey at <console>:16)
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 3)
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Missing parents: List()
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Submitting Stage 2 (MappedRDD[7] at map at <console>:16), which has no missing parents
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 2 (MappedRDD[7] at map at <console>:16)
14/07/08 19:19:23 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
14/07/08 19:19:23 INFO scheduler.TaskSetManager: Starting task 2.0:0 as TID 3 on executor 2: DataNode2 (PROCESS_LOCAL)
14/07/08 19:19:23 INFO scheduler.TaskSetManager: Serialized task 2.0:0 as 1970 bytes in 0 ms
14/07/08 19:19:23 INFO scheduler.TaskSetManager: Starting task 2.0:1 as TID 4 on executor 1: DataNode3 (PROCESS_LOCAL)
14/07/08 19:19:23 INFO scheduler.TaskSetManager: Serialized task 2.0:1 as 1970 bytes in 0 ms
14/07/08 19:19:23 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to spark@DataNode3:59586
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Completed ResultTask(2, 0)
14/07/08 19:19:23 INFO scheduler.TaskSetManager: Finished TID 3 in 117 ms on DataNode2 (progress: 1/2)
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Completed ResultTask(2, 1)
14/07/08 19:19:23 INFO scheduler.TaskSetManager: Finished TID 4 in 168 ms on DataNode3 (progress: 2/2)
14/07/08 19:19:23 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Stage 2 (sortByKey at <console>:16) finished in 0.172 s
14/07/08 19:19:23 INFO spark.SparkContext: Job finished: sortByKey at <console>:16, took 0.19438825 s
14/07/08 19:19:23 INFO spark.SparkContext: Starting job: sortByKey at <console>:16
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Got job 2 (sortByKey at <console>:16) with 2 output partitions (allowLocal=false)
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Final stage: Stage 4(sortByKey at <console>:16)
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 5)
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Missing parents: List()
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Submitting Stage 4 (MappedRDD[9] at sortByKey at <console>:16), which has no missing parents
14/07/08 19:19:23 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 4 (MappedRDD[9] at sortByKey at <console>:16)
14/07/08 19:19:23 INFO scheduler.TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
14/07/08 19:19:23 INFO scheduler.TaskSetManager: Starting task 4.0:0 as TID 5 on executor 2: DataNode2 (PROCESS_LOCAL)
14/07/08 19:19:23 INFO scheduler.TaskSetManager: Serialized task 4.0:0 as 2454 bytes in 0 ms
14/07/08 19:19:23 INFO scheduler.TaskSetManager: Starting task 4.0:1 as TID 6 on executor 0: DataNode4 (PROCESS_LOCAL)
14/07/08 19:19:23 INFO scheduler.TaskSetManager: Serialized task 4.0:1 as 2454 bytes in 0 ms
14/07/08 19:19:24 INFO scheduler.DAGScheduler: Completed ResultTask(4, 0)
14/07/08 19:19:24 INFO scheduler.TaskSetManager: Finished TID 5 in 104 ms on DataNode2 (progress: 1/2)
14/07/08 19:19:24 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to spark@DataNode4:45983
14/07/08 19:19:24 INFO scheduler.DAGScheduler: Completed ResultTask(4, 1)
14/07/08 19:19:24 INFO scheduler.TaskSetManager: Finished TID 6 in 908 ms on DataNode4 (progress: 2/2)
14/07/08 19:19:24 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool
14/07/08 19:19:24 INFO scheduler.DAGScheduler: Stage 4 (sortByKey at <console>:16) finished in 0.912 s
14/07/08 19:19:24 INFO spark.SparkContext: Job finished: sortByKey at <console>:16, took 0.947661867 s
wordsort: org.apache.spark.rdd.RDD[(String, Int)] = MappedRDD[12] at map at <console>:16

 

scala> wordsort.take(10)

14/07/08 19:19:31 INFO spark.SparkContext: Starting job: take at <console>:19

14/07/08 19:19:31 INFO scheduler.DAGScheduler: Registering RDD 7 (map at <console>:16)

14/07/08 19:19:31 INFO scheduler.DAGScheduler: Got job 3 (take at <console>:19) with 1 output partitions (allowLocal=true)

14/07/08 19:19:31 INFO scheduler.DAGScheduler: Final stage: Stage 6(take at <console>:19)

14/07/08 19:19:31 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 7)

14/07/08 19:19:31 INFO scheduler.DAGScheduler: Missing parents: List(Stage 7)

14/07/08 19:19:31 INFO scheduler.DAGScheduler: Submitting Stage 7 (MappedRDD[7] at map at <console>:16), which has no missing parents

14/07/08 19:19:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 7 (MappedRDD[7] at map at <console>:16)

14/07/08 19:19:31 INFO scheduler.TaskSchedulerImpl: Adding task set 7.0 with 2 tasks

14/07/08 19:19:31 INFO scheduler.TaskSetManager: Starting task 7.0:0 as TID 7 on executor 0: DataNode4 (PROCESS_LOCAL)

14/07/08 19:19:31 INFO scheduler.TaskSetManager: Serialized task 7.0:0 as 2102 bytes in 1 ms

14/07/08 19:19:31 INFO scheduler.TaskSetManager: Starting task 7.0:1 as TID 8 on executor 3: DataNode1 (PROCESS_LOCAL)

14/07/08 19:19:31 INFO scheduler.TaskSetManager: Serialized task 7.0:1 as 2102 bytes in 0 ms

14/07/08 19:19:32 INFO scheduler.TaskSetManager: Finished TID 7 in 93 ms on DataNode4 (progress: 1/2)

14/07/08 19:19:32 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(7, 0)

14/07/08 19:19:32 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to spark@DataNode1:46772

14/07/08 19:19:32 INFO scheduler.TaskSetManager: Finished TID 8 in 820 ms on DataNode1 (progress: 2/2)

14/07/08 19:19:32 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(7, 1)

14/07/08 19:19:32 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool

14/07/08 19:19:32 INFO scheduler.DAGScheduler: Stage 7 (map at <console>:16) finished in 0.822 s

14/07/08 19:19:32 INFO scheduler.DAGScheduler: looking for newly runnable stages

14/07/08 19:19:32 INFO scheduler.DAGScheduler: running: Set()

14/07/08 19:19:32 INFO scheduler.DAGScheduler: waiting: Set(Stage 6)

14/07/08 19:19:32 INFO scheduler.DAGScheduler: failed: Set()

14/07/08 19:19:32 INFO scheduler.DAGScheduler: Missing parents for Stage 6: List()

14/07/08 19:19:32 INFO scheduler.DAGScheduler: Submitting Stage 6 (MappedRDD[12] at map at <console>:16), which is now runnable

14/07/08 19:19:32 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 6 (MappedRDD[12] at map at <console>:16)

14/07/08 19:19:32 INFO scheduler.TaskSchedulerImpl: Adding task set 6.0 with 1 tasks

14/07/08 19:19:32 INFO scheduler.TaskSetManager: Starting task 6.0:0 as TID 9 on executor 2: DataNode2 (PROCESS_LOCAL)

14/07/08 19:19:32 INFO scheduler.TaskSetManager: Serialized task 6.0:0 as 2381 bytes in 0 ms

14/07/08 19:19:32 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 1 to spark@DataNode2:36057

14/07/08 19:19:32 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 149 bytes

14/07/08 19:19:32 INFO scheduler.DAGScheduler: Completed ResultTask(6, 0)

14/07/08 19:19:32 INFO scheduler.TaskSetManager: Finished TID 9 in 119 ms on DataNode2 (progress: 1/1)

14/07/08 19:19:32 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool

14/07/08 19:19:32 INFO scheduler.DAGScheduler: Stage 6 (take at <console>:19) finished in 0.122 s

14/07/08 19:19:32 INFO spark.SparkContext: Job finished: take at <console>:19, took 0.978011069 s

res2: Array[(String, Int)] = Array(("",724), (Number,10), (of,10), (Map,5), (FILE:,5), (HDFS:,5), (output,5), (Reduce,4), (input,4), (time,4))

scala>

 

 

bin/spark-submit --master spark://NameNode:7077 --class org.apache.spark.examples.SparkPi --executor-memory 2g lib/spark-examples-1.0.0-hadoop2.2.0.jar 1000

Partial output:

14/07/08 19:37:12 INFO scheduler.TaskSetManager: Finished TID 994 in 610 ms on DataNode3 (progress: 998/1000)

14/07/08 19:37:12 INFO scheduler.DAGScheduler: Completed ResultTask(0, 994)

14/07/08 19:37:12 INFO scheduler.TaskSetManager: Finished TID 997 in 620 ms on DataNode3 (progress: 999/1000)

14/07/08 19:37:12 INFO scheduler.DAGScheduler: Completed ResultTask(0, 997)

14/07/08 19:37:12 INFO scheduler.TaskSetManager: Finished TID 993 in 625 ms on DataNode3 (progress: 1000/1000)

14/07/08 19:37:12 INFO scheduler.DAGScheduler: Completed ResultTask(0, 993)

14/07/08 19:37:12 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 25.020 s

14/07/08 19:37:12 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool

14/07/08 19:37:12 INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:35, took 25.502195433 s

Pi is roughly 3.14185688

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}

14/07/08 19:37:12 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}

14/07/08 19:37:12 INFO ui.SparkUI: Stopped Spark web UI at http://NameNode:4040

14/07/08 19:37:12 INFO scheduler.DAGScheduler: Stopping DAGScheduler

14/07/08 19:37:12 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors

14/07/08 19:37:12 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down

14/07/08 19:37:13 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!

14/07/08 19:37:13 INFO network.ConnectionManager: Selector thread was interrupted!

14/07/08 19:37:13 INFO network.ConnectionManager: ConnectionManager stopped

14/07/08 19:37:13 INFO storage.MemoryStore: MemoryStore cleared

14/07/08 19:37:13 INFO storage.BlockManager: BlockManager stopped

14/07/08 19:37:13 INFO storage.BlockManagerMasterActor: Stopping BlockManagerMaster

14/07/08 19:37:13 INFO storage.BlockManagerMaster: BlockManagerMaster stopped

14/07/08 19:37:13 INFO spark.SparkContext: Successfully stopped SparkContext

14/07/08 19:37:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.

14/07/08 19:37:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
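
For readers wondering what the SparkPi job in the log above computes: it is a Monte Carlo estimate of Pi, and the trailing argument 1000 appears to be the number of slices, which would match the 1000 tasks reported by the progress counter. The following is only a rough local sketch of that idea in plain Scala, without Spark. The names LocalPiSketch and estimatePi, the fixed seed, and the figure of 100000 samples per slice are assumptions made for illustration; they are not taken from this deployment.

```scala
import scala.util.Random

// Hypothetical local sketch (no Spark needed); names and constants are assumptions.
object LocalPiSketch {

  // Sample points uniformly in the square [-1, 1] x [-1, 1] and count how many
  // fall inside the unit circle. The fraction inside approximates Pi / 4.
  def estimatePi(slices: Int, samplesPerSlice: Int, seed: Long): Double = {
    val rng = new Random(seed)
    val n = slices.toLong * samplesPerSlice
    var inside = 0L
    var i = 0L
    while (i < n) {
      val x = rng.nextDouble() * 2 - 1
      val y = rng.nextDouble() * 2 - 1
      if (x * x + y * y < 1) inside += 1
      i += 1
    }
    4.0 * inside / n
  }

  def main(args: Array[String]): Unit = {
    // The first argument plays the role of the "1000" passed to spark-submit.
    val slices = if (args.length > 0) args(0).toInt else 2
    println("Pi is roughly " + estimatePi(slices, 100000, seed = 42L))
  }
}
```

On the cluster the sampling is split into tasks that run on the DataNode executors and the per-task counts are combined with a reduce, which is the "reduce at SparkPi.scala:35" stage seen in the log; the sketch simply does all of the sampling in one loop.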

 
