Big Data Assignment (1): Building a Hadoop Cluster Environment with Docker

Table of Contents

    • I. Installing Docker (Docker CE)
      • (1) Setting up the repository
      • (2) Installing Docker CE
    • II. Installing Ubuntu in Docker
    • III. Initializing the Ubuntu System
      • (1) Refreshing the package sources
      • (2) Installing the necessary software
      • (3) Saving the image
    • IV. Installing Hadoop
    • V. Configuring the Hadoop Cluster
    • VI. Running the Hadoop Example Program grep

This walkthrough mainly follows the tutorial from the Database Laboratory at Xiamen University (http://dblab.xmu.edu.cn/blog/1233/) and was carried out on Ubuntu 16.04.

I. Installing Docker (Docker CE)

Installation follows the official Docker tutorial (https://docs.docker.com/install/linux/docker-ce/ubuntu/).
The official documentation offers three installation methods: from the Docker repository, from a downloaded package, or via a convenience script. I chose the repository method, since it makes future Docker updates easier.

(1) Setting up the repository

1. First, update the package index so the existing repositories are current:

$ sudo apt update

2. Then install the prerequisite packages:

$ sudo apt install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

3. Add Docker's official GPG key:

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

Verify with the following command that the key fingerprint is 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88:

$ sudo apt-key fingerprint 0EBFCD88

pub   4096R/0EBFCD88 2017-02-22
      Key fingerprint = 9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
uid                  Docker Release (CE deb) <[email protected]>
sub   4096R/F273FCD8 2017-02-22

4. Docker offers three release channels: stable, edge, and test. stable is the stable channel, so that is the one to set up:

$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

(2) Installing Docker CE

1. Refresh the package index:

$ sudo apt update

2. Install the latest version of Docker CE:

$ sudo apt-get install docker-ce

3. Verify that the installation succeeded:

$ sudo docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
d1725b59e92d: Pull complete 
Digest: sha256:0add3ace90ecb4adbf7777e9aacf18357296e799f81cabc9fde470971e499788
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

4. Add user permissions.
Since by default only root can run docker commands, the current user needs to be added to the docker group:

$ sudo usermod -aG docker zhangsl

Then log out and log back in, and verify that the permission change took effect:

$ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

II. Installing Ubuntu in Docker

First, pull an Ubuntu image from Docker Hub:

$ docker pull ubuntu

Then verify that the image was pulled successfully:

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
hello-world         latest              4ab4c602aa5e        2 weeks ago         1.84kB
ubuntu              latest              cd6d8154f1e1        2 weeks ago         84.1MB

To exchange files with the container once it is running, create a directory under the home directory that will be mounted into the container:

$ mkdir docker-ubuntu  

Then run Ubuntu in Docker:

$ docker run -it -v ~/docker-ubuntu:/root/docker-ubuntu --name ubuntu ubuntu
root@b59d716dbb4d:/# 

III. Initializing the Ubuntu System

The freshly created system is bare, with very little software installed, so the package sources need to be refreshed and some necessary software installed.

(1) Refreshing the package sources

The Ubuntu container logs in as root by default, so commands do not need sudo:

root@b59d716dbb4d:/# apt update

(2) Installing the necessary software

1. Install Vim.
Common terminal text editors include vim, emacs, and nano; vim is the one I am used to, so that is what I install:

root@b59d716dbb4d:/# apt install vim

2. Install sshd.
SSH is needed to connect to the containers in the distributed setup:

root@b59d716dbb4d:/# apt install ssh

Then add /etc/init.d/ssh start to ~/.bashrc so the SSH service starts automatically whenever the container starts (alternatively, configure the service to start automatically with service or systemctl).
Next, set up passwordless SSH login:

root@b59d716dbb4d:~# ssh-keygen -t rsa   # press Enter at every prompt
root@b59d716dbb4d:~# cd .ssh
root@b59d716dbb4d:~/.ssh# cat id_rsa.pub >> authorized_keys
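As a sketch, the key setup above can be made non-interactive and idempotent (safe to re-run). The SSH_DIR variable is an assumption added here for flexibility; the defaults match the steps above:

```shell
# Sketch: non-interactive, idempotent passwordless-SSH setup.
# SSH_DIR is parameterized (an addition to the tutorial) so the script can
# be pointed at a scratch directory instead of the real ~/.ssh.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"

# Generate an RSA key pair only if one does not exist yet
# (-N '' gives an empty passphrase, replacing the interactive Enter presses).
if [ ! -f "$SSH_DIR/id_rsa" ]; then
    ssh-keygen -t rsa -N '' -f "$SSH_DIR/id_rsa" -q
fi

# Append the public key to authorized_keys only if it is not already there.
touch "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
grep -qxF "$(cat "$SSH_DIR/id_rsa.pub")" "$SSH_DIR/authorized_keys" \
    || cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
```

Running the script twice leaves a single copy of the key in authorized_keys, which the plain `cat >>` in the tutorial would duplicate.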

3. Install the JDK.
Hadoop requires Java, and since the default JDK is Java 10, Java 8 needs to be installed instead:

root@b59d716dbb4d:~/# apt install openjdk-8-jdk

Next, set the JAVA_HOME and PATH variables by appending the following to the end of ~/.bashrc:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export PATH=$PATH:$JAVA_HOME/bin

Then reload ~/.bashrc so the changes take effect:

root@b59d716dbb4d:~/# source ~/.bashrc
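A sketch of the same ~/.bashrc edit done non-interactively, guarding against duplicate lines if the setup is re-run (the BASHRC variable is an assumption added here for testability):

```shell
# Sketch: append the JAVA_HOME/PATH exports to ~/.bashrc only once, so
# re-running the setup does not duplicate them. BASHRC is parameterized
# (an addition to the tutorial) so the script can target a scratch file.
BASHRC="${BASHRC:-$HOME/.bashrc}"
JAVA_HOME_LINE='export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/'
PATH_LINE='export PATH=$PATH:$JAVA_HOME/bin'

touch "$BASHRC"
# grep -qxF: match the exact whole line literally; append only when absent.
grep -qxF "$JAVA_HOME_LINE" "$BASHRC" || echo "$JAVA_HOME_LINE" >> "$BASHRC"
grep -qxF "$PATH_LINE" "$BASHRC" || echo "$PATH_LINE" >> "$BASHRC"
```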

(3) Saving the image

Changes made inside a container are not saved automatically, so the container has to be committed as a new image. First, log in to Docker:

zhangsl@zhangsl:~$ docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don't have a Docker ID, head over to https://hub.docker.com to create one.
Username: zhangshuoliang007
Password: 
WARNING! Your password will be stored unencrypted in /home/zhangsl/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

Then use docker ps to find the container ID and docker commit to save it as an image:

zhangsl@zhangsl:~$ docker ps   # show the currently running containers
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
b59d716dbb4d        ubuntu              "/bin/bash"         About an hour ago   Up About an hour                        ubuntu
zhangsl@zhangsl:~$ docker commit b59d716dbb4d ubuntu/jdkinstalled   # save container b59d716dbb4d as a new image named ubuntu/jdkinstalled
sha256:07a39087f9bcb985151ade3e225448556dd7df089477b69e5b71b600ad9634c6
zhangsl@zhangsl:~$ docker images   # list all local images
REPOSITORY            TAG                 IMAGE ID            CREATED             SIZE
ubuntu/jdkinstalled   latest              07a39087f9bc        3 minutes ago       601MB
hello-world           latest              4ab4c602aa5e        2 weeks ago         1.84kB
ubuntu                latest              cd6d8154f1e1        2 weeks ago         84.1MB

IV. Installing Hadoop

There are two ways to install Hadoop: compile it from source, or download the binary release from the official site. For convenience, download the binary release.
First, start a container from the image saved above:

zhangsl@zhangsl:~$ docker run -it -v ~/docker-ubuntu:/root/docker-ubuntu --name ubuntu-jdkinstalled ubuntu/jdkinstalled
 * Starting OpenBSD Secure Shell server sshd                             [ OK ] 
root@849719de5ccf:/# 

Once the container is running, place the downloaded Hadoop archive (http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz) into the shared directory created in step one, then extract it:

root@849719de5ccf:/# cd /root/docker-ubuntu
root@849719de5ccf:~/docker-ubuntu# tar -zxvf hadoop-2.9.1.tar.gz -C /usr/local

Verify that Hadoop was installed successfully:

root@849719de5ccf:~/docker-ubuntu# cd /usr/local/hadoop-2.9.1/
root@849719de5ccf:/usr/local/hadoop-2.9.1# ls
LICENSE.txt  README.txt  etc      lib      sbin
NOTICE.txt   bin         include  libexec  share
root@849719de5ccf:/usr/local/hadoop-2.9.1# ./bin/hadoop version
Hadoop 2.9.1
Subversion https://github.com/apache/hadoop.git -r e30710aea4e6e55e69372929106cf119af06fd0e
Compiled by root on 2018-04-16T09:33Z
Compiled with protoc 2.5.0
From source with checksum 7d6d2b655115c6cc336d662cc2b919bd
This command was run using /usr/local/hadoop-2.9.1/share/hadoop/common/hadoop-common-2.9.1.jar

V. Configuring the Hadoop Cluster

First, set JAVA_HOME in hadoop-env.sh:

root@849719de5ccf:/usr/local/hadoop-2.9.1# vim etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/

Next, edit core-site.xml:

root@849719de5ccf:/usr/local/hadoop-2.9.1# vim  etc/hadoop/core-site.xml 
<configuration>
      <property>
          <name>hadoop.tmp.dir</name>
          <value>file:/usr/local/hadoop-2.9.1/tmp</value>
          <description>Abase for other temporary directories.</description>
      </property>
      <property>
          <name>fs.defaultFS</name>
          <value>hdfs://master:9000</value>
      </property>
</configuration>

Then edit hdfs-site.xml:

root@849719de5ccf:/usr/local/hadoop-2.9.1# vim  etc/hadoop/hdfs-site.xml 
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop-2.9.1/namenode_dir</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop-2.9.1/datanode_dir</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>

Then copy mapred-site.xml.template to mapred-site.xml and edit it:

root@849719de5ccf:/usr/local/hadoop-2.9.1# cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml           
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Finally, edit yarn-site.xml:

root@849719de5ccf:/usr/local/hadoop-2.9.1# vim etc/hadoop/yarn-site.xml
<configuration>
      <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
      </property>
      <property>
          <name>yarn.resourcemanager.hostname</name>
          <value>master</value>
      </property>
</configuration>
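The four files above can also be generated in one pass with here-documents instead of editing each one in vim. This is a sketch, not part of the original tutorial; the HADOOP_CONF variable is an assumption added so the script can be pointed at a different directory:

```shell
# Sketch: write the four Hadoop config files from this section in one go.
# HADOOP_CONF defaults to the Hadoop 2.9.1 path used in this guide.
HADOOP_CONF="${HADOOP_CONF:-/usr/local/hadoop-2.9.1/etc/hadoop}"
mkdir -p "$HADOOP_CONF"

cat > "$HADOOP_CONF/core-site.xml" <<'EOF'
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop-2.9.1/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>
EOF

cat > "$HADOOP_CONF/hdfs-site.xml" <<'EOF'
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop-2.9.1/namenode_dir</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop-2.9.1/datanode_dir</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>
EOF

cat > "$HADOOP_CONF/mapred-site.xml" <<'EOF'
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
EOF

cat > "$HADOOP_CONF/yarn-site.xml" <<'EOF'
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
</configuration>
EOF
```

The quoted 'EOF' delimiters keep the here-document contents literal, so nothing inside the XML is expanded by the shell.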

The cluster configuration is now mostly done, so save the current image:

zhangsl@zhangsl:~$ docker commit 2ecf3c0dba0e ubuntu/hadoopinstalled
sha256:957de951c1d3093fa8e731bd63a6672706de2ca86d0dafb626dcb830536e774f

Then start three containers from this image in three terminals, serving as master, slave01, and slave02 in the cluster:

# first terminal
zhangsl@zhangsl:~$ docker run -it -h master --name master ubuntu/hadoopinstalled
# second terminal
zhangsl@zhangsl:~$ docker run -it -h slave01 --name slave01 ubuntu/hadoopinstalled
# third terminal
zhangsl@zhangsl:~$ docker run -it -h slave02 --name slave02 ubuntu/hadoopinstalled

Then check each container's /etc/hosts file:

zhangsl@zhangsl:~$ docker run -it -h master --name master ubuntu/hadoopinstalled
 * Starting OpenBSD Secure Shell server sshd                             [ OK ] 
root@master:/# cat /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
172.17.0.2	master

zhangsl@zhangsl:~$ docker run -it -h slave01 --name slave01 ubuntu/hadoopinstalled
 * Starting OpenBSD Secure Shell server sshd                             [ OK ] 
root@slave01:/# cat /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
172.17.0.3	slave01

zhangsl@zhangsl:~$ docker run -it -h slave02 --name slave02 ubuntu/hadoopinstalled
 * Starting OpenBSD Secure Shell server sshd                             [ OK ] 
root@slave02:/# cat /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
172.17.0.4	slave02

Finally, copy the three address lines above into /etc/hosts on master, slave01, and slave02. The following commands check that master can connect to slave01 and slave02:

root@master:/# ssh slave01
The authenticity of host 'slave01 (172.17.0.3)' can't be established.
ECDSA key fingerprint is SHA256:tftmBWuWvCdqN5wURisQCO9q25RhxS6GXkmBr++Qt48.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave01,172.17.0.3' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-34-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

 * Starting OpenBSD Secure Shell server sshd                                                                                                                                                         [ OK ] 
root@slave01:~# exit
logout
Connection to slave01 closed.
root@master:/# ssh slave02
The authenticity of host 'slave02 (172.17.0.4)' can't be established.
ECDSA key fingerprint is SHA256:tftmBWuWvCdqN5wURisQCO9q25RhxS6GXkmBr++Qt48.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave02,172.17.0.4' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-34-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

 * Starting OpenBSD Secure Shell server sshd                                                                                                                                                         [ OK ] 
root@slave02:~# exit
logout
Connection to slave02 closed.
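The manual copying of hosts entries could be scripted roughly as follows. The IPs are the ones Docker assigned in this walkthrough; the HOSTS_FILE variable and the add_host helper are assumptions introduced here for illustration:

```shell
# Sketch: append the three cluster address lines to a hosts file, skipping
# any entry that is already present. HOSTS_FILE is parameterized (an
# addition to the tutorial) so the script can be exercised on a copy.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

add_host() {
    # $1 = IP address, $2 = hostname. The grep is a simple presence check;
    # dots in the IP act as regex wildcards, which is fine for this purpose.
    grep -qE "^$1[[:space:]]+$2\$" "$HOSTS_FILE" \
        || printf '%s\t%s\n' "$1" "$2" >> "$HOSTS_FILE"
}

add_host 172.17.0.2 master
add_host 172.17.0.3 slave01
add_host 172.17.0.4 slave02
```

Run on each of the three containers, this yields identical name resolution everywhere, which is what start-all.sh relies on later.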

The last step of the cluster configuration: open the slaves file on master and enter slave01 and slave02:

root@master:/usr/local/hadoop-2.9.1# vim etc/hadoop/slaves 

slave01
slave02 

With this the cluster is fully configured; the next step is to start it.
On master, enter /usr/local/hadoop-2.9.1 and run the following commands:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs namenode -format
root@master:/usr/local/hadoop-2.9.1# sbin/start-all.sh

The cluster is now running. Run jps on master, slave01, and slave02 to check which daemons are running on each node.
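As a rough guide to what jps should show with this layout: master should list NameNode, SecondaryNameNode, and ResourceManager, and each slave should list DataNode and NodeManager (plus Jps itself). The helper below, check_daemons (a hypothetical name introduced here), checks a captured jps listing against an expected daemon set; the sample output is illustrative:

```shell
# Sketch: verify a jps listing contains the daemons expected for a node's
# role. check_daemons is a helper invented for this illustration.
check_daemons() {
    # $1 = captured jps output, remaining args = required daemon names
    local output="$1"; shift
    for daemon in "$@"; do
        # -w matches whole words, so "NameNode" does not match
        # "SecondaryNameNode"
        echo "$output" | grep -qw "$daemon" \
            || { echo "missing: $daemon"; return 1; }
    done
    echo "all daemons present"
}

# Usage on master (the PIDs in this jps output are illustrative):
master_jps="$(cat <<'EOF'
1217 Jps
320 NameNode
527 SecondaryNameNode
677 ResourceManager
EOF
)"
check_daemons "$master_jps" NameNode SecondaryNameNode ResourceManager
# prints: all daemons present
```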

VI. Running the Hadoop Example Program grep

Since the example uses HDFS, first create a directory on HDFS:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -mkdir -p /user/hadoop/input

Then copy the XML configuration files under /usr/local/hadoop-2.9.1/etc/hadoop/ to that directory on HDFS:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -put ./etc/hadoop/*.xml /user/hadoop/input

Then use ls to confirm the files were uploaded to HDFS correctly:

root@master:/usr/local/hadoop-2.9.1# bin/hdfs dfs -ls /user/hadoop/input
Found 9 items
-rw-r--r--   3 root supergroup       7861 2018-09-24 11:54 /user/hadoop/input/capacity-scheduler.xml
-rw-r--r--   3 root supergroup       1036 2018-09-24 11:54 /user/hadoop/input/core-site.xml
-rw-r--r--   3 root supergroup      10206 2018-09-24 11:54 /user/hadoop/input/hadoop-policy.xml
-rw-r--r--   3 root supergroup       1091 2018-09-24 11:54 /user/hadoop/input/hdfs-site.xml
-rw-r--r--   3 root supergroup        620 2018-09-24 11:54 /user/hadoop/input/httpfs-site.xml
-rw-r--r--   3 root supergroup       3518 2018-09-24 11:54 /user/hadoop/input/kms-acls.xml
-rw-r--r--   3 root supergroup       5939 2018-09-24 11:54 /user/hadoop/input/kms-site.xml
-rw-r--r--   3 root supergroup        844 2018-09-24 11:54 /user/hadoop/input/mapred-site.xml
-rw-r--r--   3 root supergroup        942 2018-09-24 11:54 /user/hadoop/input/yarn-site.xml

Next, run the example program with the following command:

root@master:/usr/local/hadoop-2.9.1# bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep /user/hadoop/input output 'dfs[a-z.]+'
18/09/24 11:57:19 INFO client.RMProxy: Connecting to ResourceManager at master/172.17.0.2:8032
18/09/24 11:57:20 INFO input.FileInputFormat: Total input files to process : 9
18/09/24 11:57:20 INFO mapreduce.JobSubmitter: number of splits:9
18/09/24 11:57:21 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/09/24 11:57:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1537789095052_0001
18/09/24 11:57:21 INFO impl.YarnClientImpl: Submitted application application_1537789095052_0001
18/09/24 11:57:21 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1537789095052_0001/
18/09/24 11:57:21 INFO mapreduce.Job: Running job: job_1537789095052_0001
18/09/24 11:57:26 INFO mapreduce.Job: Job job_1537789095052_0001 running in uber mode : false
18/09/24 11:57:26 INFO mapreduce.Job:  map 0% reduce 0%
18/09/24 11:57:34 INFO mapreduce.Job:  map 89% reduce 0%
18/09/24 11:57:35 INFO mapreduce.Job:  map 100% reduce 0%
18/09/24 11:57:39 INFO mapreduce.Job:  map 100% reduce 100%
18/09/24 11:57:41 INFO mapreduce.Job: Job job_1537789095052_0001 completed successfully
18/09/24 11:57:41 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=115
		FILE: Number of bytes written=1979213
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=33107
		HDFS: Number of bytes written=219
		HDFS: Number of read operations=30
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Killed map tasks=1
		Launched map tasks=9
		Launched reduce tasks=1
		Data-local map tasks=9
		Total time spent by all maps in occupied slots (ms)=51634
		Total time spent by all reduces in occupied slots (ms)=2287
		Total time spent by all map tasks (ms)=51634
		Total time spent by all reduce tasks (ms)=2287
		Total vcore-milliseconds taken by all map tasks=51634
		Total vcore-milliseconds taken by all reduce tasks=2287
		Total megabyte-milliseconds taken by all map tasks=52873216
		Total megabyte-milliseconds taken by all reduce tasks=2341888
	Map-Reduce Framework
		Map input records=891
		Map output records=4
		Map output bytes=101
		Map output materialized bytes=163
		Input split bytes=1050
		Combine input records=4
		Combine output records=4
		Reduce input groups=4
		Reduce shuffle bytes=163
		Reduce input records=4
		Reduce output records=4
		Spilled Records=8
		Shuffled Maps =9
		Failed Shuffles=0
		Merged Map outputs=9
		GC time elapsed (ms)=1378
		CPU time spent (ms)=2880
		Physical memory (bytes) snapshot=2824376320
		Virtual memory (bytes) snapshot=19761373184
		Total committed heap usage (bytes)=1956642816
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=32057
	File Output Format Counters 
		Bytes Written=219
18/09/24 11:57:41 INFO client.RMProxy: Connecting to ResourceManager at master/172.17.0.2:8032
18/09/24 11:57:41 INFO input.FileInputFormat: Total input files to process : 1
18/09/24 11:57:41 INFO mapreduce.JobSubmitter: number of splits:1
18/09/24 11:57:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1537789095052_0002
18/09/24 11:57:41 INFO impl.YarnClientImpl: Submitted application application_1537789095052_0002
18/09/24 11:57:41 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1537789095052_0002/
18/09/24 11:57:41 INFO mapreduce.Job: Running job: job_1537789095052_0002
18/09/24 11:57:50 INFO mapreduce.Job: Job job_1537789095052_0002 running in uber mode : false
18/09/24 11:57:50 INFO mapreduce.Job:  map 0% reduce 0%
18/09/24 11:57:54 INFO mapreduce.Job:  map 100% reduce 0%
18/09/24 11:57:58 INFO mapreduce.Job:  map 100% reduce 100%
18/09/24 11:57:59 INFO mapreduce.Job: Job job_1537789095052_0002 completed successfully
18/09/24 11:58:00 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=115
		FILE: Number of bytes written=394779
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=346
		HDFS: Number of bytes written=77
		HDFS: Number of read operations=7
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=1826
		Total time spent by all reduces in occupied slots (ms)=1917
		Total time spent by all map tasks (ms)=1826
		Total time spent by all reduce tasks (ms)=1917
		Total vcore-milliseconds taken by all map tasks=1826
		Total vcore-milliseconds taken by all reduce tasks=1917
		Total megabyte-milliseconds taken by all map tasks=1869824
		Total megabyte-milliseconds taken by all reduce tasks=1963008
	Map-Reduce Framework
		Map input records=4
		Map output records=4
		Map output bytes=101
		Map output materialized bytes=115
		Input split bytes=127
		Combine input records=0
		Combine output records=0
		Reduce input groups=1
		Reduce shuffle bytes=115
		Reduce input records=4
		Reduce output records=4
		Spilled Records=8
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=54
		CPU time spent (ms)=590
		Physical memory (bytes) snapshot=488009728
		Virtual memory (bytes) snapshot=3967393792
		Total committed heap usage (bytes)=344981504
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=219
	File Output Format Counters 
		Bytes Written=77

After the job finishes, the results can be viewed in the output directory on HDFS:

root@master:/usr/local/hadoop-2.9.1#  bin/hdfs dfs -cat output/*
1	dfsadmin
1	dfs.replication
1	dfs.namenode.name.dir
1	dfs.datanode.data.dir

The output directory on HDFS contains the expected results, so the Hadoop distributed cluster ran the grep example successfully.
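To see what the example job actually computes, the same regular expression can be tried locally with plain grep: the job extracts every string matching dfs[a-z.]+ from the input files and counts the distinct matches. A minimal local sketch (the sample lines mimic the uploaded XML files):

```shell
# Sketch: reproduce the grep job's pattern matching locally.
# 'dfs[a-z.]+' matches "dfs" followed by lowercase letters and dots,
# which is how property names like dfs.replication are picked out.
matches=$(printf '%s\n' \
    '<name>dfs.replication</name>' \
    '<name>dfs.namenode.name.dir</name>' \
    '<name>dfs.datanode.data.dir</name>' \
  | grep -oE 'dfs[a-z.]+')
echo "$matches"
# prints:
# dfs.replication
# dfs.namenode.name.dir
# dfs.datanode.data.dir
```

The real job then counts each distinct match in a second MapReduce pass, which is why the HDFS output pairs each string with a count of 1 here.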
