In the previous section we covered preparing the Linux environment for a Hadoop deployment; if you are interested, see "Hadoop pseudo-distributed deployment: Linux environment preparation".
This section walks through deploying HDFS in pseudo-distributed mode. The environment is as follows:
OS: CentOS 6.4
Java: Oracle JDK 1.7
Hadoop: 2.5.0
Hostname: hadoop01.datacenter.com
Hadoop depends on Java to run. For Hadoop 2.5.0 we recommend Oracle JDK 1.7; for details on supported versions, see the Java version compatibility notes. The installation itself was covered in the previous section, so we won't repeat it here.
HDFS is a distributed file storage system, so everything we install, configure, and access here revolves around "distributed file storage". If you want more background, look up HDFS, namenode, and datanode.
Hadoop can be downloaded directly from the official archive, where you can pick whichever release you need: https://archive.apache.org/dist/hadoop/common/ . Here we use the prebuilt Hadoop 2.5.0 binary package.
After downloading, upload the package to the directory we planned earlier, /opt/software; it is roughly 300 MB.
[hadoop@hadoop01 software]$ cd /opt/software/
[hadoop@hadoop01 software]$ ll hadoop-2.5.0.tar.gz
-rw-rw-r-- 1 hadoop hadoop 311430119 Jan 7 12:55 hadoop-2.5.0.tar.gz
[hadoop@hadoop01 software]$
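Before unpacking, it is worth verifying the download against the checksums Apache publishes alongside the tarball. A minimal sketch, assuming you also fetched the digest file from the same archive directory (the .mds file name is an assumption; adjust to whatever the mirror actually provides):
[hadoop@hadoop01 software]$ md5sum hadoop-2.5.0.tar.gz
[hadoop@hadoop01 software]$ cat hadoop-2.5.0.tar.gz.mds
The MD5 printed by md5sum should match the one recorded in the digest file.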
Next, extract the package into /opt/modules/:
[hadoop@hadoop01 software]$ tar -zxf hadoop-2.5.0.tar.gz -C /opt/modules/
[hadoop@hadoop01 software]$ cd /opt/modules/
[hadoop@hadoop01 modules]$ ll
total 16
drwxr-xr-x 9 hadoop hadoop 4096 Aug 7 2014 hadoop-2.5.0
drwxr-xr-x 8 hadoop hadoop 4096 Jul 26 2014 jdk1.7.0_67
drwxrwxr-x 9 hadoop hadoop 4096 Mar 18 2014 scala-2.10.4
drwxrwxr-x 11 hadoop hadoop 4096 Aug 7 2015 spark
[hadoop@hadoop01 modules]$ cd hadoop-2.5.0/
[hadoop@hadoop01 hadoop-2.5.0]$ ll
total 28
drwxr-xr-x 2 hadoop hadoop 4096 Apr 6 15:57 bin
drwxr-xr-x 3 hadoop hadoop 4096 Apr 6 15:57 etc
drwxr-xr-x 2 hadoop hadoop 4096 Apr 6 15:57 include
drwxr-xr-x 3 hadoop hadoop 4096 Apr 6 15:57 lib
drwxr-xr-x 2 hadoop hadoop 4096 Apr 6 15:57 libexec
drwxr-xr-x 2 hadoop hadoop 4096 Apr 6 15:57 sbin
drwxr-xr-x 4 hadoop hadoop 4096 Aug 7 2014 share
[hadoop@hadoop01 hadoop-2.5.0]$
We can see the extracted directory layout. Let's check how much space each subdirectory uses:
[hadoop@hadoop01 hadoop-2.5.0]$ du -sh *
424K bin
132K etc
60K include
4.5M lib
56K libexec
120K sbin
1.7G share
[hadoop@hadoop01 hadoop-2.5.0]$ du -sh ./share/*
1.6G ./share/doc
162M ./share/hadoop
[hadoop@hadoop01 hadoop-2.5.0]$
The largest directory by far is share/doc, at about 1.6 GB; it holds the official documentation (in English). If you don't need to consult those docs locally, deleting the directory saves a lot of space, which also matters later when this installation gets copied to other nodes for a real distributed setup:
[hadoop@hadoop01 hadoop-2.5.0]$ rm -rf share/doc
[hadoop@hadoop01 hadoop-2.5.0]$ ll share
total 4
drwxr-xr-x 8 hadoop hadoop 4096 Aug 7 2014 hadoop
[hadoop@hadoop01 hadoop-2.5.0]$
Set JAVA_HOME in Hadoop's runtime environment file:
[hadoop@hadoop01 hadoop-2.5.0]$ vim etc/hadoop/hadoop-env.sh
...
# The java implementation to use.
export JAVA_HOME=/opt/modules/jdk1.7.0_67
...
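If you are not sure where the JDK lives on your machine, one quick way to locate it (a sketch; the jdk1.7.0_67 path is simply what our environment uses):
[hadoop@hadoop01 hadoop-2.5.0]$ readlink -f $(which java)
Strip the trailing /bin/java from the printed path to get the value for JAVA_HOME.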
Run bin/hadoop; output like the following means the configuration works.
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar run a jar file
checknative [-a|-h] check native hadoop and compression libraries availability
distcp copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
daemonlog get/set the log level for each daemon
or
CLASSNAME run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
[hadoop@hadoop01 hadoop-2.5.0]$
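As an extra sanity check, bin/hadoop version should now report the release we installed, Hadoop 2.5.0 (detailed output elided here):
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hadoop version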
Configure the HDFS endpoint in etc/hadoop/core-site.xml by adding the following property:
[hadoop@hadoop01 hadoop-2.5.0]$ vim etc/hadoop/core-site.xml
...
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01.datacenter.com:8020</value>
</property>
...
In Hadoop's default configuration, hadoop.tmp.dir is set to /tmp/hadoop-${user.name}, and this directory holds the namenode's fsimage and edit log files. Since /tmp is commonly cleaned on reboot (or by periodic jobs such as tmpwatch), we need to add a hadoop.tmp.dir property pointing at a persistent directory so the namenode keeps working across reboots:
[hadoop@hadoop01 hadoop-2.5.0]$ mkdir -p data/tmp
[hadoop@hadoop01 hadoop-2.5.0]$ ll data/tmp
total 0
[hadoop@hadoop01 hadoop-2.5.0]$ vim etc/hadoop/core-site.xml
...
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/hadoop-2.5.0/data/tmp</value>
</property>
...
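To confirm that Hadoop actually picks these properties up, bin/hdfs getconf can echo a configured key back, which is handy for catching XML typos early; each command should print exactly the value we configured above:
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs getconf -confKey fs.defaultFS
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs getconf -confKey hadoop.tmp.dir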
Next we configure the datanode. Since this deployment is pseudo-distributed, a single node in the slaves file is enough:
[hadoop@hadoop01 hadoop-2.5.0]$ vim etc/hadoop/slaves
hadoop01.datacenter.com
[hadoop@hadoop01 hadoop-2.5.0]$
Hadoop's default replication factor is 3. With only one datanode in our pseudo-distributed setup, the replication factor can only be 1:
[hadoop@hadoop01 hadoop-2.5.0]$ vim etc/hadoop/hdfs-site.xml
...
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
...
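The same getconf check applies here, and once the services are running and a file has been uploaded (see below), -stat can report the replication a file actually has. A sketch; etc/conf/slaves refers to the file we upload later in this section:
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs getconf -confKey dfs.replication
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs dfs -stat %r etc/conf/slaves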
Format the namenode with bin/hdfs namenode -format:
[hadoop@hadoop01 hadoop-2.5.0]$ ll data/tmp
total 0
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs namenode -format
...
Check the namenode's metadata directory; seeing the fsimage and related files means the format succeeded:
[hadoop@hadoop01 hadoop-2.5.0]$ ll data/tmp
total 4
drwxrwxr-x 3 hadoop hadoop 4096 Apr 6 17:02 dfs
[hadoop@hadoop01 hadoop-2.5.0]$ ll data/tmp/dfs/name/current/
total 16
-rw-rw-r-- 1 hadoop hadoop 353 Apr 6 17:02 fsimage_0000000000000000000
-rw-rw-r-- 1 hadoop hadoop 62 Apr 6 17:02 fsimage_0000000000000000000.md5
-rw-rw-r-- 1 hadoop hadoop 2 Apr 6 17:02 seen_txid
-rw-rw-r-- 1 hadoop hadoop 206 Apr 6 17:02 VERSION
[hadoop@hadoop01 hadoop-2.5.0]$
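The VERSION file records the identity of the freshly formatted namespace; fields such as namespaceID, clusterID, and blockpoolID must stay consistent between the namenode and its datanodes, which is why reformatting a namenode that already has datanodes leads to ID-mismatch errors:
[hadoop@hadoop01 hadoop-2.5.0]$ cat data/tmp/dfs/name/current/VERSION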
Start the namenode and datanode services:
[hadoop@hadoop01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/modules/hadoop-2.5.0/logs/hadoop-hadoop-namenode-hadoop01.datacenter.com.out
[hadoop@hadoop01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/modules/hadoop-2.5.0/logs/hadoop-hadoop-datanode-hadoop01.datacenter.com.out
[hadoop@hadoop01 hadoop-2.5.0]$
Check the processes with jps; seeing NameNode and DataNode in the output means both services started successfully:
[hadoop@hadoop01 hadoop-2.5.0]$ jps
4629 DataNode
4704 Jps
4550 NameNode
[hadoop@hadoop01 hadoop-2.5.0]$
The HDFS web UI is now reachable on port 50070; on this machine the address is http://hadoop01.datacenter.com:50070
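If you prefer to check from the shell, the same web server also exposes machine-readable metrics at the /jmx endpoint; a quick sketch (head just keeps the JSON output short):
[hadoop@hadoop01 hadoop-2.5.0]$ curl -s http://hadoop01.datacenter.com:50070/jmx | head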
Create a directory with a relative path using bin/hdfs dfs -mkdir:
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir -p etc/conf
18/04/06 17:18:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop01 hadoop-2.5.0]$
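The NativeCodeLoader warning above (it also appears in the commands below) only means Hadoop fell back to its built-in Java implementations because no native library matches this platform; it is harmless for our purposes. The new directory can also be confirmed from the command line:
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs dfs -ls -R /user/hadoop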
Open the file browser from the web UI via Utilities > Browse the file system, and you will find the newly created directory at /user/hadoop/etc/conf.
So the directory we created with the relative path etc/conf ended up at the absolute HDFS path /user/hadoop/etc/conf. Like Linux, HDFS gives every user a home directory, and relative paths resolve against it; here our home directory is /user/hadoop.
Upload a file with bin/hdfs dfs -put:
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs dfs -put etc/hadoop/slaves /user/hadoop/etc/conf
18/04/06 17:29:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop01 hadoop-2.5.0]$
View a file's contents with bin/hdfs dfs -cat:
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs dfs -cat etc/conf/slaves
18/04/06 17:32:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop01.datacenter.com
[hadoop@hadoop01 hadoop-2.5.0]$
Download a file to the local filesystem with bin/hdfs dfs -get:
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs dfs -get etc/conf/slaves /home/hadoop
18/04/06 17:34:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop01 hadoop-2.5.0]$ ll /home/hadoop
total 40
drwxrwxr-x 2 hadoop hadoop 4096 Apr 6 17:33 conf
drwxr-xr-x. 2 hadoop hadoop 4096 Dec 24 14:39 Desktop
drwxr-xr-x. 2 hadoop hadoop 4096 Dec 24 22:29 Documents
drwxr-xr-x. 2 hadoop hadoop 4096 Dec 24 22:29 Downloads
drwxr-xr-x. 2 hadoop hadoop 4096 Dec 24 22:29 Music
drwxr-xr-x. 2 hadoop hadoop 4096 Dec 24 22:29 Pictures
drwxr-xr-x. 2 hadoop hadoop 4096 Dec 24 22:29 Public
-rw-r--r-- 1 hadoop hadoop 24 Apr 6 17:34 slaves
drwxr-xr-x. 2 hadoop hadoop 4096 Dec 24 22:29 Templates
drwxr-xr-x. 2 hadoop hadoop 4096 Dec 24 22:29 Videos
[hadoop@hadoop01 hadoop-2.5.0]$
Stop the namenode and datanode services:
[hadoop@hadoop01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh stop namenode
stopping namenode
[hadoop@hadoop01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh stop datanode
stopping datanode
[hadoop@hadoop01 hadoop-2.5.0]$
Use jps to confirm the services have stopped:
[hadoop@hadoop01 hadoop-2.5.0]$ jps
5221 Jps
[hadoop@hadoop01 hadoop-2.5.0]$
To sum up:
1. Hadoop depends on Java to run.
2. The whole deployment boils down to setting up the Java environment, configuring the namenode and datanode, and then starting the namenode and datanode services.
3. HDFS is operated with bin/hdfs dfs -[option], and the usage closely mirrors local file operations on Linux.
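For reference, a few more bin/hdfs dfs operations that mirror their Linux counterparts (the paths are just the ones from this walkthrough; note that -rm -r is destructive, so try it only on throwaway data):
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs dfs -ls etc/conf
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs dfs -du -h etc/conf
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs dfs -cp etc/conf/slaves etc/conf/slaves.bak
[hadoop@hadoop01 hadoop-2.5.0]$ bin/hdfs dfs -rm -r etc/conf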