Server role | IP address | Hostname |
---|---|---|
NameNode | 192.168.3.69 | namenode.abc.local |
DataNode 1 | 192.168.3.70 | datanode1.abc.local |
DataNode 2 | 192.168.3.71 | datanode2.abc.local |
All three servers run a minimal installation of CentOS 6.6, with the hostname and a static IP address configured on each.
A minimal CentOS 6.6 installation does not include a Java environment by default, so one has to be installed.
Download the Java runtime installation package: jre-7u80-linux-x64.tar.gz
# tar xvfz jre-7u80-linux-x64.tar.gz
# mv jre1.7.0_80/ /opt
Set the Java environment variables in /etc/profile:
export JAVA_HOME=/opt/jre1.7.0_80
PATH=$JAVA_HOME/bin:$PATH
export PATH
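To pick up the new variables without re-logging in, they can also be loaded into the current shell (assuming the lines above were added to /etc/profile):
# source /etc/profile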
Log out of the console, log back in to the server, and check the Java runtime environment:
[root@namenode ~]# java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
Install the Java runtime environment on the other two servers in the same way.
Set up passwordless SSH login between the three servers.
A minimal CentOS installation does not include scp or the SSH client programs. Install them from RPM packages as follows:
# rpm -ivh libedit-2.11-4.20080712cvs.1.el6.x86_64.rpm
# rpm -ivh openssh-clients-5.3p1-104.el6.x86_64.rpm
libedit is a dependency of the openssh-clients package.
Note: if remote SSH access to the Linux server is very slow, turn off the SSH daemon's reverse DNS lookup by adding the following line to /etc/ssh/sshd_config:
UseDNS no
Although the [UseDNS yes] line in the configuration file is commented out, the default is still yes (the SSH service performs reverse DNS lookups by default).
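A minimal sketch of applying this change on the server (appending the option to the stock /etc/ssh/sshd_config and restarting sshd):
# echo "UseDNS no" >> /etc/ssh/sshd_config
# /etc/init.d/sshd restart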
At the same time, configure local name resolution on the SSH client by editing /etc/hosts and adding the following entries:
192.168.3.69 namenode namenode.abc.local
192.168.3.70 datanode1 datanode1.abc.local
192.168.3.71 datanode2 datanode2.abc.local
Note: after installing openssh-clients on the local machine, using scp to copy a local file to a remote host may still fail with the error
-bash: scp: command not found. This happens because openssh-clients, which provides the scp program, must also be installed on the remote host.
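A quick way to check whether the package is already installed on a given host:
# rpm -q openssh-clients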
On namenode, generate the machine's public/private key pair.
[root@namenode ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): # accept the default key file location
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): # press Enter, do not set a passphrase
Enter same passphrase again: # press Enter, do not set a passphrase
Your identification has been saved in /root/.ssh/id_rsa. # the private key file
Your public key has been saved in /root/.ssh/id_rsa.pub. # the public key file
The key fingerprint is:
02:e0:5b:d0:53:19:25:48:e2:61:5a:a3:14:9e:d0:a6 root@namenode.abc.local
The key's randomart image is:
+--[ RSA 2048]----+
|.+Xo.o++. |
|+B+*+ .. |
|o=o o. |
|E o . |
| . . S |
| . |
| |
| |
| |
+-----------------+
First, enable passwordless ssh login from the machine to itself:
# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
authorized_keys is a newly created file; it is referenced in the SSH configuration.
Also modify the SSH daemon configuration file /etc/ssh/sshd_config:
RSAAuthentication yes # uncomment to enable RSA authentication
PubkeyAuthentication yes # uncomment
AuthorizedKeysFile .ssh/authorized_keys # uncomment
Restart the SSH service.
/etc/init.d/sshd restart
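If public-key login still asks for a password, overly open permissions on /root/.ssh are a common cause; tightening them is an extra precaution not covered in the steps above:
# chmod 700 /root/.ssh
# chmod 600 /root/.ssh/authorized_keys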
To log in to a remote host without a password, the public key file has to be copied there first: upload /root/.ssh/id_rsa.pub from namenode to the /tmp directory on datanode1.
# scp /root/.ssh/id_rsa.pub root@192.168.3.70:/tmp
At this point passwordless ssh login has not yet been set up, so a password is still required to upload the file.
On datanode1, import namenode's public key into the SSH authorized keys file.
[root@datanode1 ~]# cat /tmp/id_rsa.pub >> /root/.ssh/authorized_keys
authorized_keys is a newly created file; it is referenced in the SSH configuration.
Modify the SSH daemon configuration file /etc/ssh/sshd_config on datanode1:
RSAAuthentication yes # uncomment to enable RSA authentication
PubkeyAuthentication yes # uncomment
AuthorizedKeysFile .ssh/authorized_keys # uncomment
Restart the SSH service.
/etc/init.d/sshd restart
Verify passwordless ssh login from namenode:
[root@namenode ~]# ssh root@192.168.3.70
The authenticity of host '192.168.3.70 (192.168.3.70)' can't be established.
RSA key fingerprint is c4:1f:56:68:f8:44:c7:d9:cc:97:b9:47:1c:37:bb:a7.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.3.70' (RSA) to the list of known hosts.
Last login: Mon Aug 10 18:47:02 2015 from 192.168.3.64
[root@datanode1 ~]#
After uploading namenode's public key file to datanode2, perform the same steps there.
Likewise, datanode1's public key has to be added to namenode and datanode2, and datanode2's public key has to be added to namenode and datanode1, so that all three servers can ssh to each other without a password; a scripted way to finish the key exchange is sketched below.
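As a rough sketch (assuming ssh-copy-id, which ships with openssh-clients, is available), the remaining key distribution can be done by running a loop like this on each of the three servers:
# for host in namenode.abc.local datanode1.abc.local datanode2.abc.local; do ssh-copy-id -i /root/.ssh/id_rsa.pub root@$host; done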
Download the Hadoop installation package: hadoop-2.7.1.tar.gz.
Upload it to namenode and extract it into the /opt directory:
# tar xvfz hadoop-2.7.1.tar.gz -C /opt/
Under the /opt/hadoop-2.7.1 directory, create the folders used for data storage: tmp, hdfs, hdfs/data, and hdfs/name.
[root@namenode hadoop-2.7.1]# mkdir tmp
[root@namenode hadoop-2.7.1]# mkdir hdfs
[root@namenode hadoop-2.7.1]# cd hdfs/
[root@namenode hdfs]# mkdir data
[root@namenode hdfs]# mkdir name
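The same directory layout can also be created in a single command:
# mkdir -p /opt/hadoop-2.7.1/tmp /opt/hadoop-2.7.1/hdfs/data /opt/hadoop-2.7.1/hdfs/name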
Edit /opt/hadoop-2.7.1/etc/hadoop/hadoop-env.sh and point JAVA_HOME at the JRE installed earlier:
# The java implementation to use.
export JAVA_HOME=/opt/jre1.7.0_80
Edit /opt/hadoop-2.7.1/etc/hadoop/core-site.xml and configure the following properties:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.abc.local:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop-2.7.1/tmp</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131702</value>
  </property>
</configuration>
fs.defaultFS is set to the NameNode's URI, and io.file.buffer.size sets the buffer size used when reading and writing sequence files.
Edit /opt/hadoop-2.7.1/etc/hadoop/hdfs-site.xml and configure the following properties:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop-2.7.1/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop-2.7.1/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>datanode1.abc.local:9000</value>
  </property>
</configuration>
In the namenode's hdfs-site.xml, dfs.webhdfs.enabled must be set to true; otherwise WebHDFS operations that list file and directory status, such as LISTSTATUS and GETFILESTATUS, cannot be used, because that information is held by the NameNode.
A Secondary NameNode is configured through dfs.namenode.secondary.http-address and placed on datanode1. Note that the Secondary NameNode only merges checkpoints for the NameNode; it is not a hot standby, so by itself it does not remove the NameNode as a single point of failure.
Edit /opt/hadoop-2.7.1/etc/hadoop/mapred-site.xml and set the MapReduce framework to YARN:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit /opt/hadoop-2.7.1/etc/hadoop/yarn-site.xml and configure the shuffle service and the ResourceManager host:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>namenode.abc.local</value>
  </property>
</configuration>
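For start-dfs.sh to start the DataNode processes shown in the output below, the DataNodes presumably also need to be listed in /opt/hadoop-2.7.1/etc/hadoop/slaves, for example:
datanode1.abc.local
datanode2.abc.local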
Use scp to copy the Hadoop directory from namenode to the two datanodes.
# cd /opt/
# scp -r hadoop-2.7.1 root@192.168.3.70:/opt/
# scp -r hadoop-2.7.1 root@192.168.3.71:/opt/
On namenode, run the Hadoop start-up script:
# cd /opt/hadoop-2.7.1/sbin
# ./start-dfs.sh
15/08/12 03:15:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [namenode.abc.local]
namenode.abc.local: starting namenode, logging to /opt/hadoop-2.7.1/logs/hadoop-root-namenode-namenode.abc.local.out
datanode2.abc.local: starting datanode, logging to /opt/hadoop-2.7.1/logs/hadoop-root-datanode-datanode2.abc.local.out
datanode1.abc.local: starting datanode, logging to /opt/hadoop-2.7.1/logs/hadoop-root-datanode-datanode1.abc.local.out
Starting secondary namenodes [datanode1.abc.local]
datanode1.abc.local: starting secondarynamenode, logging to /opt/hadoop-2.7.1/logs/hadoop-root-secondarynamenode-datanode1.abc.local.out
15/08/12 03:15:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Then run the command to format the NameNode:
# cd /opt/hadoop-2.7.1/bin
# ./hdfs namenode -format
..........................
15/08/12 03:48:58 INFO namenode.FSImage: Allocated new BlockPoolId: BP-486254444-192.168.3.69-1439322538827
15/08/12 03:48:59 INFO common.Storage: Storage directory **/opt/hadoop-2.7.1/hdfs/name** has been successfully formatted.
15/08/12 03:48:59 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/08/12 03:48:59 INFO util.ExitUtil: Exiting with status 0
15/08/12 03:48:59 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at namenode.abc.local/192.168.3.69
************************************************************/
This command shut down the Hadoop process on namenode, but the Hadoop processes on datanode1 and datanode2 were not shut down; they can be stopped with stop-dfs.sh.
After the command finishes, files are generated under /opt/hadoop-2.7.1/hdfs/name on namenode.
[root@namenode name]# tree
.
└── current
├── fsimage_0000000000000000000
├── fsimage_0000000000000000000.md5
├── seen_txid
└── VERSION
Restart the HDFS system.
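A minimal sketch of the restart, using the scripts in the sbin directory shown earlier:
# cd /opt/hadoop-2.7.1/sbin
# ./stop-dfs.sh
# ./start-dfs.sh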
Then look at the files under /opt/hadoop-2.7.1/hdfs/name on namenode:
[root@namenode name]# tree
.
├── current
│ ├── edits_0000000000000000001-0000000000000000002
│ ├── edits_0000000000000000003-0000000000000000004
│ ├── edits_0000000000000000005-0000000000000000006
│ ├── edits_0000000000000000007-0000000000000000008
│ ├── edits_0000000000000000009-0000000000000000010
│ ├── edits_0000000000000000011-0000000000000000012
│ ├── edits_0000000000000000013-0000000000000000014
│ ├── edits_0000000000000000015-0000000000000000016
│ ├── edits_0000000000000000017-0000000000000000018
│ ├── edits_0000000000000000019-0000000000000000020
│ ├── edits_0000000000000000021-0000000000000000022
│ ├── edits_0000000000000000023-0000000000000000024
│ ├── edits_0000000000000000025-0000000000000000026
│ ├── edits_0000000000000000027-0000000000000000028
│ ├── edits_0000000000000000029-0000000000000000030
│ ├── edits_0000000000000000031-0000000000000000032
│ ├── edits_0000000000000000033-0000000000000000034
│ ├── edits_0000000000000000035-0000000000000000036
│ ├── edits_0000000000000000037-0000000000000000038
│ ├── edits_0000000000000000039-0000000000000000040
│ ├── edits_0000000000000000041-0000000000000000042
│ ├── edits_0000000000000000043-0000000000000000044
│ ├── edits_0000000000000000045-0000000000000000046
│ ├── edits_0000000000000000047-0000000000000000047
│ ├── edits_inprogress_0000000000000000048
│ ├── fsimage_0000000000000000046
│ ├── fsimage_0000000000000000046.md5
│ ├── fsimage_0000000000000000047
│ ├── fsimage_0000000000000000047.md5
│ ├── seen_txid
│ └── VERSION
└── in_use.lock # this file indicates that the NameNode has been started
On datanode1 and datanode2, look at the files under /opt/hadoop-2.7.1/hdfs/data:
[root@datanode1 data]# tree
.
├── current
│ ├── BP-486254444-192.168.3.69-1439322538827
│ │ ├── current
│ │ │ ├── dfsUsed
│ │ │ ├── finalized
│ │ │ ├── rbw
│ │ │ └── VERSION
│ │ ├── scanner.cursor
│ │ └── tmp
│ └── VERSION
└── in_use.lock # this file indicates that the DataNode has been started
Because datanode1 is configured as the Secondary NameNode, files are also generated under /opt/hadoop-2.7.1/tmp on datanode1.
[root@datanode1 tmp]# tree
.
└── dfs
└── namesecondary
├── current
│ ├── edits_0000000000000000001-0000000000000000002
│ ├── edits_0000000000000000003-0000000000000000004
│ ├── edits_0000000000000000005-0000000000000000006
│ ├── edits_0000000000000000007-0000000000000000008
│ ├── edits_0000000000000000009-0000000000000000010
│ ├── edits_0000000000000000011-0000000000000000012
│ ├── edits_0000000000000000013-0000000000000000014
│ ├── edits_0000000000000000015-0000000000000000016
│ ├── edits_0000000000000000017-0000000000000000018
│ ├── edits_0000000000000000019-0000000000000000020
│ ├── edits_0000000000000000021-0000000000000000022
│ ├── edits_0000000000000000023-0000000000000000024
│ ├── edits_0000000000000000025-0000000000000000026
│ ├── edits_0000000000000000027-0000000000000000028
│ ├── edits_0000000000000000029-0000000000000000030
│ ├── edits_0000000000000000031-0000000000000000032
│ ├── edits_0000000000000000033-0000000000000000034
│ ├── edits_0000000000000000035-0000000000000000036
│ ├── edits_0000000000000000037-0000000000000000038
│ ├── edits_0000000000000000039-0000000000000000040
│ ├── edits_0000000000000000041-0000000000000000042
│ ├── edits_0000000000000000043-0000000000000000044
│ ├── edits_0000000000000000045-0000000000000000046
│ ├── edits_0000000000000000048-0000000000000000049
│ ├── fsimage_0000000000000000047
│ ├── fsimage_0000000000000000047.md5
│ ├── fsimage_0000000000000000049
│ ├── fsimage_0000000000000000049.md5
│ └── VERSION
└── in_use.lock
Create a test file:
# mkdir -p /root/input_data
# cd /root/input_data/
# echo "This is a test." >> test_data.txt
Run the hadoop command to put the file into HDFS:
# cd /opt/hadoop-2.7.1/bin/
# ./hadoop fs -put /root/input_data/ /input_data
15/08/13 03:16:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
This copies the files under the local /root/input_data directory into the /input_data directory in HDFS.
Run the hadoop command to list the files:
# ./hadoop fs -ls /input_data
15/08/13 03:20:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 2 root supergroup 16 2015-08-13 03:20 /input_data/test_data.txt
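As a small extra check (not part of the original run), the file can be read back out of HDFS, which should print the test line written earlier:
# ./hadoop fs -cat /input_data/test_data.txt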
The commands available for operating on HDFS can be listed as follows:
# ./hadoop fs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
**[-ls [-d] [-h] [-R] [<path> ...]]**
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
**[-put [-f] [-p] [-l] <localsrc> ... <dst>]**
[-renameSnapshot <snapshotDir> <oldName> <newName>]
**[-rm [-f] [-r|-R] [-skipTrash] <src> ...]**
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
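A few of these commands, exercised against the /input_data directory created earlier (a sketch; the /output_data path is only an illustrative name):
# ./hadoop fs -mkdir -p /output_data
# ./hadoop fs -get /input_data/test_data.txt /tmp/
# ./hadoop fs -rm -r /output_data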