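Cluster nodes: 4 machines in total, 1 master and 3 slaves
Node information:
On every node Hadoop is administered by user jared in group hadoop; the steps for creating the user and group are omitted here
| hostname | internal IP (static) | external IP (DHCP) |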
| master | 192.168.255.25 | 192.168.1.10 |
| node1 | 192.168.255.26 | 192.168.1.11 |
| node2 | 192.168.255.27 | 192.168.1.12 |
| node3 | 192.168.255.28 | 192.168.1.13 |
Operating system: CentOS release 6.5 (Final)
Hadoop version: Apache Hadoop 0.20.2
Java version: JDK 1.7 (JDK 1.6 is recommended)
Environment variables:
export JAVA_HOME=/usr/java/jdk1.7.0_51
export HADOOP_INSTALL=/home/jared/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
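Configure SSH trust between all nodes:
Approach:
1. On each node, generate a key pair as user jared, producing id_rsa and id_rsa.pub
2. Collect the contents of every node's public key (id_rsa.pub) into a single file named authorized_keys
3. Copy authorized_keys to /home/jared/.ssh on every node
4. Test SSH logins between all nodes. The first login to each host asks you to type "yes" to accept its host key; later logins do not prompt
The authorized_keys file in this environment contains: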
vim authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA5u69aO7lSceNLuQFbZkRt/V4O6nxc4QNXQRNxiar+k15c3Fe+5pFMOBpQZFxgk6w4490Z/koM6HJ7Kg2s9jnSSkyhJk7YuzYvUkmQbZG0uEyxX1uor/lTlySXuwlokSzLwTaKnEk1Wkq/s7eR3zcItrX++fAnKas9IZcziZJ+fCWBH3c2BNql2/K0j3jT+oTUaNY4mPZwnYljPZr/eldQOQcM0dDtS5Q/UWHC8USXQrBtCzOTiRlIVyFC7KEMThkkfSfvPjG7bT5O2Rg9R5gzMgIsku6d0KQMQ1GKmTbV3OYStUx7ByhM8GmDN/FFZU94lW/pjcTLeqjE61FJHJ1HQ== jared@master
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAqtK8BTl3wLy4Oc9mg6Xj+APrjATDWd5vFdNzP6VKi2ZWV9YuN+8Snsj6Vay6d9w7CrVzO8lShSIG2PId9YwiwBnvFzPigF2Gk9ncsSNbLzOX+9OR3jGe1NNIdfBJQfMuD/l42X4sMwJKDjK+Wpp5bQSQ63qO4vtBJ1MbM7D8FyUTIse9GgPP7otdKWEEDMQHPKXmHoKWhhg26ht3wfICqrLzLyhQhFjpYCo32d6rhLfe844ICaqEfrLnlN4wfHb19pRXhuQpMCwdsnRarGKBkQmsRW2+LtvjDvARBdefpuAEtATWfcY/48nwibOp/xPkdYKbaNSceEbDWists5tXFw== jared@node1
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA6tSOqPq76+s8EU8qj5wtcHRan9MHHWJD9HmJkhtstcDyXnoBVU0sEJdJ5sAr/2B7pq8NMAloD54KcjxhRzbj0gKO3NDBwE4Yg69hoo+uD7rNRW6yqPoONVpKEr5ngMEwjh0xh6U4whWORHfhI8sqEJX+snTNxMed3Vv7OqJVno+MplyEpTrf+vlZa9nG9Woe1QONM8s5/lJMsZHY+lgT0e1u3jR+Kedc9RMch4hfOowc1BA4IQI/bhuYAgClYkTiFZzFlX/Crio4rq22XzpFFB5+QWiUKqMCrdo9ikPhlfw3MSnnEb+/GqP8LDGuuCuzrrLj7y1184QBydFOMZPLCQ== jared@node2
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAw7Y53YhJ5L9OANGmGE6bzCT82QIVXR+AJycIGk/O5NDMiuPYKOU+HUYfWyNiY/yPKYiQbFLb4o0rTIbUOvpTLGEz8Tz7pm5Dd8OJca4DlUN/PK8Cp1osXsZa2IeGyeL/yP+RplK/zDm1xrldDjSUhFyPTOGMcAzMQkB3N+hc6s6UleV+J78YJVBeaz5foGir/gR5MBr5bpZpiYH0KVxDw65rwsBHu7KlVy5Q4lKkMUmccnKLdyVO0gnwWWenpc71UHJ0yADOzdQSpZtDjgf0dyrfiVpWDzLbj49Ie34X1kzKKXtrOeLZfSYssjf7585Qra3L+TO52Sq7yHc7oVBVLQ== jared@node3
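Steps 2 and 3 can be scripted. The following is a minimal sketch rather than the procedure used here: `merge_pubkeys` is a hypothetical helper that merges the collected public keys, drops duplicates, and restricts the file mode, because sshd with its default StrictModes setting refuses an authorized_keys file that other users can write to.

```shell
# merge_pubkeys OUT PUB...
# Hypothetical helper: merge public key files into OUT without duplicates.
# OUT must not be one of the PUB inputs, since it is truncated first.
merge_pubkeys() {
    out=$1
    shift
    mkdir -p "$(dirname "$out")"
    # sort -u removes keys that were collected more than once.
    sort -u "$@" > "$out"
    # Only the owner may read or write the file.
    chmod 600 "$out"
}
```

For example, after copying each node's id_rsa.pub to one directory under names such as master.pub and node1.pub (assumed names), `merge_pubkeys ~/.ssh/authorized_keys *.pub` builds the file, which can then be distributed with scp.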
Configure the hosts file and copy it to the same path on every node
vim /etc/hosts
Add the following entries:
192.168.255.25 master
192.168.255.26 node1
192.168.255.27 node2
192.168.255.28 node3
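Because the same entries must exist on every node, it can help to add them with a script that is safe to run more than once. A minimal sketch: `add_host` is a hypothetical helper, and writing to /etc/hosts requires root.

```shell
# add_host FILE IP NAME
# Hypothetical helper: append "IP NAME" to FILE unless NAME is already mapped.
add_host() {
    file=$1
    ip=$2
    name=$3
    # Match NAME only as a whole word, so node1 does not match node10.
    if ! grep -Eq "[[:space:]]$name([[:space:]]|\$)" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}
```

Calling `add_host /etc/hosts 192.168.255.25 master` once per node in the table above yields the four entries, and repeating the call changes nothing.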
Hadoop configuration files
vim hadoop-env.sh
Add:
export JAVA_HOME=/usr/java/jdk1.7.0_51
Core configuration file
vim core-site.xml
Add the following:
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
<final>true</final>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/jared/hadoop/tmp</value>
<description>A base for other temporary directories</description>
</property>
HDFS configuration file
vim hdfs-site.xml
Add the following:
<property>
<name>dfs.name.dir</name>
<value>/home/jared/hadoop/name</value>
<final>true</final>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/jared/hadoop/data</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<final>true</final>
</property>
MapReduce configuration file
vim mapred-site.xml
Add the following:
<property>
<name>mapred.job.tracker</name>
<value>192.168.255.25:9001</value>
</property>
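The snippets above show only the `<property>` elements. In each *-site.xml file they must sit inside the root `<configuration>` element. As a sketch, a complete one-property file could be generated as follows; `write_site_xml` is a hypothetical helper, not part of Hadoop.

```shell
# write_site_xml FILE NAME VALUE
# Hypothetical helper: write a Hadoop site file containing a single property.
write_site_xml() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}
```

For instance, `write_site_xml mapred-site.xml mapred.job.tracker 192.168.255.25:9001` produces a file equivalent to the MapReduce configuration shown here.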
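Specify the master node (in Hadoop 0.20 the masters file lists the host that runs the SecondaryNameNode)
vim masters
Add the following: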
master
Specify the slave nodes
vim slaves
Add the following:
node1
node2
node3
Copy the hadoop directory to every node; the destination is /home/jared/ on all of them
Commands:
[jared@master ~]$ scp -r ./hadoop/ node1:~
[jared@master ~]$ scp -r ./hadoop/ node2:~
[jared@master ~]$ scp -r ./hadoop/ node3:~
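With more nodes, the same copy can be driven from the slaves file. This is a minimal sketch under the assumption that passwordless SSH is already working; `push_hadoop` and the `DRY_RUN` switch are hypothetical names introduced for illustration.

```shell
# push_hadoop SLAVES_FILE
# Hypothetical helper: copy ~/hadoop to every host listed in SLAVES_FILE.
# With DRY_RUN=1 the scp commands are printed instead of executed.
push_hadoop() {
    while read -r host; do
        # Skip blank lines and comment lines.
        case $host in
            ''|\#*) continue ;;
        esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "scp -r $HOME/hadoop/ $host:~"
        else
            # Redirect stdin so scp cannot consume the rest of the host list.
            scp -r "$HOME/hadoop/" "$host:~" < /dev/null
        fi
    done < "$1"
}
```

Running it first with `DRY_RUN=1` shows which hosts would be touched before anything is copied.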
Before starting Hadoop for the first time, format the HDFS file system
[jared@master conf]$ hadoop namenode -format
14/02/20 23:36:55 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/192.168.255.25
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
14/02/20 23:36:55 INFO namenode.FSNamesystem: fsOwner=jared,hadoop,adm
14/02/20 23:36:55 INFO namenode.FSNamesystem: supergroup=supergroup
14/02/20 23:36:55 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/02/20 23:36:56 INFO common.Storage: Image file of size 95 saved in 0 seconds.
14/02/20 23:36:56 INFO common.Storage: Storage directory /home/jared/hadoop/name has been successfully formatted.
14/02/20 23:36:56 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.255.25
************************************************************/
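Formatting discards any existing namenode metadata, so it should run only once per cluster. One way to guard against an accidental second run is to check for the current/VERSION file that a formatted storage directory contains. A sketch, with `needs_format` as a hypothetical helper:

```shell
# needs_format NAME_DIR
# Hypothetical helper: succeed only if NAME_DIR has not been formatted yet.
needs_format() {
    [ ! -f "$1/current/VERSION" ]
}

# Possible usage with the dfs.name.dir value configured earlier:
#   needs_format /home/jared/hadoop/name && hadoop namenode -format
```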
Start Hadoop
[jared@master ~]$ start-all.sh
starting namenode, logging to /home/jared/hadoop/bin/../logs/hadoop-jared-namenode-master.out
node1: starting datanode, logging to /home/jared/hadoop/bin/../logs/hadoop-jared-datanode-node1.out
node2: starting datanode, logging to /home/jared/hadoop/bin/../logs/hadoop-jared-datanode-node2.out
master: starting secondarynamenode, logging to /home/jared/hadoop/bin/../logs/hadoop-jared-secondarynamenode-master.out
starting jobtracker, logging to /home/jared/hadoop/bin/../logs/hadoop-jared-jobtracker-master.out
node1: starting tasktracker, logging to /home/jared/hadoop/bin/../logs/hadoop-jared-tasktracker-node1.out
node2: starting tasktracker, logging to /home/jared/hadoop/bin/../logs/hadoop-jared-tasktracker-node2.out
Use jps to verify that the daemons are running
[jared@master ~]$ /usr/java/jdk1.7.0_51/bin/jps
22642 SecondaryNameNode
22503 NameNode
22810 Jps
22705 JobTracker
[jared@node1 conf]$ /usr/java/jdk1.7.0_51/bin/jps
22703 Jps
22610 TaskTracker
22542 DataNode
[root@node2 conf]# /usr/java/jdk1.7.0_51/bin/jps
22609 Jps
22503 TaskTracker
22445 DataNode
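Checking the jps output by eye gets tedious on larger clusters. The sketch below reads jps output on standard input and prints any expected daemon that is missing; `missing_daemons` is a hypothetical helper.

```shell
# missing_daemons NAME...
# Hypothetical helper: read `jps` output on stdin and print each NAME
# that does not appear in it.
missing_daemons() {
    # The second column of jps output is the class name of the process.
    running=$(awk '{print $2}')
    for daemon in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$running" | grep -qx "$daemon" || echo "$daemon"
    done
}
```

On the master this could be used as `jps | missing_daemons NameNode SecondaryNameNode JobTracker`, and on a slave as `jps | missing_daemons DataNode TaskTracker`; empty output means everything expected is running.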
Test Hadoop with wordcount
[jared@master ~]$ /usr/java/jdk1.7.0_51/bin/jps
22642 SecondaryNameNode
22503 NameNode
23874 Jps
22705 JobTracker
[jared@master ~]$ pwd
/home/jared
[jared@master ~]$ mkdir input
[jared@master ~]$ cd input/
[jared@master input]$ ls
[jared@master input]$ echo "hello world">test1.txt
[jared@master input]$ echo "hello hadoop">test2.txt
[jared@master input]$ ls
test1.txt test2.txt
[jared@master input]$ cat test1.txt
hello world
[jared@master input]$ cat test2.txt
hello hadoop
Upload the files to HDFS
[jared@master input]$ hadoop dfs -put ../input in
[jared@master input]$ hadoop dfs -ls in
Found 2 items
-rw-r--r-- 2 jared supergroup 12 2014-02-21 00:14 /user/jared/in/test1.txt
-rw-r--r-- 2 jared supergroup 13 2014-02-21 00:14 /user/jared/in/test2.txt
Hadoop test: count how often each word occurs in the files (wordcount)
[jared@master input]$ hadoop jar /home/jared/hadoop/hadoop-0.20.2-examples.jar wordcount in out
14/02/21 00:17:01 INFO input.FileInputFormat: Total input paths to process : 2
14/02/21 00:17:02 INFO mapred.JobClient: Running job: job_201402202338_0001
14/02/21 00:17:03 INFO mapred.JobClient: map 0% reduce 0%
14/02/21 00:17:12 INFO mapred.JobClient: map 50% reduce 0%
14/02/21 00:17:13 INFO mapred.JobClient: map 100% reduce 0%
14/02/21 00:17:24 INFO mapred.JobClient: map 100% reduce 100%
14/02/21 00:17:26 INFO mapred.JobClient: Job complete: job_201402202338_0001
14/02/21 00:17:26 INFO mapred.JobClient: Counters: 17
14/02/21 00:17:26 INFO mapred.JobClient: Job Counters
14/02/21 00:17:26 INFO mapred.JobClient: Launched reduce tasks=1
14/02/21 00:17:26 INFO mapred.JobClient: Launched map tasks=2
14/02/21 00:17:26 INFO mapred.JobClient: Data-local map tasks=2
14/02/21 00:17:26 INFO mapred.JobClient: FileSystemCounters
14/02/21 00:17:26 INFO mapred.JobClient: FILE_BYTES_READ=55
14/02/21 00:17:26 INFO mapred.JobClient: HDFS_BYTES_READ=25
14/02/21 00:17:26 INFO mapred.JobClient: FILE_BYTES_WRITTEN=180
14/02/21 00:17:26 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=25
14/02/21 00:17:26 INFO mapred.JobClient: Map-Reduce Framework
14/02/21 00:17:26 INFO mapred.JobClient: Reduce input groups=3
14/02/21 00:17:26 INFO mapred.JobClient: Combine output records=4
14/02/21 00:17:26 INFO mapred.JobClient: Map input records=2
14/02/21 00:17:26 INFO mapred.JobClient: Reduce shuffle bytes=61
14/02/21 00:17:26 INFO mapred.JobClient: Reduce output records=3
14/02/21 00:17:26 INFO mapred.JobClient: Spilled Records=8
14/02/21 00:17:26 INFO mapred.JobClient: Map output bytes=41
14/02/21 00:17:26 INFO mapred.JobClient: Combine input records=4
14/02/21 00:17:26 INFO mapred.JobClient: Map output records=4
14/02/21 00:17:26 INFO mapred.JobClient: Reduce input records=4
List the files in HDFS
[jared@master input]$ hadoop dfs -ls
Found 2 items
drwxr-xr-x - jared supergroup 0 2014-02-21 00:14 /user/jared/in
drwxr-xr-x - jared supergroup 0 2014-02-21 00:17 /user/jared/out
[jared@master input]$ hadoop dfs -ls out
Found 2 items
drwxr-xr-x - jared supergroup 0 2014-02-21 00:17 /user/jared/out/_logs
-rw-r--r-- 2 jared supergroup 25 2014-02-21 00:17 /user/jared/out/part-r-00000
View the contents of a file in HDFS
[jared@master input]$ hadoop dfs -cat out/part-r-00000
hadoop 1
hello 2
world 1
[jared@master input]$
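For a small input like this, the job result can be cross-checked locally with standard tools. The pipeline below is only a sanity check and not how the MapReduce job is implemented; `local_wordcount` is a hypothetical helper.

```shell
# local_wordcount FILE...
# Hypothetical helper: print "word<TAB>count" for every whitespace-separated
# word in the given files, sorted by word.
local_wordcount() {
    cat "$@" |
        tr -s '[:space:]' '\n' |   # one word per line
        sed '/^$/d' |              # drop empty lines
        LC_ALL=C sort |            # byte-order sort, independent of locale
        uniq -c |
        awk '{printf "%s\t%s\n", $2, $1}'
}
```

Run in the local input directory as `local_wordcount test1.txt test2.txt`, it should list the same three words and counts as part-r-00000.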
[jared@node1 data]$ pwd
/home/jared/hadoop/data
[jared@node1 data]$ ls -lR
.:
total 16
drwxr-xr-x. 2 jared hadoop 4096 Feb 20 23:50 current
drwxr-xr-x. 2 jared hadoop 4096 Feb 20 22:53 detach
-rw-r--r--. 1 jared hadoop 0 Feb 20 22:53 in_use.lock
-rw-r--r--. 1 jared hadoop 157 Feb 20 22:53 storage
drwxr-xr-x. 2 jared hadoop 4096 Feb 20 23:50 tmp
./current:
total 368
-rw-r--r--. 1 jared hadoop 4 Feb 20 22:54 blk_1488417308273842703
-rw-r--r--. 1 jared hadoop 11 Feb 20 22:54 blk_1488417308273842703_1001.meta
-rw-r--r--. 1 jared hadoop 16746 Feb 20 23:49 blk_1659744422027317455
-rw-r--r--. 1 jared hadoop 139 Feb 20 23:49 blk_1659744422027317455_1033.meta
-rw-r--r--. 1 jared hadoop 8690 Feb 20 23:50 blk_-3027154220961892181
-rw-r--r--. 1 jared hadoop 75 Feb 20 23:50 blk_-3027154220961892181_1034.meta
-rw-r--r--. 1 jared hadoop 142466 Feb 20 23:38 blk_3123495904277639429
-rw-r--r--. 1 jared hadoop 1123 Feb 20 23:38 blk_3123495904277639429_1013.meta
-rw-r--r--. 1 jared hadoop 12 Feb 20 23:49 blk_5040281988852807225
-rw-r--r--. 1 jared hadoop 11 Feb 20 23:49 blk_5040281988852807225_1028.meta
-rw-r--r--. 1 jared hadoop 25 Feb 20 23:50 blk_-538339897708158192
-rw-r--r--. 1 jared hadoop 11 Feb 20 23:50 blk_-538339897708158192_1034.meta
-rw-r--r--. 1 jared hadoop 13 Feb 20 23:49 blk_6041811899305324558
-rw-r--r--. 1 jared hadoop 11 Feb 20 23:49 blk_6041811899305324558_1027.meta
-rw-r--r--. 1 jared hadoop 142466 Feb 20 23:38 blk_-7701193131489368534
-rw-r--r--. 1 jared hadoop 1123 Feb 20 23:38 blk_-7701193131489368534_1010.meta
-rw-r--r--. 1 jared hadoop 1540 Feb 20 23:50 dncp_block_verification.log.curr
-rw-r--r--. 1 jared hadoop 155 Feb 20 22:53 VERSION
./detach:
total 0
./tmp:
total 0
[jared@node1 data]$
View basic HDFS statistics
[jared@node1 data]$ hadoop dfsadmin -report
Configured Capacity: 103366975488 (96.27 GB)
Present Capacity: 94912688128 (88.39 GB)
DFS Remaining: 94911893504 (88.39 GB)
DFS Used: 794624 (776 KB)
DFS Used%: 0%
Under replicated blocks: 2
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Name: 192.168.255.27:50010
Decommission Status : Normal
Configured Capacity: 51683487744 (48.13 GB)
DFS Used: 397312 (388 KB)
Non DFS Used: 4226998272 (3.94 GB)
DFS Remaining: 47456092160(44.2 GB)
DFS Used%: 0%
DFS Remaining%: 91.82%
Last contact: Fri Feb 21 08:32:47 EST 2014
Name: 192.168.255.26:50010
Decommission Status : Normal
Configured Capacity: 51683487744 (48.13 GB)
DFS Used: 397312 (388 KB)
Non DFS Used: 4227289088 (3.94 GB)
DFS Remaining: 47455801344(44.2 GB)
DFS Used%: 0%
DFS Remaining%: 91.82%
Last contact: Fri Feb 21 08:32:46 EST 2014
[jared@node1 data]$
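The summary line of the report is easy to check from a script, for example in a periodic health check. A sketch that extracts the number of dead datanodes from the report text, assuming the line format shown in the output above; `dead_datanodes` is a hypothetical helper.

```shell
# dead_datanodes
# Hypothetical helper: read `hadoop dfsadmin -report` output on stdin and
# print the number of dead datanodes from the summary line.
dead_datanodes() {
    sed -n 's/^Datanodes available: [0-9]* ([0-9]* total, \([0-9]*\) dead).*/\1/p'
}
```

A possible use is `hadoop dfsadmin -report | dead_datanodes`, raising an alert when the printed number is not 0.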
Enter and leave safe mode
[jared@node1 data]$ hadoop dfsadmin -safemode enter
Safe mode is ON
[jared@node1 data]$ hadoop dfsadmin -safemode leave
Safe mode is OFF
[jared@node1 data]$
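Besides `enter` and `leave`, `dfsadmin -safemode` also accepts `get`, which prints the current state in the same "Safe mode is ON/OFF" form. A script can use it to avoid submitting work while HDFS is read-only. A minimal sketch, with `is_safemode_on` as a hypothetical helper:

```shell
# is_safemode_on
# Hypothetical helper: succeed if the namenode reports that safe mode is on.
is_safemode_on() {
    hadoop dfsadmin -safemode get | grep -q 'Safe mode is ON'
}

# Possible usage:
#   is_safemode_on && echo "HDFS is in safe mode, try again later"
```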
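Steps for adding a new node:
Install Hadoop on the new node
Copy the relevant configuration files from the namenode to the new node
Edit the masters and slaves files to add the node
Set up passwordless SSH to and from the node
Start the datanode and tasktracker on that node individually (hadoop-daemon.sh start datanode/tasktracker)
Run start-balancer.sh to balance data across the cluster. Purpose: when a node fails or a new node is added, blocks may end up unevenly distributed, and the balancer redistributes them across the datanodes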
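The last two steps can be wrapped in a small script that runs on the new node. This is a sketch that assumes $HADOOP_INSTALL/bin is on the PATH as configured earlier; `start_worker` and the `RUN` variable are hypothetical names, and setting `RUN=echo` only prints the commands.

```shell
# start_worker
# Hypothetical helper: start the worker daemons on this node, then rebalance.
# Set RUN=echo to print the commands instead of running them.
start_worker() {
    for daemon in datanode tasktracker; do
        ${RUN:-} hadoop-daemon.sh start "$daemon" || return 1
    done
    # Redistribute existing blocks so the new datanode receives its share.
    ${RUN:-} start-balancer.sh
}
```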
Command history on the newly added node:
User: root
54 groupadd hadoop
55 useradd -s /bin/bash -d /home/jared -m jared -g hadoop -G adm
56 passwd jared
57 rpm -ivh /home/jared/jdk-7u51-linux-x64.rpm
58 vim /etc/profile
59 source /etc/profile
60 exit
User: jared
1 ssh-keygen -t rsa
2 cd .ssh/
3 ls
4 vim id_rsa.pub
5 ls
6 vim authorized_keys
8 ssh node3
9 ssh node1
10 ssh node2
11 ssh master
12 ls
13 cd
14 ls
15 cd /usr/java/
16 ls
17 cd
18 ls
19 vim /etc/profile
20 su - root
21 echo $JAVA_HOME
22 source /etc/profile
23 echo $JAVA_HOME
24 cat /etc/hosts
26 ls
27 ll
28 ls
29 cd hadoop/
30 ls
31 cd
32 vim /etc/profile
33 echo $HADOOP_INSTALL
34 hadoop-daemon.sh start datanode
35 hadoop-daemon.sh start tasktracker
36 /usr/java/jdk1.7.0_51/jps
37 start-balancer.sh
38 /usr/java/jdk1.7.0_51/jps
39 source /etc/profile
40 /usr/java/jdk1.7.0_51/bin/jps
42 history