Create the second and third virtual machines in VMware and name them Slave1 and Slave2, as shown in the figure below.
Modify the hostname in /etc/hostname:
vim /etc/hostname
Set it to Master, Slave1, or Slave2 respectively; the change takes effect after a reboot.
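For example, after editing, /etc/hostname on Slave1 contains a single line (a minimal sketch; Master and Slave2 hold their own names):
Slave1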
step1: Map hostnames to IP addresses on each host
Open /etc/hosts
Edit it as shown below.
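A minimal sketch of the resulting /etc/hosts, using the addresses that show up later in this post (192.168.109.129/130/131); substitute your own IPs:
127.0.0.1        localhost
192.168.109.129  Master
192.168.109.130  Slave1
192.168.109.131  Slave2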
Test the connection to Master:
ping Master
On Slave1 and Slave2, add all of the IP/hostname entries as well.
Modify the hosts file on Master in the same way.
Then test that the three machines can reach each other.
step2: Configure passwordless SSH authentication
First, check what happens when Master accesses Slave1 over SSH.
At this point a password is still required.
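If a node does not have an RSA key pair yet, generate one first (a minimal sketch, assuming the default key path and an empty passphrase):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa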
Now send Slave1's id_rsa.pub to Master, as shown below:
cd /home/dida/.ssh
scp id_rsa.pub dida@Master:/home/dida/.ssh/id_rsa.pub.Slave1
Likewise, send Slave2's id_rsa.pub to Master:
cd /home/dida/.ssh
scp id_rsa.pub dida@Master:/home/dida/.ssh/id_rsa.pub.Slave2
On Master, combine all of the public keys:
cat id_rsa.pub.Slave1 >> authorized_keys
cat id_rsa.pub.Slave2 >> authorized_keys
Note: whichever node needs to connect to another node must send its own public key to that other node.
In most cases it is enough to send Master's public key to each Slave node.
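If ssh-copy-id is available, it does the copy-and-append in a single step (an alternative sketch to the scp/cat route used in this post):
ssh-copy-id dida@Slave1
ssh-copy-id dida@Slave2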
At this point Slave1 and Slave2 can already SSH into Master. If all three nodes need to reach each other, copy Master's authorized_keys file into the .ssh directory on Slave1 and Slave2:
scp authorized_keys dida@Slave1:/home/dida/.ssh/authorized_keys
scp authorized_keys dida@Slave2:/home/dida/.ssh/authorized_keys
Now all three nodes can SSH into each other without a password.
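A quick check, run from Master: each command below should print the remote hostname without asking for a password (repeat from the slaves to verify all directions). If a password is still requested, make sure ~/.ssh is mode 700 and authorized_keys is mode 600 on the target machine.
ssh dida@Slave1 hostname
ssh dida@Slave2 hostname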
step3: Modify the configuration files on Master, Slave1, and Slave2
The Hadoop version is 2.6; all configuration files live under hadoop/etc/hadoop/.
Modify them one by one as follows:
1. Configure hadoop-env.sh
export JAVA_HOME=/home/dida/jdk1.7.0_75
2. Configure yarn-env.sh
# some Java parameters
export JAVA_HOME=/home/dida/jdk1.7.0_75
3. Configure core-site.xml and create the tmp folder (a command sketch for creating all of the local directories follows the hdfs-site.xml block below)
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/dida/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>hadoop.proxyuser.dida.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.dida.groups</name>
        <value>*</value>
    </property>
</configuration>
4. Configure hdfs-site.xml and create the dfs/name and dfs/data folders
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:9001</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/dida/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/dida/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
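The local directories referenced in core-site.xml and hdfs-site.xml above have to exist on every node; a minimal sketch for creating them (paths taken straight from the configs):
mkdir -p /home/dida/hadoop/tmp
mkdir -p /home/dida/hadoop/dfs/name
mkdir -p /home/dida/hadoop/dfs/data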
5. Configure mapred-site.xml
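In a stock Hadoop 2.6 distribution only mapred-site.xml.template may be present; if so, copy it first (a hedged note):
cd ~/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml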
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>
6. Configure yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master:8035</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master:8088</value>
    </property>
</configuration>
7. Configure the slaves file (Hadoop 2.6 apparently no longer needs a masters file, but following an older guide I still created a masters file containing Master)
Slave1
Slave2
Copy all of the files above to the corresponding folder on every node, and the configuration work is done.
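A sketch of pushing the whole configuration directory from Master to the slaves with scp (assuming the same ~/hadoop layout on every node):
scp -r ~/hadoop/etc/hadoop dida@Slave1:~/hadoop/etc/
scp -r ~/hadoop/etc/hadoop dida@Slave2:~/hadoop/etc/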
step4: Verify the configuration
Enter the installation directory:
cd ~/hadoop/
Format the NameNode:
./bin/hdfs namenode -format
Start HDFS:
dida@Master:~/hadoop$ ./sbin/start-dfs.sh
15/04/04 10:26:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [Master]
Master: starting namenode, logging to /home/dida/hadoop/logs/hadoop-dida-namenode-Master.out
Slave2: starting datanode, logging to /home/dida/hadoop/logs/hadoop-dida-datanode-Slave2.out
Slave1: starting datanode, logging to /home/dida/hadoop/logs/hadoop-dida-datanode-Slave1.out
Starting secondary namenodes [Master]
Master: starting secondarynamenode, logging to /home/dida/hadoop/logs/hadoop-dida-secondarynamenode-Master.out
15/04/04 10:34:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Startup complete; check the processes:
dida@Master:~/hadoop$ jps
3946 NameNode
4293 Jps
4158 SecondaryNameNode
The Hadoop process running on Slave1 and Slave2 is: DataNode
Start YARN:
dida@Master:~/hadoop$ ./sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/dida/hadoop/logs/yarn-dida-resourcemanager-Master.out
Slave1: starting nodemanager, logging to /home/dida/hadoop/logs/yarn-dida-nodemanager-Slave1.out
Slave2: starting nodemanager, logging to /home/dida/hadoop/logs/yarn-dida-nodemanager-Slave2.out
Check the processes:
dida@Master:~/hadoop$ jps
3946 NameNode
4495 Jps
4158 SecondaryNameNode
4383 ResourceManager
The Hadoop processes running on Slave1 and Slave2 are: DataNode and NodeManager
Some related status queries:
Check cluster status: ./bin/hdfs dfsadmin -report
Check file block composition: ./bin/hdfs fsck / -files -blocks
View HDFS: http://Master:50070
View the ResourceManager: http://Master:8088
I only checked the following two.
1. Check the cluster status
dida@Master:~/hadoop$ ./bin/hdfs dfsadmin -report
15/04/04 10:48:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 39891361792 (37.15 GB)
Present Capacity: 27972059136 (26.05 GB)
DFS Remaining: 27972009984 (26.05 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: 192.168.109.130:50010 (Slave1)
Hostname: Slave1
Decommission Status : Normal
Configured Capacity: 19945680896 (18.58 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 6015664128 (5.60 GB)
DFS Remaining: 13929992192 (12.97 GB)
DFS Used%: 0.00%
DFS Remaining%: 69.84%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Apr 04 10:48:42 CST 2015
Name: 192.168.109.131:50010 (Slave2)
Hostname: Slave2
Decommission Status : Normal
Configured Capacity: 19945680896 (18.58 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 5903638528 (5.50 GB)
DFS Remaining: 14042017792 (13.08 GB)
DFS Used%: 0.00%
DFS Remaining%: 70.40%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Apr 04 10:48:43 CST 2015
2. View HDFS
http://Master:50070
step5: Run the WordCount example
Upload the input files to /input in HDFS.
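If /input does not exist in HDFS yet, create it before the put below (a hedged note; it may have been created in an earlier step):
hadoop fs -mkdir /input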
dida@Master:~/hadoop/input$ hadoop fs -put ./* /input
List the HDFS input directory:
dida@Master:~/hadoop/input$ hadoop fs -ls /input
15/04/05 19:29:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 dida supergroup 26 2015-04-05 19:29 /input/f1
-rw-r--r-- 1 dida supergroup 38 2015-04-05 19:29 /input/f2
Run the example program (WordCount):
dida@Master:~/hadoop$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output
15/04/05 19:32:27 INFO client.RMProxy: Connecting to ResourceManager at Master/192.168.109.129:8032
15/04/05 19:32:29 INFO input.FileInputFormat: Total input paths to process : 2
15/04/05 19:32:30 INFO mapreduce.JobSubmitter: number of splits:2
15/04/05 19:32:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1428231744993_0002
15/04/05 19:32:32 INFO impl.YarnClientImpl: Submitted application application_1428231744993_0002
15/04/05 19:32:32 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1428231744993_0002/
15/04/05 19:32:32 INFO mapreduce.Job: Running job: job_1428231744993_0002
15/04/05 19:32:45 INFO mapreduce.Job: Job job_1428231744993_0002 running in uber mode : false
15/04/05 19:32:45 INFO mapreduce.Job: map 0% reduce 0%
15/04/05 19:33:00 INFO mapreduce.Job: map 50% reduce 0%
15/04/05 19:33:04 INFO mapreduce.Job: map 100% reduce 0%
15/04/05 19:33:08 INFO mapreduce.Job: map 100% reduce 100%
15/04/05 19:33:08 INFO mapreduce.Job: Job job_1428231744993_0002 completed successfully
15/04/05 19:33:08 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=125
FILE: Number of bytes written=317729
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=248
HDFS: Number of bytes written=75
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=30554
Total time spent by all reduces in occupied slots (ms)=4923
Total time spent by all map tasks (ms)=30554
Total time spent by all reduce tasks (ms)=4923
Total vcore-seconds taken by all map tasks=30554
Total vcore-seconds taken by all reduce tasks=4923
Total megabyte-seconds taken by all map tasks=31287296
Total megabyte-seconds taken by all reduce tasks=5041152
Map-Reduce Framework
Map input records=4
Map output records=13
Map output bytes=115
Map output materialized bytes=131
Input split bytes=184
Combine input records=13
Combine output records=11
Reduce input groups=11
Reduce shuffle bytes=131
Reduce input records=11
Reduce output records=11
Spilled Records=22
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=1022
CPU time spent (ms)=17130
Physical memory (bytes) snapshot=403795968
Virtual memory (bytes) snapshot=1102848000
Total committed heap usage (bytes)=257892352
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=64
File Output Format Counters
Bytes Written=75
Output like the above means everything ran correctly.
Check the program's results:
dida@Master:~/hadoop$ hadoop fs -cat /output/part-r-00000
15/04/05 19:34:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
are 1
bye 2
do 1
going 1
hello 1
idon 1
to 1
what 1
world 2
you 1
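One note for reruns: MapReduce refuses to write into an existing output directory, so delete /output before submitting the job again:
hadoop fs -rm -r /output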