Network layout:
3 virtual machines running CentOS 6.5
vm1 (master), vm2 (slave1), vm3 (slave2)
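All of the configuration below refers to the machines by hostname, so each VM's /etc/hosts should map all three names. A minimal sketch (the IPs are placeholders; substitute your own):
192.168.1.101 vm1
192.168.1.102 vm2
192.168.1.103 vm3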
Part 1: Preparation
1. Install and configure Java.
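A minimal sketch of the Java setup, assuming the JDK is unpacked at the path used in hadoop-env.sh below; append these lines to /etc/profile on every node:
export JAVA_HOME=/home/spark/opt/java/jdk1.6.0_37
export PATH=$JAVA_HOME/bin:$PATH
Then run source /etc/profile and check with java -version.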
2. Set up passwordless SSH from the master to vm2 and vm3.
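One way to do this, run on vm1 (ssh-copy-id ships with openssh-clients on CentOS):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id root@vm2
ssh-copy-id root@vm3
Verify with ssh vm2; it should log in without prompting for a password.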
3. Download hadoop-2.6.0.tar.gz.
I put the tarball under /usr/local/ and extracted it there: tar -xvf hadoop-2.6.0.tar.gz
4. Create three directories first (-p also creates the dfs parent):
mkdir -p hadoop-2.6.0/tmp
mkdir -p hadoop-2.6.0/dfs/name
mkdir -p hadoop-2.6.0/dfs/data
Part 2: Edit the configuration (all of this is done on vm1, the master; the files live under /usr/local/hadoop-2.6.0/etc/hadoop/)
1. Edit hadoop-env.sh --> set JAVA_HOME
# The java implementation to use.
export JAVA_HOME=/home/spark/opt/java/jdk1.6.0_37
2. Edit yarn-env.sh --> set JAVA_HOME
# some Java parameters
export JAVA_HOME=/home/spark/opt/java/jdk1.6.0_37
3. Edit slaves --> add the slave nodes:
vm2
vm3
4. Edit core-site.xml --> add the Hadoop core settings (HDFS listens on port 9000; the temp dir is file:/usr/local/hadoop-2.6.0/tmp)
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://vm1:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop-2.6.0/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.spark.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.spark.groups</name>
<value>*</value>
</property>
</configuration>
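(The hadoop.proxyuser.spark.* entries allow the user spark to impersonate other users from any host and group, which is useful later for tools such as Hive; if you run Hadoop as a different user, substitute that username.)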
5. Edit hdfs-site.xml --> add the HDFS settings (NameNode/DataNode addresses and directory locations)
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>vm1:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop-2.6.0/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop-2.6.0/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<!-- only two DataNodes (vm2, vm3), so a factor of 3 would leave every block under-replicated -->
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
6. Edit mapred-site.xml --> add the MapReduce settings (use the YARN framework; JobHistory RPC and web UI addresses)
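Note that Hadoop 2.6.0 ships only a template for this file, so create it first:
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml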
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>vm1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>vm1:19888</value>
</property>
</configuration>
7. Edit yarn-site.xml --> add the YARN settings
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>vm1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>vm1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>vm1:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>vm1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>vm1:8088</value>
</property>
</configuration>
Part 3: Configure vm2 and vm3 (simply copy the configured tree from vm1; run these from /usr/local on vm1)
scp -r hadoop-2.6.0/ root@vm2:/usr/local/
scp -r hadoop-2.6.0/ root@vm3:/usr/local/
Part 4: Start the cluster
1. On vm1 (the master), go to /usr/local/hadoop-2.6.0/ and format the NameNode. Do this only once, and only on the master:
./bin/hdfs namenode -format
2. Start DFS and YARN, also on vm1 (the scripts ssh to the slaves to bring up their daemons):
./sbin/start-dfs.sh
./sbin/start-yarn.sh
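mapred-site.xml above also points a JobHistory server at vm1:10020, but start-dfs.sh/start-yarn.sh do not launch it; start it separately if you want job history:
./sbin/mr-jobhistory-daemon.sh start historyserver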
3. Note: stop Hadoop before shutting the machines down:
./sbin/stop-dfs.sh
./sbin/stop-yarn.sh
4. Use jps to check the Hadoop processes.
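With this layout, jps should show roughly the following daemons (pids will differ):
on vm1: NameNode, SecondaryNameNode, ResourceManager
on vm2/vm3: DataNode, NodeManager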
5. Check the cluster status: ./bin/hdfs dfsadmin -report
Part 5: Web UIs
1. HDFS (NameNode): http://10.58.44.47:50070/
2. ResourceManager: http://10.58.44.47:8088/
(10.58.44.47 is the master's IP here; use vm1's address in your own setup.)
Part 6: Run the wordcount example
1. Create a local input directory: mkdir hadoop-2.6.0/input
2. Create two files with some content (touch alone creates empty files, so write the text with echo):
echo "hello world bye jj" > input/f1
echo "hello hadoop bye hadoop" > input/f2
3. Create the input directory on HDFS:
./bin/hadoop fs -mkdir /tmp
./bin/hadoop fs -mkdir /tmp/input
4. Copy f1 and f2 into the Hadoop file system (put the individual files: /tmp/input was already created in step 3, and putting the whole local directory onto an existing target errors out):
./bin/hadoop fs -put input/f1 input/f2 /tmp/input
5. Check that the two files are in the Hadoop file system:
./bin/hadoop fs -ls /tmp/input/
6. Run the example:
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /tmp/input /output
7. View the result:
./bin/hadoop fs -cat /output/part-r-00000
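Given f1 and f2 above, the output should look like:
bye	2
hadoop	2
hello	2
jj	1
world	1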