System: CentOS 6.5
Software versions: Hadoop 2.6.2, JDK 1.8
Cluster layout:
master: www 192.168.78.110
slave1: node1 192.168.78.111
slave2: node2 192.168.78.112
hosts file (/etc/hosts)
192.168.78.110 www
192.168.78.111 node1
192.168.78.112 node2
Make sure all three machines can ping one another by hostname.
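The hostname check can be scripted. The following is a minimal sketch rather than part of the original setup: `check_hosts` is a hypothetical helper that confirms every node name appears in a hosts file, so a missing entry is reported before any ping is attempted.

```shell
# check_hosts FILE NAME...: succeed only if every NAME is listed as a hostname in FILE
check_hosts() {
    ch_file="$1"
    shift
    for ch_name in "$@"; do
        # compare NAME against every field after the IP address, skipping comment lines
        if ! awk -v n="$ch_name" '
                !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }' "$ch_file"; then
            echo "missing from $ch_file: $ch_name" >&2
            return 1
        fi
    done
}

# Typical use on each node (left commented out because it needs the real network):
# check_hosts /etc/hosts www node1 node2 && for h in www node1 node2; do ping -c 1 "$h"; done
```

Running the commented line on all three machines covers every direction of the ping test.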
[root@www ~]# wget http://download.oracle.com/otn-pub/java/jdk/8u65-b17/jdk-8u65-linux-x64.rpm?AuthParam=1446899640_8da8d9b13f8bbe63b3bc0bc80b730f55 # after downloading, rename the file to drop the ?AuthParam=... suffix that follows .rpm
[root@www ~]# wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.2/hadoop-2.6.2.tar.gz
[root@www ~]# rpm -ivh jdk-8u65-linux-x64.rpm
[root@www ~]# vimx /etc/profile
#set java environment
export JAVA_HOME=/usr/java/jdk1.8.0_65 # adjust the path if you installed a different version
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export JAVA_HOME CLASSPATH PATH
[root@www ~]# source !$
[root@www ~]# java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
[root@www ~]# javac -version
javac 1.8.0_65
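A JAVA_HOME that does not match the installed JDK is an easy mistake here, because the directory name carries the update number. As a hedged sketch, the hypothetical helper `java_home_from_bin` derives the JDK directory from the resolved path of the `java` executable instead of typing it by hand. It assumes the executable sits under `bin/` or `jre/bin/` of the JDK directory.

```shell
# java_home_from_bin PATH: print the JDK directory for a fully resolved java executable path
java_home_from_bin() {
    jh_path="${1%/bin/java}"   # drop the trailing /bin/java
    jh_path="${jh_path%/jre}"  # drop a trailing /jre when java lives in the bundled JRE
    printf '%s\n' "$jh_path"
}

# Typical use (left commented out because it needs an installed JDK):
# java_home_from_bin "$(readlink -f "$(command -v java)")"
```

The printed directory is the value to use for JAVA_HOME in /etc/profile and in the Hadoop env scripts.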
[root@www opt]# tar -xzvf hadoop-2.6.2.tar.gz
[root@www opt]# mkdir /opt/hadoop
[root@www src]# mv hadoop-2.6.2 /opt/hadoop
[root@www src]# cd /opt/hadoop/hadoop-2.6.2
[root@www hadoop-2.6.2]# ls
bin etc include lib libexec LICENSE.txt NOTICE.txt README.txt sbin share
4.2 Add a hadoop user
[root@www hadoop-2.6.2]# useradd hadoop
[root@www hadoop-2.6.2]# passwd hadoop
[root@www hadoop-2.6.2]# chown -R hadoop:hadoop /opt/hadoop
4.3 Edit the Hadoop configuration files
[root@www hadoop-2.6.2]# su - hadoop # switch to the hadoop user
[hadoop@www ~]$ mkdir -p ~/hadoop/tmp ~/dfs/data ~/dfs/name # these directories are used later
[hadoop@www ~]$ ls
dfs hadoop
[hadoop@www ~]$ cd /opt/hadoop/hadoop-2.6.2/
4.3.1 Configure hadoop-env.sh: set JAVA_HOME
[hadoop@www hadoop-2.6.2]$ vimx etc/hadoop/hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_65
4.3.2 Configure yarn-env.sh: set JAVA_HOME
[hadoop@www hadoop-2.6.2]$ vimx etc/hadoop/yarn-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_65
4.3.3 Configure the slaves file: add the slave nodes
[hadoop@www hadoop-2.6.2]$ vimx etc/hadoop/slaves
node1
node2
4.3.4 Configure core-site.xml: add the core Hadoop settings (HDFS listens on port 9000; the temporary directory is /home/hadoop/hadoop/tmp)
[hadoop@www hadoop-2.6.2]$ vimx etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://www:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.spark.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.spark.groups</name>
<value>*</value>
</property>
</configuration>
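After editing a site file it helps to read a property back and confirm the value is what was intended. Once Hadoop is on the PATH, `hdfs getconf -confKey fs.defaultFS` does this. The sketch below is a dependency-free alternative: `get_prop` is a hypothetical helper, and it assumes the simple layout used in this file, with each `<name>` appearing before its `<value>`.

```shell
# get_prop FILE NAME: print the <value> of the first property whose <name> equals NAME
get_prop() {
    awk -v want="$2" '
        /<name>/ {
            name = $0
            gsub(/.*<name>|<\/name>.*/, "", name)    # keep only the text between the tags
        }
        /<value>/ {
            if (name == want) {
                val = $0
                gsub(/.*<value>|<\/value>.*/, "", val)
                print val
                exit
            }
        }
    ' "$1"
}

# Typical use from the Hadoop directory:
# get_prop etc/hadoop/core-site.xml fs.defaultFS
```

This is a plain text scan, not an XML parser, so it does not handle commented-out properties or values split across lines.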
4.3.5 Configure hdfs-site.xml: add the HDFS settings (NameNode and DataNode addresses and directory locations)
[hadoop@www hadoop-2.6.2]$ vimx etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>www:9001</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///home/hadoop/hadoop/hdfs/namesecondary</value>
</property>
</configuration>
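HDFS stores at most one replica of a block per DataNode, so a dfs.replication value larger than the number of DataNodes leaves every block under-replicated. With the two slaves used here, the highest value that can be satisfied is 2. The following sketch compares a replication factor against the slaves file. `count_datanodes` and `replication_ok` are hypothetical helper names.

```shell
# count_datanodes FILE: number of entries in a slaves file, ignoring blank and comment lines
count_datanodes() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | wc -l | tr -d ' '
}

# replication_ok REPLICATION FILE: succeed if REPLICATION does not exceed the DataNode count
replication_ok() {
    [ "$1" -le "$(count_datanodes "$2")" ]
}

# Typical use from the Hadoop directory:
# replication_ok 2 etc/hadoop/slaves || echo "dfs.replication is larger than the number of DataNodes"
```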
4.3.6 Configure mapred-site.xml: add the MapReduce settings (run on YARN; JobHistory server address and web address)
[hadoop@www hadoop-2.6.2]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
[hadoop@www hadoop-2.6.2]$ vimx etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>www:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>www:19888</value>
</property>
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<value>/home/hadoop/hadoop</value>
</property>
</configuration>
4.3.7 Configure yarn-site.xml: add the YARN settings
[hadoop@www hadoop-2.6.2]$ vimx etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>www:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>www:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>www:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>www:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>www:8088</value>
</property>
</configuration>
4.3.8 Copy everything (the hadoop-2.6.2 directory and the hosts file) to node1 and node2
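The copy step is not spelled out, so here is one possible way to do it with scp, offered as an assumption rather than the original author's method. The hypothetical helper `sync_cmds` only prints one scp command per node in the slaves file, so the commands can be reviewed before they run. It assumes the hadoop user exists on every node and owns an existing /opt/hadoop there. /etc/hosts must be copied separately as root.

```shell
# sync_cmds SLAVES_FILE DIR: print one scp command per node listed in SLAVES_FILE
sync_cmds() {
    # strip comments and blank lines, then emit a command for each remaining node name
    sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' "$1" | while read -r sc_node; do
        printf 'scp -r %s hadoop@%s:%s\n' "$2" "$sc_node" "$(dirname "$2")"
    done
}

# Review the commands first, then pipe them to sh to execute them:
# sync_cmds etc/hadoop/slaves /opt/hadoop/hadoop-2.6.2
# sync_cmds etc/hadoop/slaves /opt/hadoop/hadoop-2.6.2 | sh
```

Until passwordless SSH is configured, each scp prompts for the hadoop user's password.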
4.4.1 Set up passwordless SSH login
Run the following on each of the three servers:
[hadoop@www ~]$ ssh-keygen -t rsa # press Enter at every prompt; no passphrase is needed
[hadoop@node2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@192.168.78.110
[hadoop@node2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@192.168.78.111
[hadoop@node2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@192.168.78.112
4.4.2 Test passwordless SSH login
Run the following on each of the three servers:
[hadoop@node2 ~]$ ssh www
[hadoop@node2 ~]$ ssh node1
[hadoop@node2 ~]$ ssh node2
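Logging in by hand works, but a password prompt is easy to miss. A non-interactive check fails outright instead of prompting. In the sketch below, `check_ssh` is a hypothetical helper, and `SSH_CMD` is a made-up override hook that exists only so the function can be exercised without a network. `BatchMode=yes` is a standard OpenSSH option that disables password prompts.

```shell
# check_ssh HOST...: succeed only if every HOST accepts a key-based login without prompting
check_ssh() {
    for cs_host in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password
        if ! ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$cs_host" true; then
            echo "passwordless login to $cs_host failed" >&2
            return 1
        fi
    done
}

# Typical use, as the hadoop user on each of the three servers:
# check_ssh www node1 node2 && echo "all logins OK"
```

The first connection to a host still asks to confirm its key, so run the manual logins once before relying on this check.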
5.1 Format the NameNode
Run this once, on the master only. node1 and node2 do not run a NameNode, so there is nothing to format on them.
[hadoop@www hadoop-2.6.2]$ ./bin/hadoop namenode -format
5.2 Start Hadoop
Start all daemons
[hadoop@www hadoop-2.6.2]$ ./sbin/start-all.sh # run on the master (www); it starts the daemons on the slaves over SSH
Expected processes
master:
[hadoop@www hadoop-2.6.2]$ jps
7136 ResourceManager
6993 SecondaryNameNode
6819 NameNode
7399 Jps
slave:
[hadoop@node1 hadoop-2.6.2]$ jps
3186 Jps
3064 NodeManager
2974 DataNode
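Comparing jps output by eye gets tedious across several nodes. As a small sketch, the hypothetical helper `has_daemons` reads jps output on standard input and reports the first expected daemon that is missing. The daemon names are the ones shown in the listings above.

```shell
# has_daemons "NAME NAME ...": read jps output on stdin, succeed if every NAME is listed
has_daemons() {
    hd_out=$(cat)
    for hd_name in $1; do
        # match the second column exactly, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$hd_out" | awk -v n="$hd_name" '$2 == n { ok = 1 } END { exit !ok }'; then
            echo "not running: $hd_name" >&2
            return 1
        fi
    done
}

# Typical use:
# jps | has_daemons "NameNode SecondaryNameNode ResourceManager"    # on the master
# jps | has_daemons "DataNode NodeManager"                          # on each slave
```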
6 Run the wordcount program
6.1 Create a directory and a file
[hadoop@node1 hadoop-2.6.2]$ mkdir input
[hadoop@node1 hadoop-2.6.2]$ touch input/test.log
[hadoop@node1 hadoop-2.6.2]$ echo "hello world hello hadoop" > input/test.log
[hadoop@node1 hadoop-2.6.2]$ cat input/test.log
hello world hello hadoop
6.2 Create the /input directory on HDFS
[hadoop@node1 hadoop-2.6.2]$ ./bin/hadoop fs -mkdir /input
6.3 Copy test.log to the /input directory on HDFS
[hadoop@www hadoop-2.6.2]$ ./bin/hadoop fs -put input/ /
6.4 Check that test.log is on HDFS
[hadoop@www hadoop-2.6.2]$ ./bin/hadoop fs -ls /input
15/11/08 17:59:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 2 hadoop supergroup 25 2015-11-08 17:59 /input/test.log
6.5 Run the wordcount program
[hadoop@www hadoop-2.6.2]$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /input /output
6.6 View the result
[hadoop@www hadoop-2.6.2]$ ./bin/hadoop fs -cat /output/part-r-00000
15/11/08 18:07:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop 1
hello 2
world 1
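For a small input, the job result can be cross-checked against a local pipeline. This is only a sanity check and not how MapReduce computes the answer. `local_wordcount` is a hypothetical helper that splits on whitespace, which is also how the bundled wordcount example tokenizes its input.

```shell
# local_wordcount FILE...: print "word<TAB>count" lines sorted by word
local_wordcount() {
    cat "$@" |
        tr -s '[:space:]' '\n' |        # one word per line
        grep -v '^$' |                  # drop the empty line left by leading whitespace
        LC_ALL=C sort |                 # byte-order sort so that equal words are adjacent
        uniq -c |                       # prefix each distinct word with its count
        awk '{ print $2 "\t" $1 }'      # reorder to word, tab, count
}

# Typical use from the Hadoop directory:
# local_wordcount input/test.log
```

For input/test.log this prints the same three word counts as the job output above.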
7. Setting up a pseudo-distributed cluster
Only two files on the NameNode host need to change.
7.1 etc/hadoop/hdfs-site.xml
[hadoop@www hadoop-2.6.2]$ vimx etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>www:9001</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///home/hadoop/hadoop/hdfs/namesecondary</value>
</property>
</configuration>
7.2 etc/hadoop/slaves
[hadoop@www hadoop-2.6.2]$ vimx etc/hadoop/slaves
7.3 Format the NameNode
[hadoop@www hadoop-2.6.2]$ ./bin/hadoop namenode -format
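One caveat when formatting on a machine that has already run HDFS: formatting gives the NameNode a new clusterID, and a DataNode whose data directory still carries the old one refuses to start. Emptying the storage directories before formatting avoids the mismatch. The sketch below uses a hypothetical helper, `reset_dfs_dirs`, and assumes the dfs/name and dfs/data layout under the hadoop user's home directory used in this guide. It destroys all HDFS data, so stop the daemons first (`./sbin/stop-all.sh`) and use it only on a test setup.

```shell
# reset_dfs_dirs BASE: empty BASE/dfs/name and BASE/dfs/data (destroys all HDFS data)
reset_dfs_dirs() {
    for rd_dir in "$1/dfs/name" "$1/dfs/data"; do
        [ -d "$rd_dir" ] || continue
        # delete everything inside the directory, dotfiles included, but keep the directory
        find "$rd_dir" -mindepth 1 -delete
    done
}

# Typical use, with all Hadoop daemons stopped and before formatting:
# reset_dfs_dirs /home/hadoop
```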
7.4 Start
[hadoop@www hadoop-2.6.2]$ ./sbin/start-all.sh
7.5 Check the processes
[hadoop@www hadoop-2.6.2]$ jps
4048 NameNode
4545 NodeManager
4130 DataNode
4459 ResourceManager
5469 Jps
4286 SecondaryNameNode
7.6 Upload the file
[hadoop@www hadoop-2.6.2]$ ./bin/hadoop fs -put input/ /
7.7 Run wordcount
[hadoop@www hadoop-2.6.2]$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /input /output
7.8 View the result
[hadoop@www hadoop-2.6.2]$ ./bin/hadoop fs -cat /output/part-r-00000