Download Hadoop: http://apache.fayea.com/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
Download the JDK: http://download.oracle.com/otn-pub/java/jdk/8u66-b17/jdk-8u66-linux-x64.tar.gz
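Both archives can be fetched with wget, for example (a sketch only; the Oracle link required accepting the license via a cookie at the time, and either mirror may have moved since):
wget http://apache.fayea.com/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
wget --no-check-certificate --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u66-b17/jdk-8u66-linux-x64.tar.gz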
2. Server preparation (three servers; four is better)
Ideally use four servers: one namenode and three datanodes (Hadoop's default replication factor is 3). If a machine is spare, you can also add a secondary namenode (a checkpointing helper for the namenode; true hot standby is said to be achievable via ZooKeeper-based NameNode HA, though that claim is left unverified here).
Prepare three servers:
Role IP Hostname
Namenode 192.168.63.227 NameNode
Datanode 192.168.63.202 node1
Datanode 192.168.63.203 node2
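Set each machine's hostname to match the table above. A sketch for CentOS 6 (which the service/chkconfig commands below assume); on the namenode, for example:
hostname NameNode
sed -i 's/^HOSTNAME=.*/HOSTNAME=NameNode/' /etc/sysconfig/network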
Synchronize the clock on every node:
ntpdate time.windows.com
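Optionally keep the clocks in sync with a cron entry (an illustration; any reachable NTP server works):
echo '*/30 * * * * /usr/sbin/ntpdate time.windows.com >/dev/null 2>&1' >> /var/spool/cron/root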
Stop the firewall and disable SELinux on every node:
service iptables stop; chkconfig iptables off
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
setenforce 0
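A quick check that SELinux is really off (it should print Permissive now, or Disabled after a reboot):
getenforce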
Edit /etc/hosts on every node and add the following entries:
192.168.63.227 NameNode
192.168.63.202 node1
192.168.63.203 node2
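Verify that name resolution works before going further:
ping -c 1 node1
ping -c 1 node2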
Create a dedicated user (needed only if Hadoop will run as a non-root user):
useradd hduser && echo "123456" | passwd --stdin hduser
Unpack the JDK and create a profile script:
tar xf jdk-8u66-linux-x64.tar.gz -C /usr/local/
vim /etc/profile.d/java.sh
Enter:
JAVA_HOME=/usr/local/jdk1.8.0_66
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME PATH
Run: source /etc/profile.d/java.sh
Run java -version to confirm the reported version is now the new JDK.
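The output should look roughly like this (build strings are from the 8u66 release and may differ in your environment):
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)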
vim /etc/profile.d/hadoop.sh
Enter:
HADOOP_HOME=/usr/local/hadoop
PATH=$HADOOP_HOME/bin:$PATH
PATH=$HADOOP_HOME/sbin:$PATH
export HADOOP_HOME PATH
Run: source /etc/profile.d/hadoop.sh
If Hadoop will be started by a non-root user, configure passwordless SSH login for that user (for example the hduser created above: switch to it first with su - hduser, and copy the key to hduser@ instead of root@ below).
ssh-keygen -t rsa -P ''    (press Enter through the prompts; do not set a passphrase)
ssh-copy-id -i ~/.ssh/id_rsa.pub root@NameNode    (important: passwordless login to the local machine itself must also be configured)
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
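Each of these should now log in without a password prompt (a quick check from the NameNode):
ssh NameNode hostname
ssh node1 hostname
ssh node2 hostname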
tar xf hadoop-2.7.1.tar.gz -C /usr/local/
ln -sv /usr/local/hadoop-2.7.1 /usr/local/hadoop
(If Hadoop will be started by a non-root user, also run: chown -R hduser. /usr/local/hadoop-2.7.1/)
Create the working directories:
cd /usr/local/hadoop
mkdir tmp && mkdir -p hdfs/data hdfs/name
Edit the configuration files:
cd /usr/local/hadoop
vim etc/hadoop/core-site.xml
Insert between the <configuration> tags:
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://NameNode:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:///usr/local/hadoop/tmp</value>
</property>
<property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
</property>
vim etc/hadoop/hdfs-site.xml
Insert between the <configuration> tags:
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/hdfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/hdfs/data</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
vim etc/hadoop/mapred-site.xml
Insert between the <configuration> tags:
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <final>true</final>
</property>
<property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>NameNode:50030</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>NameNode:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>NameNode:19888</value>
</property>
<property>
    <name>mapred.job.tracker</name>
    <value>http://NameNode:9001</value>
</property>
vim etc/hadoop/yarn-site.xml
Insert between the <configuration> tags:
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>NameNode</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>NameNode:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>NameNode:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>NameNode:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>NameNode:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>NameNode:8088</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>1</value>
</property>
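One setting that trips up many installs: JAVA_HOME should also be set explicitly in etc/hadoop/hadoop-env.sh, since the daemons do not always inherit it from the shell profile (the path assumes the JDK location used earlier):
export JAVA_HOME=/usr/local/jdk1.8.0_66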
Edit etc/hadoop/slaves
Delete localhost and list every datanode in this file.
Add:
node1
node2
Copy the whole hadoop tree from the NameNode server to the other two nodes (run on the NameNode):
scp -r hadoop-2.7.1 root@node1:/usr/local/
scp -r hadoop-2.7.1 root@node2:/usr/local/
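The JDK and the profile scripts are needed on every node too; if they were only installed on the NameNode, copy them as well (a sketch using the paths from the earlier steps):
scp -r /usr/local/jdk1.8.0_66 root@node1:/usr/local/
scp /etc/profile.d/java.sh /etc/profile.d/hadoop.sh root@node1:/etc/profile.d/
(repeat both for node2)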
Run on all nodes:
ln -sv /usr/local/hadoop-2.7.1 /usr/local/hadoop
cd /usr/local/hadoop
# Format the namenode (run this on the NameNode only)
hdfs namenode -format
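If formatting succeeds, the output should contain a line similar to the following (the path comes from dfs.namenode.name.dir):
INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/name has been successfully formatted.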
Log in to the NameNode node and run: start-all.sh (the Hadoop environment variables were configured earlier, so the script can be called directly).
If starting as a non-root user, run su - hduser first, then start-all.sh.
To stop: stop-all.sh
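A simple smoke test once the daemons are up (the file names here are arbitrary, just to exercise HDFS):
hdfs dfs -mkdir /test
hdfs dfs -put /etc/hosts /test/
hdfs dfs -ls /test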
Run on all nodes:
hadoop dfsadmin -report    (this form is deprecated; hdfs dfsadmin -report is the current equivalent)
The result should look similar to the following:
[root@node1 ~]# hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

15/11/13 22:39:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 37139136512 (34.59 GB)
Present Capacity: 27834056704 (25.92 GB)
DFS Remaining: 27833999360 (25.92 GB)
DFS Used: 57344 (56 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.63.202:50010 (node1)
Hostname: node1
Decommission Status : Normal
Configured Capacity: 18569568256 (17.29 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4652568576 (4.33 GB)
DFS Remaining: 13916971008 (12.96 GB)
DFS Used%: 0.00%
DFS Remaining%: 74.95%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Nov 13 22:39:28 CST 2015

Name: 192.168.63.203:50010 (node2)
Hostname: node2
Decommission Status : Normal
Configured Capacity: 18569568256 (17.29 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4652511232 (4.33 GB)
DFS Remaining: 13917028352 (12.96 GB)
DFS Used%: 0.00%
DFS Remaining%: 74.95%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Nov 13 22:39:27 CST 2015
On the namenode node, run: jps
[root@NameNode ~]# jps
7187 Jps
3493 NameNode
3991 SecondaryNameNode
4136 ResourceManager
On each datanode node, run jps:
[root@node1 ~]# jps
2801 NodeManager
3970 Jps
2698 DataNode
Visit the Hadoop web pages:
http://namenode:8088 (YARN ResourceManager)
http://namenode:50070 (HDFS NameNode)