Before installing Hadoop, make sure the JDK is already installed. The Hadoop installation packages can be found under "Hadoop家族". This article uses hadoop-2.4.1 as an example and walks through setting up a pseudo-distributed cluster.
1. Upload hadoop-2.4.1 to the Linux machine and extract it
[hadoop01@hadoop01 hadoop]$ tar -zxvf hadoop-2.4.1.tar.gz
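After extraction you should see the standard layout of a Hadoop 2.x release (exact contents may differ slightly between builds):
[hadoop01@hadoop01 hadoop]$ ls hadoop-2.4.1
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share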
2. Modify the configuration files under etc/hadoop
[hadoop01@hadoop01 ~]$ cd hadoop-2.4.1/etc/hadoop/
-----> Edit hadoop-env.sh <-----
[hadoop01@hadoop01 hadoop]$ vim hadoop-env.sh
export JAVA_HOME=/home/hadoop01/jdk1.7.0_67  # change this to your JDK path
-----> Edit core-site.xml <-----
[hadoop01@hadoop01 hadoop]$ vim core-site.xml
Add the following between the <configuration> tags:
<property>
    <name>fs.defaultFS</name>
    <!-- change to your own IP; the port is 9000 -->
    <value>hdfs://192.168.110.110:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <!-- change to the directory where Hadoop should keep its files -->
    <value>/home/hadoop01/hadoop-2.4.1/tmp</value>
</property>
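The directory set in hadoop.tmp.dir is created automatically when the NameNode is formatted later, but you can also create it up front to confirm the path is writable (same path as configured above):
[hadoop01@hadoop01 hadoop]$ mkdir -p /home/hadoop01/hadoop-2.4.1/tmp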
-----> Edit hdfs-site.xml <-----
[hadoop01@hadoop01 hadoop]$ vim hdfs-site.xml
Add the following between the <configuration> tags:
<property>
    <name>dfs.replication</name>
    <!-- a single-node setup, so the replication factor is 1 -->
    <value>1</value>
</property>
<property>
    <name>dfs.secondary.http.address</name>
    <!-- change to your own IP; the port is 50090 -->
    <value>192.168.110.110:50090</value>
</property>
-----> Edit mapred-site.xml <-----
There is no mapred-site.xml by default, so rename the template first:
[hadoop01@hadoop01 hadoop]$ mv mapred-site.xml.template mapred-site.xml
[hadoop01@hadoop01 hadoop]$ vim mapred-site.xml
Add the following between the <configuration> tags:
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
-----> Edit yarn-site.xml <-----
[hadoop01@hadoop01 hadoop]$ vim yarn-site.xml
Add the following between the <configuration> tags:
<property>
    <name>yarn.resourcemanager.hostname</name>
    <!-- change to your own hostname -->
    <value>localhost</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
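If you are unsure what to put for the hostname, the hostname command prints the current machine's name (here it matches the hadoop01 in the shell prompt):
[hadoop01@hadoop01 hadoop]$ hostname
hadoop01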
-----> Edit slaves (optional; the default already works for a single node) <-----
[hadoop01@hadoop01 hadoop]$ vim slaves
localhost ## IP or hostname of each DataNode, one per line
3. Add Hadoop to the environment variables (editing /etc/profile requires root privileges)
[hadoop01@hadoop01 etc]# vim /etc/profile
############HADOOP_HOME#############
export HADOOP_HOME=/home/hadoop01/hadoop-2.4.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
4. Reload the profile and check the Hadoop version
[hadoop01@hadoop01 hadoop]$ source /etc/profile
[hadoop01@hadoop01 hadoop]$ hadoop version
Hadoop 2.4.1
Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1604318
Compiled by jenkins on 2014-06-21T05:43Z
Compiled with protoc 2.5.0
From source with checksum bb7ac0a3c73dc131f4844b873c74b630
This command was run using /home/hadoop01/hadoop-2.4.1/share/hadoop/common/hadoop-common-2.4.1.jar
5. Format the NameNode
[hadoop01@hadoop01 hadoop]$ hdfs namenode -format
If the output ends with a line like the following, the format succeeded:
17/01/19 14:00:53 INFO common.Storage: Storage directory /home/hadoop01/hadoop-2.4.1/tmp/dfs/name has been successfully formatted
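As an extra check, the freshly formatted metadata directory should now exist under the hadoop.tmp.dir configured earlier (the file names shown are typical for Hadoop 2.x):
[hadoop01@hadoop01 hadoop]$ ls /home/hadoop01/hadoop-2.4.1/tmp/dfs/name/current
fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION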
6. Go into the sbin directory under the Hadoop home and start Hadoop. Everything can be started at once with ./start-all.sh, but it is better to start HDFS first and then YARN.
-----> Start HDFS
[hadoop01@hadoop01 sbin]$ ./start-dfs.sh
Check that HDFS started: if jps lists the following processes, the startup succeeded
[hadoop01@hadoop01 sbin]$ jps
25550 DataNode
25695 SecondaryNameNode
25807 Jps
25435 NameNode
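Optionally, run a quick smoke test against HDFS (the paths here are just an example):
[hadoop01@hadoop01 sbin]$ hdfs dfs -mkdir -p /user/hadoop01
[hadoop01@hadoop01 sbin]$ hdfs dfs -put ~/hadoop-2.4.1/etc/hadoop/core-site.xml /user/hadoop01
[hadoop01@hadoop01 sbin]$ hdfs dfs -ls /user/hadoop01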
-----> Start YARN
[hadoop01@hadoop01 sbin]$ ./start-yarn.sh
Check YARN: if the ResourceManager and NodeManager processes appear as well, YARN started successfully
[hadoop01@hadoop01 sbin]$ jps
25550 DataNode
25695 SecondaryNameNode
26266 Jps
25861 ResourceManager
26146 NodeManager
25435 NameNode
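To confirm that MapReduce jobs actually run on YARN, you can submit the pi example that ships with the release (the jar name below matches the 2.4.1 tarball):
[hadoop01@hadoop01 sbin]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 2 5
The web UIs are also worth a look: by default the NameNode serves on port 50070 and the ResourceManager on port 8088, e.g. http://192.168.110.110:50070 and http://192.168.110.110:8088 (substitute your own IP).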
7. Configure passwordless SSH login
Run the following commands; just press Enter at every prompt
[hadoop01@hadoop01 sbin]$ cd ~
[hadoop01@hadoop01 ~]$ ssh-keygen
Running this command generates two files under ~/.ssh: id_rsa (the private key) and id_rsa.pub (the public key). Copy the public key to the machine you want to log in to without a password:
[hadoop01@hadoop01 ~]$ ssh-copy-id localhost
Verify that passwordless SSH works; if no password is asked for, it succeeded:
[hadoop01@hadoop01 ~]$ ssh localhost
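Type exit to leave the test session. With passwordless login in place, the start/stop scripts no longer prompt for a password for each daemon, so a full restart now runs unattended:
[hadoop01@hadoop01 ~]$ cd ~/hadoop-2.4.1/sbin
[hadoop01@hadoop01 sbin]$ ./stop-yarn.sh && ./stop-dfs.sh
[hadoop01@hadoop01 sbin]$ ./start-dfs.sh && ./start-yarn.sh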