Four VMware virtual machines are used; the planned deployment is as follows:
192.168.181.221 nameNode
192.168.181.222 dataNode
192.168.181.223 dataNode
192.168.181.224 dataNode
Each dataNode machine needs the JDK and ssh installed, and the hadoop installation on the nameNode must be distributed to every node; keep the install location and environment variables as consistent with the nameNode as possible.
yum -y install wget
yum -y install make
yum -y install openssh*
2. Install the JDK
Place the downloaded package jdk-7u4-linux-x64.tar.gz
into the directory /usr/lib/jvm (you can choose a different location, as long as the JDK environment variables match it).
Extract it directly:
tar zxvf jdk-7u4-linux-x64.tar.gz
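A quick way to confirm the extraction worked (a minimal check, assuming the archive extracts to the default directory name jdk1.7.0_04):
ls /usr/lib/jvm                               # should list jdk1.7.0_04
/usr/lib/jvm/jdk1.7.0_04/bin/java -version    # should report version 1.7.0_04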
Next, point Hadoop at this JDK in conf/hadoop-env.sh:
vi conf/hadoop-env.sh
# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_04
export HADOOP_HOME_WARN_SUPPRESS=1
vi /etc/profile
Append the following at the end of the file:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_04
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/hadoop-1.0.3
export PATH=$PATH:$HADOOP_HOME/bin
Make the environment variables take effect:
source /etc/profile
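To confirm the variables are in place, a minimal check (the exact output depends on your install):
echo $JAVA_HOME     # /usr/lib/jvm/jdk1.7.0_04
java -version       # should report 1.7.0_04
hadoop version      # should report Hadoop 1.0.3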
Specify the masterNode; you can use either an IP or a hostname. The masters file actually determines which machine runs the secondaryNameNode. Strictly speaking the file is misnamed; it should really be called secondaries.
vi conf/masters
192.168.181.221 # secondaryNameNode
The node 192.168.181.221 is both the nameNode and the secondaryNameNode.
Previously I left this at the default localhost, and when I verified by visiting http://192.168.181.221:50070/ I got an HTTP 404 error, probably a hostname-resolution problem. Advice online recommends using hostnames in the masters and slaves files; here I use IPs instead.
vi conf/slaves
192.168.181.222 # dataNode
192.168.181.223 # dataNode
192.168.181.224 # dataNode
This specifies which machines are slave nodes, used to store data blocks.
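If you prefer hostnames over IPs, the name-resolution problem is usually avoided by mapping every node in /etc/hosts on all machines. A minimal sketch, assuming the hostnames masterNode and node1-node3 that appear in the startup logs later in this article:
vi /etc/hosts
192.168.181.221 masterNode
192.168.181.222 node1
192.168.181.223 node2
192.168.181.224 node3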
hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/HadoopRun/name1,/HadoopRun/name2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/HadoopRun/data1,/HadoopRun/data2</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.181.221:9001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/HadoopRun/var</value>
  </property>
</configuration>
Configuration parameter notes:
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.181.221:9000</value>
    <description>URI of the NameNode, in the form hdfs://hostname:port/</description>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>
    <description>Interval between checkpoints, in seconds</description>
  </property>
  <property>
    <name>fs.checkpoint.size</name>
    <value>67108864</value>
    <description>When the edit log reaches this size, a checkpoint is forced regardless of the period, in bytes</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/HadoopRun/tmp</value>
    <description>Hadoop's default temporary directory. It is best to set this explicitly: if a DataNode inexplicably fails to start after adding a node or in other situations, deleting this tmp directory on that node usually fixes it. Note that if you delete this directory on the NameNode machine, you must re-run the NameNode format command. The path /HadoopRun/tmp does not need to be created in advance; it is generated automatically.</description>
  </property>
</configuration>
Set up passwordless SSH from the nameNode to the dataNodes. The first attempt to generate a key pair failed:
[root@gifer .ssh]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
open /root/.ssh/id_rsa failed: Permission denied.
Saving the key failed: /root/.ssh/id_rsa.
vi /etc/selinux/config
Change SELINUX=enforcing to
SELINUX=disabled
Save, reboot the machine, and then generate the key again.
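If you want to avoid the reboot, SELinux can usually be switched to permissive mode on the spot as well (a sketch; setenforce only affects the running system, so the config file change above is still needed to make it permanent):
getenforce      # shows Enforcing / Permissive / Disabled
setenforce 0    # permissive until the next reboot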
[root@gifer /]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
98:3c:31:5c:23:21:73:a0:a0:1f:c6:d3:c3:dc:58:32 root@gifer
The key's randomart image is:
+--[ RSA 2048]----+
|. E.=.o          |
|.o = @ o .       |
|. * * =          |
| o o o =         |
|  . = S          |
|   .             |
|                 |
|                 |
|                 |
+-----------------+
After it succeeds, the .ssh directory contains two new files: id_rsa (the private key) and id_rsa.pub (the public key).
[root@gifer .ssh]# cat id_rsa.pub >> authorized_keys
This appends the public key to the authorized_keys file; if authorized_keys does not exist, it is created automatically.
You can push the public key to a remote node with ssh-copy-id:
[root@gifer .ssh]# ssh-copy-id -i id_rsa.pub [email protected]
You can also copy it with the scp command:
[root@gifer .ssh]# scp authorized_keys [email protected]:/root/.ssh/
.ssh directory permissions: 700
authorized_keys file permissions: 600
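If the permissions are looser than this, sshd will ignore authorized_keys and keep prompting for a password. A minimal way to set them on the target node:
chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys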
[root@gifer .ssh]# ssh [email protected]
Last login: Mon May 21 18:24:21 2012 from 192.168.181.1
Logging in without being asked for a password indicates success.
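The same key has to be installed on every dataNode. A small sketch that repeats the ssh-copy-id step for all three slaves (assuming the key pair generated above):
for ip in 192.168.181.222 192.168.181.223 192.168.181.224; do
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@$ip    # you will be asked for that node's password once
done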
[root@gifer /]# scp -r hadoop-1.0.3 192.168.181.222:/
On my machine (192.168.181.221) the hadoop-1.0.3 folder sits in the root directory, so on 192.168.181.222 it is also placed under the root directory. The scp option -r copies the directory recursively; without it, copying a directory fails with the error: not a regular file
[root@masterNode hadoop-1.0.3]# scp -r /hadoop-1.0.3/conf 192.168.181.222:/hadoop-1.0.3/
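When only the conf directory has changed, re-syncing just that directory is enough. A sketch that pushes it to all three dataNodes in one go, using the same paths as above:
for ip in 192.168.181.222 192.168.181.223 192.168.181.224; do
    scp -r /hadoop-1.0.3/conf $ip:/hadoop-1.0.3/    # overwrite the conf directory on each slave
done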
Open the ports Hadoop needs in the firewall, then save the rules:
/sbin/iptables -I INPUT -p tcp --dport 9000 -j ACCEPT
/sbin/iptables -I INPUT -p tcp --dport 9001 -j ACCEPT
/sbin/iptables -I INPUT -p tcp --dport 37974 -j ACCEPT
/sbin/iptables -I INPUT -p tcp --dport 38840 -j ACCEPT
/sbin/iptables -I INPUT -p tcp --dport 49785 -j ACCEPT
/sbin/iptables -I INPUT -p tcp --dport 50030 -j ACCEPT
/sbin/iptables -I INPUT -p tcp --dport 50070 -j ACCEPT
/sbin/iptables -I INPUT -p tcp --dport 50090 -j ACCEPT
service iptables save
To check the firewall status, or stop the firewall entirely:
service iptables status
service iptables stop
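Note that service iptables stop only lasts until the next reboot. If you decide to keep the firewall off on these lab VMs, it can also be disabled permanently (a sketch for this CentOS/RHEL-style init system):
chkconfig iptables off    # do not start iptables at boot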
[root@masterNode /]# cd hadoop-1.0.3
[root@masterNode hadoop-1.0.3]# bin/hadoop namenode -format
12/05/23 13:36:17 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.3
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
Re-format filesystem in /hadoop_home/name1 ? (Y or N) y
Format aborted in /hadoop_home/name1
12/05/23 13:36:29 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/
The confirmation prompt is case-sensitive: answering with a lowercase y aborts the format, as shown above. Answer with an uppercase Y (or format into empty directories) and it succeeds:
[root@masterNode hadoop-1.0.3]# bin/hadoop namenode -format
12/05/24 03:21:29 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.3
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
12/05/24 03:21:29 INFO util.GSet: VM type       = 64-bit
12/05/24 03:21:29 INFO util.GSet: 2% max memory = 19.33375 MB
12/05/24 03:21:29 INFO util.GSet: capacity      = 2^21 = 2097152 entries
12/05/24 03:21:29 INFO util.GSet: recommended=2097152, actual=2097152
12/05/24 03:21:29 INFO namenode.FSNamesystem: fsOwner=root
12/05/24 03:21:29 INFO namenode.FSNamesystem: supergroup=supergroup
12/05/24 03:21:29 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/05/24 03:21:29 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/05/24 03:21:29 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/05/24 03:21:29 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/05/24 03:21:30 INFO common.Storage: Image file of size 110 saved in 0 seconds.
12/05/24 03:21:30 INFO common.Storage: Storage directory /hadoop_home/name1 has been successfully formatted.
12/05/24 03:21:30 INFO common.Storage: Image file of size 110 saved in 0 seconds.
12/05/24 03:21:30 INFO common.Storage: Storage directory /hadoop_home/name2 has been successfully formatted.
12/05/24 03:21:30 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/
The distributed file system has been formatted successfully.
[root@masterNode hadoop-1.0.3]# bin/start-dfs.sh
starting namenode, logging to /hadoop-1.0.3/libexec/../logs/hadoop-root-namenode-masterNode.out
192.168.181.224: starting datanode, logging to /hadoop-1.0.3/libexec/../logs/hadoop-root-datanode-node3.out
192.168.181.222: starting datanode, logging to /hadoop-1.0.3/libexec/../logs/hadoop-root-datanode-node1.out
192.168.181.223: starting datanode, logging to /hadoop-1.0.3/libexec/../logs/hadoop-root-datanode-node2.out
192.168.181.221: starting secondarynamenode, logging to /hadoop-1.0.3/libexec/../logs/hadoop-root-secondarynamenode-masterNode.out
[root@masterNode hadoop-1.0.3]# bin/start-mapred.sh
starting jobtracker, logging to /hadoop-1.0.3/libexec/../logs/hadoop-root-jobtracker-masterNode.out
192.168.181.223: starting tasktracker, logging to /hadoop-1.0.3/libexec/../logs/hadoop-root-tasktracker-node2.out
192.168.181.222: starting tasktracker, logging to /hadoop-1.0.3/libexec/../logs/hadoop-root-tasktracker-node1.out
192.168.181.224: starting tasktracker, logging to /hadoop-1.0.3/libexec/../logs/hadoop-root-tasktracker-node3.out
Note: the command bin/start-all.sh (start all daemons) is deprecated in hadoop-1.0.3.
Use the jps command to see which daemons are running:
[root@masterNode hadoop-1.0.3]# jps
12275 NameNode
12445 SecondaryNameNode
12626 Jps
12529 JobTracker
[root@node3 ~]# jps
6621 DataNode
6723 TaskTracker
6819 Jps
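Beyond jps, the cluster can also be checked from the nameNode itself; a quick sketch (dfsadmin -report lists every live dataNode and its capacity):
bin/hadoop dfsadmin -report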
The web consoles are available at:
NameNode    http://192.168.181.221:50070/
JobTracker  http://192.168.181.221:50030/
To shut the cluster down:
[root@masterNode hadoop-1.0.3]# bin/stop-dfs.sh
stopping namenode
192.168.181.222: stopping datanode
192.168.181.224: stopping datanode
192.168.181.223: stopping datanode
192.168.181.221: stopping secondarynamenode
[root@masterNode hadoop-1.0.3]# bin/stop-mapred.sh
stopping jobtracker
192.168.181.222: stopping tasktracker
192.168.181.224: stopping tasktracker
192.168.181.223: stopping tasktracker
Note: the command bin/stop-all.sh (stop all daemons) is likewise deprecated in hadoop-1.0.3.
[root@masterNode hadoop-1.0.3]# bin/hadoop fs -mkdir input
mkdir: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/root/input. Name node is in safe mode.
Solution: turn off safe mode.
[root@masterNode hadoop-1.0.3]# bin/hadoop dfsadmin -safemode leave
Safe mode is OFF
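With safe mode off, the failed mkdir can be retried and HDFS given a quick smoke test. A minimal sketch (the uploaded files are just example data):
bin/hadoop fs -mkdir input
bin/hadoop fs -put conf/*.xml input    # upload the local conf files as sample data
bin/hadoop fs -ls input                # list what landed in HDFS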