This morning I helped a newcomer set up a Hadoop cluster remotely (1.x, or versions below 0.22). It left quite an impression, so here is the simplest way to set up Apache Hadoop, written to help newcomers; I will try to be as detailed as I can. Click here to see the Avatorhadoop setup steps.
1. Environment preparation:
1). Machine preparation: the target machines must be able to ping each other, so for virtual machines on different hosts use "bridged" networking (if you use host-only networking, turn off the host machine's firewall first; for the details of each networking mode, please google VMware network configuration and KVM bridged networking; Xen lets you set a LAN IP manually during installation; if you really get stuck, leave a comment). Turn off the firewall on every machine: /etc/init.d/iptables stop; chkconfig iptables off. For the hostname, I suggest hadoopservern, where n is the number you assign to that machine, because hostnames containing special characters such as '_' or '.' will cause startup problems. Edit each machine's /etc/hosts and add the IP-to-hostname mappings.
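As a concrete illustration, the per-machine preparation might look like this (run as root; the three hostnames and the 192.168.1.x addresses are made-up examples, substitute your own):
# stop the firewall now and keep it off across reboots
/etc/init.d/iptables stop
chkconfig iptables off
# give the machine a clean hostname (no '_' or '.'); on the first machine, for example
hostname hadoopserver1   # also set HOSTNAME=hadoopserver1 in /etc/sysconfig/network so it survives reboots
# append the IP-to-hostname mapping of every node to /etc/hosts on every node
cat >> /etc/hosts <<'EOF'
192.168.1.101 hadoopserver1
192.168.1.102 hadoopserver2
192.168.1.103 hadoopserver3
EOF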
2). Download a stable Hadoop release and extract it, then configure the Java environment (the Java environment is usually set in ~/.bash_profile rather than system-wide, out of consideration for the machine's security);
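A minimal sketch of those per-user environment settings, assuming the JDK sits in /usr/java/jdk1.6.0 and Hadoop was extracted to ~/hadoop-1.x (both paths are assumptions, adjust them to your layout), appended to ~/.bash_profile:
# Java and Hadoop environment for the hadoop user
export JAVA_HOME=/usr/java/jdk1.6.0
export HADOOP_HOME=$HOME/hadoop-1.x
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
Then run source ~/.bash_profile to pick it up in the current shell.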
3). Passwordless SSH. There is a small trick here: on hadoopserver1
ssh-keygen -t rsa -P ''; press Enter through all the prompts
ssh-copy-id user@host;
Then copy id_rsa and id_rsa.pub from the ~/.ssh/ directory to the other machines;
ssh hadoopserver2; run scp -r ~/.ssh/authorized_keys hadoopserver1:~/.ssh/; with that, all the passwordless logins are done and the machines can ssh to one another. Practice more and keep learning; the tutorials online never mention that ssh-copy-id can simplify Hadoop's passwordless setup.
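Putting the whole passwordless-SSH setup together, here is a sketch run from hadoopserver1 (the hostnames hadoopserver2/hadoopserver3 and the user name hadoop are assumptions, use your own):
# 1. generate a key pair with an empty passphrase
ssh-keygen -t rsa -P ''
# 2. install the public key on every node, including hadoopserver1 itself
for host in hadoopserver1 hadoopserver2 hadoopserver3; do
  ssh-copy-id hadoop@$host
done
# 3. give the other nodes the same key pair so every node can ssh to every other node
for host in hadoopserver2 hadoopserver3; do
  scp ~/.ssh/id_rsa ~/.ssh/id_rsa.pub $host:~/.ssh/
done
# 4. verify: should print the remote hostname without asking for a password
ssh hadoopserver2 hostname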
2. Steps:
1). On hadoopserver1 (the namenode), modify the following files under conf in the extracted Hadoop directory:
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoopserver1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/xxx/hadoop-version/tmp</value>
  </property>
</configuration>
hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/xxx/hadoop-version/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/xxx/hadoop-version/data</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>670720</value>
  </property>
  <!--
  <property>
    <name>dfs.secondary.http.address</name>
    <value>0.0.0.0:60090</value>
    <description>
      The secondary namenode http server address and port.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:60010</value>
    <description>
      The address where the datanode server will listen to.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:60075</value>
    <description>
      The datanode http server address and port.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:60020</value>
    <description>
      The datanode ipc server address and port.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:60070</value>
    <description>
      The address and the base port where the dfs namenode web ui will listen on.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  -->
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
    <description>Does HDFS allow appends to files?
      This is currently set to false because there are bugs in the
      "append code" and is not supported in any production cluster.
    </description>
  </property>
</configuration>
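Hadoop will normally create the directories named above when you format the namenode and start the daemons, but pre-creating them with the right owner avoids permission surprises; a quick sketch, keeping the same placeholder path /xxx/hadoop-version used in the configs and assuming the daemons run as the hadoop user:
# on the namenode
mkdir -p /xxx/hadoop-version/tmp /xxx/hadoop-version/name
# on every datanode
mkdir -p /xxx/hadoop-version/tmp /xxx/hadoop-version/data
chown -R hadoop:hadoop /xxx/hadoop-version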
mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoopserver1:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <!--
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:50030</value>
    <description>
      The job tracker http server address and port the server will listen on.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>mapred.task.tracker.http.address</name>
    <value>0.0.0.0:60060</value>
    <description>
      The task tracker http server address and port.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  -->
</configuration>
masters contains the hostname of the secondarynamenode; it tells Hadoop to start the secondarynamenode on that machine;
slaves lists the datanode nodes, one hostname per line
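For example, with hadoopserver2 running the secondarynamenode and hadoopserver2/hadoopserver3 acting as datanodes (this role assignment is just an assumed example), the two files under conf/ could be written like this:
# conf/masters -- where the secondarynamenode runs
cat > conf/masters <<'EOF'
hadoopserver2
EOF
# conf/slaves -- one datanode hostname per line
cat > conf/slaves <<'EOF'
hadoopserver2
hadoopserver3
EOF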
2). Modify hadoop-env.sh:
Set JAVA_HOME to your Java installation directory.
Add a startup option: export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true", to make sure the daemons bind IPv4 addresses;
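The two relevant lines in conf/hadoop-env.sh then look something like this (the JDK path is an assumption, use the same one as in your ~/.bash_profile):
# conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0
# make the daemons bind IPv4 addresses instead of IPv6
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"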
3). Manual distribution: scp -r the Hadoop directory to hadoopserver2...n under the same path prefix.
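A sketch of that distribution step, assuming the configured tree lives in ~/hadoop-1.x and there are two other nodes (adjust the names and the path to your own):
for host in hadoopserver2 hadoopserver3; do
  # copy the whole configured Hadoop tree to the same location on every node
  scp -r ~/hadoop-1.x $host:~/
done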
4). Start up:
bin/hadoop namenode -format
bin/start-all.sh
5). Open http://<hadoopserver1's IP>:50070 in a browser to check the state of the cluster.
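To double-check from the command line, something like the following also works (jps ships with the JDK, dfsadmin with Hadoop):
# list the running Java daemons (namenode, jobtracker, secondarynamenode, datanode, tasktracker, depending on the node)
jps
# report capacity and the number of live datanodes as seen by the namenode
bin/hadoop dfsadmin -report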