Environment:
1) Hadoop version: 0.23.1
2) Linux kernel: Linux version 2.6.18-164.el5
3) Operating system: Red Hat Enterprise Linux Server release 5.4
Topology:
Four machines in total (A, B, C, D)
NameNode: A, B
DataNode: A, B, C, D
ResourceManager: B
NodeManager: A, B, C, D
Steps:
1. Download the Hadoop 0.23.1 source and binary tarballs
wget http://labs.renren.com/apache-mirror//hadoop/core/hadoop-0.23.1/hadoop-0.23.1-src.tar.gz
wget http://labs.renren.com/apache-mirror//hadoop/core/hadoop-0.23.1/hadoop-0.23.1.tar.gz
2. Install
tar -xvzf hadoop-0.23.1.tar.gz
3. Install prerequisite tools
1) Java
(omitted)
2) protobuf (a quick sanity check follows this step)
wget http://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz
tar -zxvf protobuf-2.4.1.tar.gz
cd protobuf-2.4.1
./configure
make
sudo make install
3) ssh
(omitted)
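Before building Hadoop against protobuf, it is worth confirming the compiler actually landed on the PATH; on some systems the freshly installed shared library also needs an ldconfig run. A minimal check, assuming the 2.4.1 tarball above:

protoc --version              # should print: libprotoc 2.4.1
# if protoc fails with a missing libprotoc shared library, refresh the linker cache:
sudo ldconfig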
4. Configure the runtime environment
vim ~/.bashrc
export HADOOP_DEV_HOME=/home/m2/hadoop-0.23.1
export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/conf
export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/conf
export YARN_CONF_DIR=${HADOOP_DEV_HOME}/conf
export HADOOP_LOG_DIR=${HADOOP_DEV_HOME}/logs
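After saving ~/.bashrc, reload it and spot-check that the variables resolve; a stale shell is a common reason the later start scripts pick up the wrong configuration directory. The expected values assume the install location above:

source ~/.bashrc
echo $HADOOP_DEV_HOME     # expect /home/m2/hadoop-0.23.1
echo $HADOOP_CONF_DIR     # expect /home/m2/hadoop-0.23.1/conf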
5. Create the Hadoop configuration files
cd $HADOOP_DEV_HOME
mkdir conf
vim core-site.xml
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/disk1/hadoop-0.23/tmp/</value>
    <description>A base for other temporary directories</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://A:9000</value>
    <description>The name of the default file system. Either the literal string "local" or a host:port for NDFS.</description>
    <final>true</final>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/disk12/hadoop-0.23/namenode</value>
  </property>
  <property>
    <name>dfs.federation.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>A:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1</name>
    <value>A:23001</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.ns1</name>
    <value>A:23002</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>B:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns2</name>
    <value>B:23001</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.ns2</name>
    <value>B:23002</value>
  </property>
</configuration>
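With federation, ns1 and ns2 are independent namespaces served by the two NameNodes, and a client can address either one explicitly by its RPC address. A quick way to exercise both once the cluster is up (a sketch using the addresses configured above):

bin/hdfs dfs -ls hdfs://A:9000/      # namespace ns1
bin/hdfs dfs -ls hdfs://B:9000/      # namespace ns2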
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>C:18040</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>C:18030</value>
  </property>
  <property>
    <description>The address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>C:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>C:18025</value>
  </property>
  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>C:18141</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
</configuration>
slaves
A
B
C
D
hadoop-env.sh
cp $HADOOP_DEV_HOME/share/hadoop/common/templates/conf/hadoop-env.sh $HADOOP_DEV_HOME/conf/
vim hadoop-env.sh
export JAVA_HOME=    # set to the JDK installation path on your machines
6. Configure the other servers
pscp -h slaves -r /home/m2/hadoop-0.23.1 /home/m2/
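If the pssh toolkit that provides pscp is not available, a plain scp loop over the same slaves file works just as well (a sketch, assuming passwordless ssh from step 3 is in place):

for host in $(cat $HADOOP_CONF_DIR/slaves); do
    scp -r /home/m2/hadoop-0.23.1 ${host}:/home/m2/
done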
7. Format and start the NameNodes
ssh A ${HADOOP_DEV_HOME}/bin/hdfs namenode -format -clusterid test
ssh B ${HADOOP_DEV_HOME}/bin/hdfs namenode -format -clusterid test
${HADOOP_DEV_HOME}/sbin/start-dfs.sh
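Running jps on each machine is a quick way to confirm which daemons actually came up; per the topology above, A and B should each show a NameNode and a DataNode, while C and D show only a DataNode. A sketch, assuming jps is on the PATH of the remote shells:

for host in A B C D; do
    echo "--- ${host} ---"
    ssh ${host} jps
done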
8. Start the ResourceManager
$HADOOP_DEV_HOME/sbin/start-yarn.sh
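Once start-yarn.sh returns, the RM web UI should answer on the address from yarn-site.xml, and every node should show a NodeManager process in jps (a sketch; the host and port come from yarn.resourcemanager.webapp.address above):

curl -s http://C:18088/ | head
for host in A B C D; do
    ssh ${host} jps | grep -E 'ResourceManager|NodeManager'
done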
Common problems:
1) To use multiple local directories, separate them with commas
hdfs-site.xml
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/disk1/hadoop-0.23/data,/disk2/hadoop-0.23/data</value>
</property>
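The directories (or at least their mount points) should exist on every node, writable by the user running the DataNode, before restarting. A quick way to prepare them (a sketch, assuming the same /disk1 and /disk2 mounts on all four machines):

for host in A B C D; do
    ssh ${host} "mkdir -p /disk1/hadoop-0.23/data /disk2/hadoop-0.23/data"
done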
2) A start command reports no error, but the daemon never actually starts
The port may already be occupied:
netstat -anp | grep <port>    # -n forces numeric ports (well-known ports are otherwise shown as service names), -p shows the owning process
ps aux | grep <pid>
kill -9 <pid>
3) Running the DistributedShell example fails
The cause is that the CLASSPATH is not set correctly when the ApplicationMaster is launched.
Fix: edit Client.java as below, or apply the patch from https://issues.apache.org/jira/browse/MAPREDUCE-3869
-    String classPathEnv = "${CLASSPATH}"
-        + ":./*"
-        + ":$HADOOP_CONF_DIR"
-        + ":$HADOOP_COMMON_HOME/share/hadoop/common/*"
-        + ":$HADOOP_COMMON_HOME/share/hadoop/common/lib/*"
-        + ":$HADOOP_HDFS_HOME/share/hadoop/hdfs/*"
-        + ":$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*"
-        + ":$YARN_HOME/modules/*"
-        + ":$YARN_HOME/lib/*"
-        + ":./log4j.properties:";
+    StringBuilder classPathEnv = new StringBuilder("${CLASSPATH}:./*");
+    for (String c : conf.get(YarnConfiguration.YARN_APPLICATION_CLASSPATH)
+        .split(",")) {
+      classPathEnv.append(':');
+      classPathEnv.append(c.trim());
+    }
+    classPathEnv.append(":./log4j.properties");

-    // add the runtime classpath needed for tests to work
+    // add the runtime classpath needed for tests to work
     String testRuntimeClassPath = Client.getTestRuntimeClasspath();
-    classPathEnv += ":" + testRuntimeClassPath;
+    classPathEnv.append(':');
+    classPathEnv.append(testRuntimeClassPath);

-    env.put("CLASSPATH", classPathEnv);
+    env.put("CLASSPATH", classPathEnv.toString());
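After rebuilding with the patched Client.java, rerunning the example confirms the fix. A sketch of the invocation; the jar location and version suffix are assumptions based on the 0.23.1 binary layout, so adjust them to what your build actually produces:

DSHELL_JAR=$YARN_HOME/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-0.23.1.jar
$HADOOP_DEV_HOME/bin/hadoop jar $DSHELL_JAR \
    org.apache.hadoop.yarn.applications.distributedshell.Client \
    --jar $DSHELL_JAR \
    --shell_command date \
    --num_containers 2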