Step 8: Install CDH5
a. Download the RPM package
1. Change to the download directory, /usr/tool/:
2. Download the package:
wget http://archive.cloudera.com/cdh5/one-click-install/redhat/6/x86_64/cloudera-cdh-5-0.x86_64.rpm -- if your Linux version is CentOS 5.x, change the "6" in the URL to "5"; the same applies to similar URLs below
3. Disable GPG signature checking and install the local package:
yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
4. Import the Cloudera repository GPG key:
rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
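To confirm that the repository and key registered correctly, a quick check with standard yum/rpm commands:
yum repolist | grep -i cloudera
rpm -q gpg-pubkey --qf '%{NAME}-%{VERSION}-%{RELEASE}\t%{SUMMARY}\n' | grep -i cloudera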
b. Install the Hadoop packages
1. On master, install the namenode, resourcemanager, nodemanager, datanode, mapreduce, historyserver, proxyserver, and hadoop-client packages:
yum install hadoop hadoop-hdfs hadoop-client hadoop-doc hadoop-debuginfo hadoop-hdfs-namenode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce hadoop-mapreduce-historyserver hadoop-yarn-proxyserver -y
2. On slave1 and slave2, install the yarn, nodemanager, datanode, mapreduce, and hadoop-client packages:
yum install hadoop hadoop-hdfs hadoop-client hadoop-doc hadoop-debuginfo hadoop-yarn hadoop-hdfs-datanode hadoop-yarn-nodemanager hadoop-mapreduce -y
3. Install HttpFS:
yum install hadoop-httpfs -y
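Once HDFS is configured and running (after Step 14), HttpFS can be started and smoke-tested over its REST endpoint. A minimal sketch, assuming the default HttpFS port 14000 and a non-secure cluster (hence user.name=hdfs):
/etc/init.d/hadoop-httpfs start
curl "http://master:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"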
4. Install a Secondary NameNode (optional):
Pick one machine to serve as the Secondary NameNode and install the package on it:
yum install hadoop-hdfs-secondarynamenode -y
Add the following properties to /etc/hadoop/conf/hdfs-site.xml:
<property>
  <name>dfs.namenode.checkpoint.check.period</name>
  <value>60</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>file:///data/cache1/dfs/namesecondary</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.edits.dir</name>
  <value>file:///data/cache1/dfs/namesecondary</value>
</property>
<property>
  <name>dfs.namenode.num.checkpoints.retained</name>
  <value>2</value>
</property>
<!-- Make slave1 the SecondaryNameNode -->
<property>
  <name>dfs.secondary.http.address</name>
  <value>slave1:50090</value>
</property>
For detailed configuration options, see: http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
To set up multiple Secondary NameNodes, see: http://blog.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/
Step 9: Create directories
a. On master, create the NameNode directory:
mkdir -p /data/cache1/dfs/nn
chown -R hdfs:hadoop /data/cache1/dfs/nn
chmod 700 /data/cache1/dfs/nn
b. On slave1 and slave2, create the DataNode and MapReduce local directories:
mkdir -p /data/cache1/dfs/dn
mkdir -p /data/cache1/dfs/mapred/local
chown -R hdfs:hadoop /data/cache1/dfs/dn
usermod -a -G mapred hadoop
chown -R mapred:hadoop /data/cache1/dfs/mapred/local
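Before starting any daemons, it is worth verifying the ownership and permissions just set (shown here for the slaves; check /data/cache1/dfs/nn on master the same way):
ls -ld /data/cache1/dfs/dn /data/cache1/dfs/mapred/local
id hadoop    # the hadoop user should now list mapred among its groups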
c. Create directories on HDFS (run these only after the Hadoop cluster has been fully set up and started):
hdfs dfs -mkdir -p /user/hadoop/{done,tmp}
sudo -u hdfs hadoop fs -chown mapred:hadoop /user/hadoop/*
hdfs dfs -mkdir -p /var/log/hadoop-yarn/apps
sudo -u hdfs hadoop fs -chown hadoop:hdfs /var/log/hadoop-yarn/apps
hdfs dfs -mkdir -p /user/hive/warehouse
sudo -u hdfs hadoop fs -chown hive /user/hive/warehouse
sudo -u hdfs hadoop fs -chmod 1777 /user/hive/warehouse
hdfs dfs -mkdir -p /tmp/hive
sudo -u hdfs hadoop fs -chmod 777 /tmp/hive
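A quick listing confirms the layout and ownership created above:
sudo -u hdfs hadoop fs -ls /user/hadoop /var/log/hadoop-yarn /user/hive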
Step 10: Configure environment variables
a. Edit /etc/profile and add the following environment variables:
export HADOOP_HOME=/usr/lib/hadoop
export HIVE_HOME=/usr/lib/hive
export HBASE_HOME=/usr/lib/hbase
export HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_YARN_HOME=/usr/lib/hadoop-yarn
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin:$PATH
b. Run the following command to make them take effect:
source /etc/profile
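A quick sanity check that the variables took effect and the hadoop binary resolves:
echo $HADOOP_HOME
hadoop version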
Step 11: Modify the Hadoop configuration files
a. Configuration file overview:
| Config file | Type | Description |
| --- | --- | --- |
| hadoop-env.sh | Bash script | Hadoop runtime environment variables |
| core-site.xml | XML | Hadoop Core settings, such as I/O |
| hdfs-site.xml | XML | HDFS daemon settings: NN, JN, DN |
| yarn-env.sh | Bash script | YARN runtime environment variables |
| yarn-site.xml | XML | YARN framework settings |
| mapred-site.xml | XML | MapReduce property settings |
| capacity-scheduler.xml | XML | YARN scheduler property settings |
| container-executor.cfg | cfg | YARN container settings |
| mapred-queues.xml | XML | MapReduce queue settings |
| hadoop-metrics.properties | Java properties | Hadoop metrics configuration |
| hadoop-metrics2.properties | Java properties | Hadoop metrics2 configuration |
| slaves | Plain text | DataNode (DN) node list |
| exclude | Plain text | List of DataNodes to remove |
| log4j.properties | Java properties | System logging settings |
| configuration.xsl | XSL | |
b. Modify the following configuration files on master, then scp them to the corresponding directory on each slave:
/etc/hadoop/conf/core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>master</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>hdfs</value>
</property>
<property>
  <name>hadoop.proxyuser.mapred.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.mapred.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>httpfs-host.foo.com</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>
/etc/hadoop/conf/hdfs-site.xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/cache1/dfs/nn/</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/cache1/dfs/dn/</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hdfs</value>
</property>
/etc/hadoop/conf/mapred-site.xml
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
</property>
<property>
  <name>mapreduce.jobhistory.joblist.cache.size</name>
  <value>50000</value>
</property>
<!-- The directories created on HDFS earlier -->
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/user/hadoop/done</value>
</property>
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/user/hadoop/tmp</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
/etc/hadoop/conf/yarn-site.xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <description>List of directories to store localized files in.</description>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/var/lib/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
</property>
<property>
  <description>Where to store container logs.</description>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/var/log/hadoop-yarn/containers</value>
</property>
<property>
  <description>Where to aggregate logs to.</description>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>hdfs://master:9000/var/log/hadoop-yarn/apps</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master:8033</value>
</property>
<property>
  <description>Classpath for typical applications.</description>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/*,
    $HADOOP_COMMON_HOME/lib/*,
    $HADOOP_HDFS_HOME/*,
    $HADOOP_HDFS_HOME/lib/*,
    $HADOOP_MAPRED_HOME/*,
    $HADOOP_MAPRED_HOME/lib/*,
    $HADOOP_YARN_HOME/*,
    $HADOOP_YARN_HOME/lib/*
  </value>
</property>
<property>
  <name>yarn.web-proxy.address</name>
  <value>master:54315</value>
</property>
c. List all of the slaves in /etc/hadoop/conf/slaves:
slave1
slave2
d. Finally, sync the modified files to the slaves:
scp -r /etc/hadoop/conf root@slave1:/etc/hadoop/
scp -r /etc/hadoop/conf root@slave2:/etc/hadoop/
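To verify the copies match, compare checksums on master and the slaves (this assumes root SSH access to the slaves, the same access the scp commands use):
md5sum /etc/hadoop/conf/*.xml
ssh root@slave1 "md5sum /etc/hadoop/conf/*.xml"
ssh root@slave2 "md5sum /etc/hadoop/conf/*.xml"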
Step 12: Enable the trash feature (optional)
Add the following two parameters to /etc/hadoop/conf/core-site.xml:
1. fs.trash.interval: a time interval in minutes, default 0, meaning the trash feature is disabled. The value is how long a deleted file is kept in the trash. If this parameter is set on the server side, any client-side setting is ignored; if it is disabled on the server side, the client-side setting is checked instead.
2. fs.trash.checkpoint.interval: a time interval in minutes, default 0. It controls how often the trash is checkpointed and must not exceed fs.trash.interval. It is configured on the server side; if set to 0, the value of fs.trash.interval is used.
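For example, to keep deleted files for one day and checkpoint the trash hourly (the values below are illustrative, not prescribed):
<property>
  <name>fs.trash.interval</name>
  <value>1440</value> <!-- keep trashed files for 24 hours -->
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>60</value> <!-- checkpoint the trash every hour -->
</property>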
Step 13: Configure LZO (optional)
a. Download the repo file to /etc/yum.repos.d/ on master:
wget http://archive.cloudera.com/gplextras5/redhat/6/x86_64/gplextras/cloudera-gplextras5.repo
b. Install LZO: yum install hadoop-lzo* impala-lzo -y
c. Add the following to /etc/hadoop/conf/core-site.xml:
<property>
  <name>io.compression.codecs</name>
  <value>
    org.apache.hadoop.io.compress.DefaultCodec,
    org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.BZip2Codec,
    com.hadoop.compression.lzo.LzoCodec,
    com.hadoop.compression.lzo.LzopCodec
  </value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
If you also want MapReduce to compress intermediate output with LZO, add the following to /etc/hadoop/conf/mapred-site.xml:
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
d. After configuration is complete, run a test:
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer hdfs://master:9000/user/hadoop/workflows/shellTest/workflow.xml
Step 14: Start the services
a. Start the master daemons:
hdfs namenode -format
/etc/init.d/hadoop-hdfs-namenode init
/etc/init.d/hadoop-hdfs-namenode start
/etc/init.d/hadoop-yarn-resourcemanager start
/etc/init.d/hadoop-yarn-proxyserver start
/etc/init.d/hadoop-mapreduce-historyserver start
b. Start the daemons on slave1 and slave2:
/etc/init.d/hadoop-hdfs-datanode start
/etc/init.d/hadoop-yarn-nodemanager start
If any of these services fails to start, follow the hint in the error message to the corresponding log file and check the details there. The vast majority of failures are caused by missing file permissions and can be resolved by running chmod -R 777 on the directory in question.
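To confirm which daemons are actually running on each node, list the Java processes; on master you should see NameNode, ResourceManager, JobHistoryServer, and WebAppProxyServer, and on the slaves DataNode and NodeManager:
ps -ef | grep -E 'NameNode|DataNode|ResourceManager|NodeManager|JobHistoryServer|ProxyServer' | grep -v grep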
c. Post-startup checks:
| URL | Service |
| --- | --- |
| http://192.168.157.130:50070 | HDFS |
| http://192.168.157.130:8088 | ResourceManager (YARN) |
| http://192.168.157.130:8088/cluster/nodes | Online nodes |
| http://192.168.157.130:8042 | NodeManager |
| http://192.168.157.131:8042 | NodeManager |
| http://192.168.157.132:8042 | NodeManager |
| http://192.168.157.130:19888/ | JobHistory |
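As a final end-to-end check, a sample MapReduce job can be submitted and then watched in the ResourceManager and JobHistory UIs above. A minimal sketch, assuming the standard CDH5 examples jar location and the hadoop user from the earlier steps:
sudo -u hadoop hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 10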