Hadoop 2.5.0
[Steps]
1. Prerequisites
(1) Cluster plan
Host role | IP address | Host name |
master | 192.168.3.132 | hadoop01 |
slave1 | 192.168.3.134 | hadoop02 |
slave2 | 192.168.3.136 | hadoop03 |
slave3 | 192.168.3.138 | hadoop04 |
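(2) Log in to the operating system as root.
(3) On every host in the cluster, run the following command to set the host name (replace * with that host's number from the plan above).
hostname hadoop0*
Edit the file /etc/sysconfig/network as follows:
HOSTNAME=hadoop0*
(4) Edit the file /etc/hosts as follows: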
192.168.86.10 master.hadoop.com
192.168.86.11 slave1.hadoop.com
192.168.86.12 slave2.hadoop.com
192.168.86.13 slave3.hadoop.com
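Run the following command to copy the hosts file to every other host in the cluster (scp does not expand a wildcard in the host address, so each target is named explicitly):
for h in slave1.hadoop.com slave2.hadoop.com slave3.hadoop.com; do scp /etc/hosts $h:/etc/hosts; done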
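Since scp takes one concrete target per copy, distributing a file to the whole cluster comes down to a loop over host names. The following is a minimal sketch that reads the targets from a hosts-format file and only prints the scp commands (a dry run). The helper names hosts_in and push_cmds are hypothetical, and the scratch file stands in for the real /etc/hosts.

```shell
#!/bin/sh
# hosts_in and push_cmds are hypothetical helpers, not standard tools.

# Print the host names (second column) of a hosts-format file,
# ignoring blank lines, comments and localhost entries.
hosts_in() {
    awk '$1 !~ /^#/ && NF >= 2 && $2 !~ /^localhost/ { print $2 }' "$1"
}

# Print one scp command per host, skipping the sending host given as $3.
push_cmds() {
    src=$1
    hosts_file=$2
    self=$3
    for h in $(hosts_in "$hosts_file"); do
        if [ "$h" != "$self" ]; then
            printf 'scp %s %s:%s\n' "$src" "$h" "$src"
        fi
    done
}

# Scratch copy of the hosts entries from step (4).
demo_hosts=$(mktemp)
cat > "$demo_hosts" <<'EOF'
127.0.0.1 localhost localhost.localdomain
192.168.86.10 master.hadoop.com
192.168.86.11 slave1.hadoop.com
192.168.86.12 slave2.hadoop.com
192.168.86.13 slave3.hadoop.com
EOF

push_cmds /etc/hosts "$demo_hosts" master.hadoop.com
```

Once the printed commands look right, they can be piped to sh to perform the copy.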
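(5) Install the JDK
rpm -ivh jdk-7u67-linux-x64.rpm
Create the environment file (JAVA_HOME is exported so that child processes see it) and load it:
echo -e "export JAVA_HOME=/usr/java/default\nexport PATH=\$JAVA_HOME/bin:\$PATH" > /etc/profile.d/java-env.sh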
. /etc/profile.d/java-env.sh
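A quick way to confirm the environment file behaves as intended is to source it and check both variables. The sketch below writes the two settings to a scratch file instead of /etc/profile.d/java-env.sh; the export on JAVA_HOME is an addition made here so that child processes inherit it.

```shell
#!/bin/sh
# Write the two settings to a scratch file and source it
# (the real target would be /etc/profile.d/java-env.sh).
env_file=$(mktemp)
cat > "$env_file" <<'EOF'
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
EOF
. "$env_file"

# JAVA_HOME must be set and its bin directory must come first in PATH.
case $PATH in
    "$JAVA_HOME/bin:"*) echo "PATH starts with $JAVA_HOME/bin" ;;
    *)                  echo "PATH does not start with JAVA_HOME/bin" ;;
esac
```

On a real host, running java -version afterwards confirms that the JDK from the rpm is the one being picked up.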
(6) Disable iptables
service iptables stop
chkconfig iptables off
(7) Disable SELinux. Edit the file /etc/selinux/config as follows, then reboot the operating system.
SELINUX=disabled
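The SELinux edit can also be scripted. This is a minimal sketch that rewrites the SELINUX= line through a temporary copy and is demonstrated on a scratch file rather than /etc/selinux/config; disable_selinux_in is a hypothetical helper name.

```shell
#!/bin/sh
# disable_selinux_in is a hypothetical helper: it rewrites the SELINUX= line
# of the given file via a temporary copy (no GNU-only sed flags needed).
disable_selinux_in() {
    file=$1
    tmp=$(mktemp) || return 1
    # The pattern requires "=" right after SELINUX, so SELINUXTYPE= is untouched.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a scratch file with typical contents.
demo_cfg=$(mktemp)
printf '# SELinux settings\nSELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$demo_cfg"
disable_selinux_in "$demo_cfg"
cat "$demo_cfg"
```

The change in the config file only takes effect after the reboot mentioned in step (7).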
2. Installation (with YARN)
(1) On the master.hadoop.com host, run:
yum install hadoop-yarn-resourcemanager hadoop-mapreduce-historyserver hadoop-yarn-proxyserver hadoop-hdfs-namenode
yum install hadoop-hdfs-secondarynamenode    # optional; do not install this package if you use HA
(2) On every slave*.hadoop.com host, run:
yum install hadoop-yarn-nodemanager hadoop-mapreduce hadoop-hdfs-datanode
3. Configuration. After editing the files below, copy them to every host in the cluster with scp.
(1) Create the configuration directory
cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
(2) Create the necessary local directories. These are the local paths named in hdfs-site.xml and yarn-site.xml in step (3); the HDFS directories are created in step 6, once HDFS is running. The owners used here (hdfs:hdfs and yarn:yarn) are assumed to be the service accounts created by the packages.
On the master:
mkdir -p /data/1/dfs/nn && chown -R hdfs:hdfs /data/1/dfs/nn && chmod 700 /data/1/dfs/nn
On every slave:
mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn && chown -R hdfs:hdfs /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
mkdir -p /data/1/yarn/local /data/2/yarn/local /data/3/yarn/local /data/4/yarn/local /data/1/yarn/logs /data/2/yarn/logs /data/3/yarn/logs /data/4/yarn/logs && chown -R yarn:yarn /data/1/yarn /data/2/yarn /data/3/yarn /data/4/yarn
(3) Edit the configuration files
1)core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master.hadoop.com:8020</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>720</value>
</property>
<property>
  <name>hadoop.proxyuser.mapred.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.mapred.hosts</name>
  <value>*</value>
</property>
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
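The fragments in this step show only the property elements; in the actual *-site.xml files they sit inside a single configuration root element. As a rough sketch of that layout, the following generates a small core-site.xml on standard output. The helper names hadoop_property and render_site are hypothetical, and values containing XML special characters such as & or < would need escaping, which this sketch does not do.

```shell
#!/bin/sh
# hadoop_property is a hypothetical helper that prints one <property> element.
# It does not escape XML special characters in the name or value.
hadoop_property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# render_site is a hypothetical helper that wraps whatever it reads on
# standard input in the <configuration> root element.
render_site() {
    printf '<?xml version="1.0"?>\n<configuration>\n'
    cat
    printf '</configuration>\n'
}

{
    hadoop_property fs.defaultFS hdfs://master.hadoop.com:8020
    hadoop_property fs.trash.interval 1440
    hadoop_property fs.trash.checkpoint.interval 720
} | render_site
```

Redirecting the output to a file under /etc/hadoop/conf.my_cluster/ would produce a file in the same shape as the hand-edited one.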
2)hdfs-site.xml
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hadoop</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/1/dfs/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value>
</property>
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>3</value>
</property>
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.webhdfs.user.provider.user.pattern</name>
  <value>^[A-Za-z0-9_][A-Za-z0-9._-]*[$]?$</value>
</property>
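The directory lists in hdfs-site.xml are comma-separated file:// URIs, and each entry has to exist as a local path on the DataNode. The sketch below splits such a value into plain paths and runs two sanity checks on the numbers above. local_dirs is a hypothetical helper, and the rule that the tolerated failure count must stay below the number of volumes is stated from memory of how the Hadoop 2.x DataNode validates it.

```shell
#!/bin/sh
# local_dirs is a hypothetical helper: split a comma-separated Hadoop
# directory list and strip the file:// scheme, printing one path per line.
local_dirs() {
    printf '%s\n' "$1" | tr ',' '\n' | sed 's|^file://||'
}

# Value of dfs.datanode.data.dir from the hdfs-site.xml above.
data_dirs='file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn'
local_dirs "$data_dirs"

# dfs.datanode.failed.volumes.tolerated should be lower than the volume count.
volumes=$(local_dirs "$data_dirs" | wc -l | tr -d ' ')
tolerated=3
if [ "$tolerated" -lt "$volumes" ]; then
    echo "ok: tolerating $tolerated failed volumes out of $volumes"
else
    echo "invalid: tolerated ($tolerated) must be less than volumes ($volumes)"
fi

# The balanced-space threshold is given in bytes; convert it to GiB.
threshold_bytes=10737418240
threshold_gib=$(( threshold_bytes / 1073741824 ))
echo "balanced-space threshold: $threshold_gib GiB"
```

The path list printed by local_dirs is also what a mkdir -p call would take when preparing the local disks.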
3)yarn-site.xml
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master.hadoop.com</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <description>List of directories to store localized files in.</description>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/1/yarn/local,/data/2/yarn/local,/data/3/yarn/local,/data/4/yarn/local</value>
</property>
<property>
  <description>Where to store container logs.</description>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/1/yarn/logs,/data/2/yarn/logs,/data/3/yarn/logs,/data/4/yarn/logs</value>
</property>
<property>
  <description>Where to aggregate logs to.</description>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>hdfs://master.hadoop.com:8020/var/log/hadoop-yarn/apps</value>
</property>
<property>
  <description>Classpath for typical applications.</description>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
    $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
    $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
    $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
  </value>
</property>
<property>
  <name>yarn.web-proxy.address</name>
  <value>master.hadoop.com</value>
</property>
<property>
  <description>Memory made available to containers on this node, not the total physical memory of the machine.</description>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>5120</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>10240</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx512m</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>10</value>
</property>
<property>
  <name>yarn.scheduler.increment-allocation-mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.scheduler.increment-allocation-vcores</name>
  <value>1</value>
</property>
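The memory settings in yarn-site.xml interact, so it helps to work through the arithmetic once. The sketch below uses the values from the file above and assumes that the scheduler rounds each memory request up to a multiple of yarn.scheduler.increment-allocation-mb; round_up is a hypothetical helper, and the 700 MB request is a made-up example.

```shell
#!/bin/sh
# Values copied from the yarn-site.xml above.
node_mb=5120    # yarn.nodemanager.resource.memory-mb
min_mb=512      # yarn.scheduler.minimum-allocation-mb
max_mb=10240    # yarn.scheduler.maximum-allocation-mb
inc_mb=512      # yarn.scheduler.increment-allocation-mb

# round_up is a hypothetical helper: smallest multiple of $2 that is >= $1.
round_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Upper bound on minimum-size containers per node, by memory alone.
by_mem=$(( node_mb / min_mb ))
echo "minimum-size containers per node: $by_mem"

# A made-up 700 MB request, rounded up to the increment.
granted=$(round_up 700 "$inc_mb")
echo "700 MB request is granted as $granted MB"

# Virtual memory ceiling for a 512 MB container at vmem-pmem-ratio 2.1.
vmem=$(awk 'BEGIN { printf "%.1f", 512 * 2.1 }')
echo "virtual memory limit for a 512 MB container: $vmem MB"

# A single container cannot be larger than what one node offers.
if [ "$max_mb" -gt "$node_mb" ]; then
    echo "note: maximum-allocation-mb ($max_mb) exceeds node memory ($node_mb)"
fi
```

With these values the last check fires: requests between 5120 MB and 10240 MB pass the scheduler limit but cannot fit on any single node.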
4)mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master.hadoop.com:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master.hadoop.com:19888</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.staging-dir</name>
  <value>/user/history</value>
</property>
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/user/history/intermediate-done-dir</value>
</property>
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/user/history/done-dir</value>
</property>
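(4) Copy the configuration files to all the other hosts in the cluster (scp does not expand a wildcard in the host address, so each target is named explicitly)
for h in slave1.hadoop.com slave2.hadoop.com slave3.hadoop.com; do scp /etc/hadoop/conf.my_cluster/*-site.xml $h:/etc/hadoop/conf.my_cluster/; done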
4. Format HDFS (on the master, where the NameNode runs)
sudo -u hdfs hdfs namenode -format
5. Start HDFS (run on every host; the loop starts whichever HDFS services are installed there)
for x in `cd /etc/init.d ; ls hadoop-hdfs-*`; do service $x start; done
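The start loop works by listing the init scripts whose names begin with hadoop-hdfs-. The same selection can be written with a shell glob instead of parsing ls output, as in this sketch. services_matching is a hypothetical helper, and the scratch directory stands in for /etc/init.d so that the commands are only printed.

```shell
#!/bin/sh
# services_matching is a hypothetical helper: print the names of the files
# in directory $1 whose names start with prefix $2.
services_matching() {
    for path in "$1"/"$2"*; do
        # When the glob matches nothing it stays literal; skip that case.
        [ -e "$path" ] || continue
        basename "$path"
    done
}

# Dry run against a scratch directory standing in for /etc/init.d.
demo_initd=$(mktemp -d)
touch "$demo_initd/hadoop-hdfs-namenode" "$demo_initd/hadoop-hdfs-datanode" "$demo_initd/sshd"

for svc in $(services_matching "$demo_initd" hadoop-hdfs-); do
    echo "service $svc start"
done
```

Pointing the helper at /etc/init.d and dropping the echo gives the same behaviour as the one-line loop in this step.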
6. Create the necessary directories on HDFS
sudo -u hdfs hadoop fs -mkdir -p /tmp && sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn && sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn
sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done_intermediate && sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging && sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir -p /var
sudo -u hdfs hadoop fs -mkdir -p /var/log && sudo -u hdfs hadoop fs -chmod -R 1775 /var/log && sudo -u hdfs hadoop fs -chown yarn:mapred /var/log
sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn/apps && sudo -u hdfs hadoop fs -chmod -R 1777 /var/log/hadoop-yarn/apps && sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn/apps
sudo -u hdfs hadoop fs -mkdir -p /user
sudo -u hdfs hadoop fs -mkdir -p /user/history && sudo -u hdfs hadoop fs -chown mapred /user/history
sudo -u hdfs hadoop fs -mkdir -p /user/test && sudo -u hdfs hadoop fs -chmod -R 777 /user/test && sudo -u hdfs hadoop fs -chown test /user/test
sudo -u hdfs hadoop fs -mkdir -p /user/root && sudo -u hdfs hadoop fs -chmod -R 777 /user/root && sudo -u hdfs hadoop fs -chown root /user/root
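Because every directory follows the same mkdir, chmod, chown pattern, the list can be kept as a small table and expanded into commands. The following sketch prints the commands rather than running them; hdfs_dir_cmds is a hypothetical helper, and the table holds only a few of the directories from this step as an illustration.

```shell
#!/bin/sh
# hdfs_dir_cmds is a hypothetical helper: read "path mode owner" rows from
# standard input and print the matching hadoop fs commands.
# A "-" in the mode or owner column means "leave unchanged".
hdfs_dir_cmds() {
    while read -r path mode owner; do
        case $path in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        echo "sudo -u hdfs hadoop fs -mkdir -p $path"
        if [ "$mode" != "-" ]; then
            echo "sudo -u hdfs hadoop fs -chmod -R $mode $path"
        fi
        if [ "$owner" != "-" ]; then
            echo "sudo -u hdfs hadoop fs -chown $owner $path"
        fi
    done
}

hdfs_dir_cmds <<'EOF'
# path                      mode  owner
/tmp                        1777  -
/var/log/hadoop-yarn/apps   1777  yarn:mapred
/user/history               -     mapred
/user/test                  777   test
EOF
```

Piping the output to sh on the master would run the commands; keeping it as a dry run first makes the ownership and mode of each path easy to review.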
7. Managing YARN
Run the following commands on every machine in the cluster. Only the services installed on a host respond: the ResourceManager, history server and proxy server on the master, and the NodeManager on the slaves. For the others the service command reports an error and the next command still runs.
(1) Start
service hadoop-yarn-resourcemanager start; service hadoop-mapreduce-historyserver start; service hadoop-yarn-proxyserver start; service hadoop-yarn-nodemanager start
(2) Check status
service hadoop-yarn-resourcemanager status; service hadoop-mapreduce-historyserver status; service hadoop-yarn-proxyserver status; service hadoop-yarn-nodemanager status
(3) Stop
service hadoop-yarn-resourcemanager stop; service hadoop-mapreduce-historyserver stop; service hadoop-yarn-proxyserver stop; service hadoop-yarn-nodemanager stop
(4) Restart
service hadoop-yarn-resourcemanager restart; service hadoop-mapreduce-historyserver restart; service hadoop-yarn-proxyserver restart; service hadoop-yarn-nodemanager restart
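The four command lines differ only in the action, and which services exist depends on the role of the host. A small wrapper can capture both, as in this sketch that prints the commands instead of running them. yarn_services and yarn_ctl are hypothetical helpers, and the role names master and slave follow the package split in the installation step.

```shell
#!/bin/sh
# yarn_services is a hypothetical helper: list the YARN-related services
# installed for a given role.
yarn_services() {
    case $1 in
        master) echo "hadoop-yarn-resourcemanager hadoop-mapreduce-historyserver hadoop-yarn-proxyserver" ;;
        slave)  echo "hadoop-yarn-nodemanager" ;;
        *)      echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# yarn_ctl is a hypothetical helper: print (dry run) the service commands
# for an action (start, status, stop, restart) on a role.
yarn_ctl() {
    action=$1
    role=$2
    services=$(yarn_services "$role") || return 1
    for svc in $services; do
        echo "service $svc $action"
    done
}

yarn_ctl start master
yarn_ctl status slave
```

Replacing the echo with a direct call to service turns the dry run into the real operation on that host.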
8. Install the Hadoop client
(1) Install CentOS 6.5
(2) Log in as root and run the following commands:
rpm -ivh jdk-7u67-linux-x64.rpm
yum install hadoop-client
cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
scp 192.168.50.10:/etc/hadoop/conf.my_cluster/*-site.xml /etc/hadoop/conf.my_cluster/
scp 192.168.50.10:/etc/hosts /etc/
scp 192.168.50.10:/etc/profile.d/hadoop-env.sh /etc/profile.d/
. /etc/profile
useradd -u 700 -g hadoop test
passwd test    # enter the password for the test user when prompted
9. Test Hadoop with YARN
su - test
# Run the wordcount job
hadoop fs -mkdir input
hadoop fs -put /etc/hadoop/conf/*.xml input
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount input output
# Compute Pi
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 100
# Run the grep job (remove the wordcount output first; a job fails if its output directory already exists)
hadoop fs -rm -r output
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output 'dfs[a-z.]+'
hadoop fs -ls output
hadoop fs -cat output/part-r-00000 | head
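To get a feel for what the grep job should report, the same pattern can be tried locally on a few lines before looking at the HDFS output. This is only a local approximation using the standard grep tool, on the understanding that the example job counts each matched string and lists the counts in descending order; the sample lines are made up for illustration.

```shell
#!/bin/sh
# Made-up sample lines in the style of the uploaded *-site.xml files.
sample='<name>dfs.webhdfs.enabled</name>
<name>fs.defaultFS</name>
<name>dfs.namenode.name.dir</name>
<name>dfs.webhdfs.enabled</name>'

# grep -o prints each match on its own line; sort | uniq -c counts the
# distinct matches, and sort -rn orders them by count, highest first.
printf '%s\n' "$sample" | grep -oE 'dfs[a-z.]+' | sort | uniq -c | sort -rn
```

The pattern matches the literal text dfs followed by lowercase letters and dots, so fs.defaultFS is not counted.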