shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ ls
bin etc include lib libexec LICENSE.txt logs NOTICE.txt README.txt sbin share
The configuration files for both HDFS and YARN are stored under etc/hadoop, and each component is configured by editing the corresponding file:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ ls etc/hadoop/
capacity-scheduler.xml hadoop-metrics.properties httpfs-site.xml ssl-client.xml.example
configuration.xsl hadoop-policy.xml log4j.properties ssl-server.xml.example
container-executor.cfg hdfs-site.xml mapred-env.sh yarn-env.sh
core-site.xml httpfs-env.sh mapred-queues.xml.template yarn-site.xml
hadoop-env.sh httpfs-log4j.properties mapred-site.xml.template
hadoop-metrics2.properties httpfs-signature.secret slaves
The bin directory contains the command-line tools for managing HDFS and YARN:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ ls bin
container-executor hadoop hdfs mapred rcc test-container-executor yarn
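Next, configure Hadoop. Hadoop 2.x no longer ships a ready-to-use mapred-site.xml (the listing above only contains mapred-site.xml.template). The cluster daemons, the ResourceManager and the NodeManagers, are configured in yarn-site.xml instead.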
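Hadoop 2.x still reads mapred-site.xml when it exists; only a template is shipped. Its mapreduce.framework.name property defaults to local, so unless it is set to yarn, MapReduce jobs (such as the example at the end of this article) are run by the local job runner instead of being submitted to YARN. A minimal bash sketch that writes such a file. The helper name write_mapred_site is hypothetical, and it refuses to overwrite an existing file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a minimal mapred-site.xml that submits MapReduce
# jobs to YARN (mapreduce.framework.name defaults to "local" in Hadoop 2.x).
# Usage (from the install dir): write_mapred_site etc/hadoop
write_mapred_site() {
  local conf_dir="$1"
  local target="$conf_dir/mapred-site.xml"
  # Do not clobber an existing file; edit it by hand instead.
  if [[ -e "$target" ]]; then
    echo "refusing to overwrite $target" >&2
    return 1
  fi
  cat > "$target" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}
```

The setting is read on the side that submits the job, but keeping etc/hadoop identical on every node avoids surprises.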
2. HDFS installation and configuration
Configure etc/hadoop/core-site.xml as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000/</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
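The fs.defaultFS description above says that the URI's scheme selects the FileSystem implementation and its authority supplies the host and port. The following bash sketch splits a value such as hdfs://master:9000/ into those parts, for example to confirm which host and port clients and DataNodes will contact. parse_fs_uri is a hypothetical helper, and it assumes the simple scheme://host[:port][/path] form with no user info and no IPv6 literals:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a simple filesystem URI into "scheme host port".
# Assumes scheme://host[:port][/path]; user info and IPv6 literals are not handled.
parse_fs_uri() {
  local uri="$1"
  local scheme="${uri%%://*}"     # text before "://" selects the FileSystem impl
  local rest="${uri#*://}"        # everything after "://"
  local authority="${rest%%/*}"   # drop any path, keep host[:port]
  local host="${authority%%:*}"
  local port=""
  if [[ "$authority" == *:* ]]; then
    port="${authority##*:}"
  fi
  printf '%s %s %s\n' "$scheme" "$host" "$port"
}
# Example: parse_fs_uri hdfs://master:9000/   prints: hdfs master 9000
```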
Configure etc/hadoop/hdfs-site.xml as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/shirdrn/storage/hadoop2/hdfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/shirdrn/storage/hadoop2/hdfs/data1,/home/shirdrn/storage/hadoop2/hdfs/data2,/home/shirdrn/storage/hadoop2/hdfs/data3</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/shirdrn/storage/hadoop2/hdfs/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
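dfs.namenode.name.dir and dfs.datanode.data.dir name local directories, and dfs.datanode.data.dir takes a comma-separated list. Creating these directories ahead of time (the name directory on master, the data directories on each slave) can surface permission problems before the daemons start. A small bash sketch; make_dirs_from_list is a hypothetical helper, and it assumes the paths themselves contain no commas:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create every directory in a comma-separated list,
# such as the value of dfs.datanode.data.dir. Paths must not contain commas.
make_dirs_from_list() {
  local -a dirs
  IFS=',' read -r -a dirs <<< "$1"   # split on commas only, no globbing
  local dir
  for dir in "${dirs[@]}"; do
    mkdir -p -- "$dir" || return 1
  done
}
# Example, on each slave:
# make_dirs_from_list /home/shirdrn/storage/hadoop2/hdfs/data1,/home/shirdrn/storage/hadoop2/hdfs/data2,/home/shirdrn/storage/hadoop2/hdfs/data3
```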
3. YARN installation and configuration
Configure etc/hadoop/yarn-site.xml as follows:
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
<description>host is the hostname of the resource manager and
port is the port on which the NodeManagers contact the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
<description>host is the hostname of the resourcemanager and port is the port
on which the Applications in the cluster talk to the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
<description>In case you do not want to use the default scheduler</description>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
<description>the host is the hostname of the ResourceManager and the port is the port on
which the clients can talk to the Resource Manager. </description>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>${hadoop.tmp.dir}/nodemanager/local</value>
<description>the local directories used by the nodemanager</description>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>0.0.0.0:8034</value>
<description>the nodemanagers bind to this port</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>10240</value>
<description>Amount of physical memory, in MB, that can be allocated to containers on this NodeManager</description>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>${hadoop.tmp.dir}/nodemanager/remote</value>
<description>directory on hdfs where the application logs are moved to </description>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>${hadoop.tmp.dir}/nodemanager/logs</value>
<description>the directories used by Nodemanagers as log directories</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
<description>shuffle service that needs to be set for Map Reduce to run </description>
</property>
</configuration>
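With this many properties and ports, it helps to print a compact name=value view of a file, for example to check the ResourceManager addresses (8030, 8031, 8032) before starting anything. The following bash sketch is not an XML parser. It assumes the layout used in this article, where each <name> and <value> element sits on its own line, and list_props is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print name=value pairs from a Hadoop *-site.xml file.
# Assumes <name> and <value> each sit on their own line (as in this article);
# it is a line-based text filter, not an XML parser.
list_props() {
  sed -n \
    -e 's:.*<name>\(.*\)</name>.*:\1:p' \
    -e 's:.*<value>\(.*\)</value>.*:\1:p' "$1" |
    paste -d= - -   # join each name line with the value line that follows it
}
# Example: list_props etc/hadoop/yarn-site.xml | grep address
```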
Starting the cluster
First, format HDFS:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ bin/hdfs namenode -format
If the format succeeds, no exceptions appear in the log and you can go on to start the cluster services.
Start the HDFS cluster:
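Rather than reading the format log by eye, you can save it and scan it. Hadoop logs to stderr, so redirect 2>&1 before piping to tee. In the following bash sketch, check_format_log is a hypothetical helper. Treating the phrase "successfully formatted" as the success marker is an assumption based on the usual NameNode message ("Storage directory ... has been successfully formatted."), so verify it against your own output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: scan saved NameNode format output for problems.
# Assumption: a successful format logs "... has been successfully formatted."
# Usage: bin/hdfs namenode -format 2>&1 | tee format.log; check_format_log format.log
check_format_log() {
  local log="$1"
  if grep -q 'Exception' "$log"; then
    echo "format: exception found in $log" >&2
    return 1
  fi
  if grep -q 'successfully formatted' "$log"; then
    echo "format: OK"
    return 0
  fi
  echo "format: no success message found in $log" >&2
  return 1
}
```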
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ sbin/start-dfs.sh
The following processes should now be running on the master node:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ jps
17238 Jps
16845 NameNode
17128 SecondaryNameNode
and the following on each slave node:
shirdrn@slave01:~/programs$ jps
4865 Jps
4753 DataNode
shirdrn@slave02:~/programs$ jps
4867 DataNode
4971 Jps
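The same jps check can be scripted so that it works on any node. The following bash sketch reads jps output on stdin and reports any expected daemon that is missing. It compares the class-name column exactly, so NameNode is not mistaken for SecondaryNameNode. expect_daemons is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that jps output (on stdin) lists every daemon
# given as an argument. Returns non-zero and names each missing daemon.
# Usage: jps | expect_daemons NameNode SecondaryNameNode
#        jps | expect_daemons DataNode          (on a slave)
expect_daemons() {
  local jps_out missing=0 d
  jps_out=$(cat)
  for d in "$@"; do
    # jps prints "<pid> <MainClass>"; match the class name in field 2 exactly
    if ! awk -v want="$d" '$2 == want { found = 1 } END { exit !found }' <<< "$jps_out"; then
      echo "missing: $d" >&2
      missing=1
    fi
  done
  return "$missing"
}
```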
Once the configuration is in place, starting the YARN cluster only takes a few scripts.
Start the ResourceManager:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ sbin/yarn-daemon.sh start resourcemanager
jps now shows an additional ResourceManager process:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ jps
16845 NameNode
17128 SecondaryNameNode
17490 Jps
17284 ResourceManager
Then start a NodeManager on each slave node:
shirdrn@slave01:~/programs/hadoop2/hadoop-2.0.4-alpha$ sbin/yarn-daemon.sh start nodemanager
shirdrn@slave02:~/programs/hadoop2/hadoop-2.0.4-alpha$ sbin/yarn-daemon.sh start nodemanager
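With more than a couple of slaves, logging in to each one gets tedious. Hadoop's own start-yarn.sh starts the NodeManagers on the hosts listed in etc/hadoop/slaves. For a manual variant, the following bash sketch prints one ssh command per host in that file, so you can review the commands before piping them to sh. Its assumptions are passwordless ssh from master and the same absolute install path on every slave. nodemanager_start_cmds is a hypothetical helper. ssh -n keeps each ssh from consuming the remaining commands on sh's stdin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ssh command per host in a slaves file that
# starts a NodeManager there. Assumes passwordless ssh and the same absolute
# install path ($2) on every slave.
# Usage: nodemanager_start_cmds etc/hadoop/slaves /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha | sh
nodemanager_start_cmds() {
  local slaves_file="$1" remote_home="$2" host
  while IFS= read -r host || [[ -n "$host" ]]; do
    host="${host%%#*}"              # drop comments
    host="${host//[[:space:]]/}"    # strip whitespace
    [[ -z "$host" ]] && continue    # skip blank lines
    # -n: read stdin from /dev/null so ssh does not swallow the next commands
    printf 'ssh -n %s %q/sbin/yarn-daemon.sh start nodemanager\n' "$host" "$remote_home"
  done < "$slaves_file"
}
```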
jps now shows an additional NodeManager process on each slave node:
shirdrn@slave01:~/programs/hadoop2/hadoop-2.0.4-alpha$ jps
5544 DataNode
5735 NodeManager
5904 Jps
shirdrn@slave02:~/programs/hadoop2/hadoop-2.0.4-alpha$ jps
5544 DataNode
5735 NodeManager
5904 Jps
Alternatively, check each daemon's log on the node where it runs to confirm that it started:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ tail -100f /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/yarn-shirdrn-resourcemanager-master.log
shirdrn@slave01:~/programs/hadoop2/hadoop-2.0.4-alpha$ tail -100f /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/yarn-shirdrn-nodemanager-slave01.log
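The YARN log file names follow a yarn-<user>-<daemon>-<hostname>.log pattern, as the .out names in the start-all.sh output below also show. A log therefore lives on, and is named after, the host running that daemon. The following bash sketch builds that path. yarn_log_path is a hypothetical helper, and it assumes the default naming, with the user defaulting to $USER and the host to the output of hostname:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the log path that yarn-daemon.sh uses,
#   <log dir>/yarn-<user>-<daemon>-<hostname>.log
# Assumes default naming; host defaults to `hostname`, user to $USER.
yarn_log_path() {
  local log_dir="$1" daemon="$2"
  local host="${3:-$(hostname)}" user="${4:-$USER}"
  printf '%s/yarn-%s-%s-%s.log\n' "$log_dir" "$user" "$daemon" "$host"
}
# On master:  tail -100f "$(yarn_log_path logs resourcemanager)"
# On a slave: tail -100f "$(yarn_log_path logs nodemanager)"
```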
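The whole Hadoop cluster (HDFS and YARN) can also be started with a single script, sbin/start-all.sh. As its output shows, however, the script is deprecated in favor of start-dfs.sh and start-yarn.sh: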
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/hadoop-shirdrn-namenode-master.out
slave02: starting datanode, logging to /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/hadoop-shirdrn-datanode-slave02.out
slave01: starting datanode, logging to /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/hadoop-shirdrn-datanode-slave01.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/hadoop-shirdrn-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/yarn-shirdrn-resourcemanager-master.out
slave01: starting nodemanager, logging to /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/yarn-shirdrn-nodemanager-slave01.out
slave02: starting nodemanager, logging to /home/shirdrn/programs/hadoop2/hadoop-2.0.4-alpha/logs/yarn-shirdrn-nodemanager-slave02.out
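Following the deprecation notice, the following bash sketch runs the two recommended scripts in the same order that start-all.sh uses, HDFS first and then YARN, and stops if the first one fails. start_cluster is a hypothetical helper, and it assumes it is run on master with the install directory as its argument:

```shell
#!/usr/bin/env bash
# Hypothetical helper: start HDFS, then YARN, using the scripts that the
# start-all.sh deprecation message recommends. $1 is the Hadoop install dir.
start_cluster() {
  local home="$1"
  # The Hadoop 2 sbin scripts are bash scripts; run them through bash.
  bash "$home/sbin/start-dfs.sh" || return 1    # NameNode, SecondaryNameNode, DataNodes
  bash "$home/sbin/start-yarn.sh" || return 1   # ResourceManager, NodeManagers
}
# Usage on master: start_cluster ~/cloud/hadoop2/hadoop-2.0.4-alpha
# Shut down in reverse order: sbin/stop-yarn.sh, then sbin/stop-dfs.sh
```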
Verifying the cluster
Finally, verify that the cluster can run jobs by running one of the examples bundled with Hadoop:
shirdrn@master:~/cloud/hadoop2/hadoop-2.0.4-alpha$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.4-alpha.jar randomwriter out
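The examples jar name embeds the Hadoop version, so a typo only surfaces when bin/hadoop fails. The following bash sketch locates the jar using the path layout shown in the command above and fails early if it is missing. examples_jar is a hypothetical helper. Note that MapReduce refuses to write into an output directory that already exists, so rerunning the job needs a new output name or removal of the old one:

```shell
#!/usr/bin/env bash
# Hypothetical helper: locate the MapReduce examples jar for a given version,
# using the layout share/hadoop/mapreduce/hadoop-mapreduce-examples-<version>.jar
examples_jar() {
  local home="$1" version="$2"
  local jar="$home/share/hadoop/mapreduce/hadoop-mapreduce-examples-$version.jar"
  if [[ ! -f "$jar" ]]; then
    echo "examples jar not found: $jar" >&2
    return 1
  fi
  printf '%s\n' "$jar"
}
# Usage, from the install dir:
#   jar=$(examples_jar . 2.0.4-alpha) && bin/hadoop jar "$jar" randomwriter out
#   bin/hdfs dfs -ls out    # inspect the job's output directory afterwards
```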