Installing Hadoop CDH5 (I installed it successfully by following this configuration)

Hadoop 2.2.0 Cluster Installation and Configuration in Practice

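Hadoop 2.x is very different from 1.x; it is fair to say that it is more general-purpose in both storage and computation. Hadoop 2.x implements the YARN framework for managing cluster resources, which can serve any workload that needs to compute over data stored in HDFS. MapReduce is now a peripheral, pluggable computing framework, and you can develop or choose whichever computing framework suits your needs. For now, MapReduce still appears to be well supported, since it is already a fairly mature framework. Some other standards based on YARN are also under development.
At its core, YARN manages, allocates, and schedules resources. It allocates resources at a finer granularity than Hadoop 1.x and is more flexible, so its prospects look good. That flexibility comes at a cost: with more to configure, it can also be harder to use. Personally, I feel that YARN is still evolving and immature in many places: problems crop up frequently, material is relatively scarce, and the official documentation is sometimes not updated promptly. If I were choosing a platform for massive data processing, YARN might not yet meet the needs of a production environment. If the computation is done entirely with MapReduce, I would still choose the more mature Hadoop 1.x for production.
Below, four machines running 64-bit CentOS 6.4 are used, one as the master node and the other three as slave nodes, to walk through installing and configuring the cluster.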

Host Configuration Plan

Edit /etc/hosts and add the following address mappings:

10.95.3.48     m1
10.95.3.54     s1
10.95.3.59     s2
10.95.3.66     s3
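
Because the same four mappings have to be added on every node, it may help to script the step so that running it twice does not create duplicate entries. The following is only a sketch: `add_host_entry` is a hypothetical helper name, not a system command, and it assumes a conventional hosts file where each line is an IP address followed by one or more names.

```shell
#!/bin/sh
# Sketch: add "ip<TAB>name" to a hosts file unless the name is already mapped.
# add_host_entry is a hypothetical helper, not a system command.
add_host_entry() {
    file=$1; ip=$2; name=$3
    touch "$file"
    # Look for the name as a whole field (column 2 onward) on non-comment lines.
    if awk -v n="$name" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"
    then
        return 0    # already mapped, nothing to do
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$file"
}

# Example (run as root on every node):
#   add_host_entry /etc/hosts 10.95.3.48 m1
#   add_host_entry /etc/hosts 10.95.3.54 s1
#   add_host_entry /etc/hosts 10.95.3.59 s2
#   add_host_entry /etc/hosts 10.95.3.66 s3
```

The helper only checks whether the name is present; it does not detect a name that is already mapped to a different address, so an existing conflicting entry still has to be corrected by hand.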

Set the hostname on each machine by editing /etc/sysconfig/network. For example, on node s1 the file contains:

NETWORKING=yes
HOSTNAME=s1

m1 is the master node of the cluster; s1, s2, and s3 are the slave nodes.
As for host resources, we used VMware to create four virtual machines, configured as follows:

  • The master node has 1 core
  • The master node has 1 GB of memory
  • Each slave node has 1 core
  • Each slave node has 2 GB of memory

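Directory Layout

The Hadoop program directory is /home/shirdrn/cloud/programs/hadoop-2.2.0, and the related data directories, including logs and storage, live under /home/shirdrn/cloud/storage/hadoop-2.2.0. Keeping programs and data in separate directories makes it easier to synchronize configuration across nodes.
The directories are prepared as follows: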

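  • On every node, create the program directory /home/shirdrn/cloud/programs/hadoop-2.2.0 to hold the Hadoop program files
  • On every node, create the data directory /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs to hold cluster data
  • On the master node m1, create /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/name to hold the file system metadata
  • On every slave node, create /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/data to hold the actual data
  • The log directory on all nodes is /home/shirdrn/cloud/storage/hadoop-2.2.0/logs
  • The temporary directory on all nodes is /home/shirdrn/cloud/storage/hadoop-2.2.0/tmp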

All directories referenced in the configuration below follow this layout.
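
The layout above can be created with a few `mkdir -p` calls. As a sketch, the function below builds the tree under a given root and adds the name or data directory depending on the node's role. `prepare_hadoop_dirs` and its two arguments are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: create the planned directory tree under a given root.
# prepare_hadoop_dirs and its arguments are hypothetical names.
prepare_hadoop_dirs() {
    root=$1    # e.g. /home/shirdrn/cloud
    role=$2    # "master" or "slave"
    programs=$root/programs/hadoop-2.2.0
    storage=$root/storage/hadoop-2.2.0
    # Directories that exist on every node.
    mkdir -p "$programs" "$storage/hdfs" "$storage/logs" "$storage/tmp" || return 1
    case $role in
        master) mkdir -p "$storage/hdfs/name" ;;   # file system metadata
        slave)  mkdir -p "$storage/hdfs/data" ;;   # block data
        *)      echo "unknown role: $role" >&2; return 1 ;;
    esac
}

# Example: on m1            prepare_hadoop_dirs /home/shirdrn/cloud master
#          on s1, s2, s3    prepare_hadoop_dirs /home/shirdrn/cloud slave
```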

Environment Variables

First, using Sun's JDK, edit ~/.bashrc and add the following:

export JAVA_HOME=/usr/java/jdk1.6.0_45/
export PATH=$PATH:$JAVA_HOME/bin
# Java classpath wildcards take the form dir/*, not dir/*.jar
export CLASSPATH=$JAVA_HOME/lib/*:$JAVA_HOME/jre/lib/*

Then set the Hadoop installation directory and the related environment variables:

export HADOOP_HOME=/home/shirdrn/cloud/programs/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_LOG_DIR=/home/shirdrn/cloud/storage/hadoop-2.2.0/logs
export YARN_LOG_DIR=$HADOOP_LOG_DIR
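
After reloading ~/.bashrc, a quick sanity check can confirm that the variables are actually set in the current shell. This is a minimal sketch; `check_env` is a hypothetical helper that reports every variable name passed to it that is unset or empty.

```shell
#!/bin/sh
# Sketch: report which of the named environment variables are unset or empty.
# check_env is a hypothetical helper name.
check_env() {
    missing=0
    for var in "$@"; do
        # Indirect lookup of the variable whose name is stored in $var.
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "unset: $var"
            missing=1
        fi
    done
    return $missing
}

# Example:
#   . ~/.bashrc
#   check_env JAVA_HOME HADOOP_HOME HADOOP_LOG_DIR YARN_LOG_DIR && echo "environment looks complete"
```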

Passwordless SSH Login

On every node, run the following command:

ssh-keygen

Then keep pressing Enter to accept the defaults.
On the master node m1, run:

ssh m1

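Make sure you can log in to m1 itself without a password (this requires m1's own public key to be present in m1's ~/.ssh/authorized_keys).
Append m1's public key to ~/.ssh/authorized_keys on s1, s2, and s3, and check the permissions of ~/.ssh/authorized_keys: the file must not be writable by the group. If it is, run: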

chmod g-w ~/.ssh/authorized_keys

At this point, the following commands, run on m1, should not prompt for a password:

ssh s1
ssh s2
ssh s3
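
The step of adding m1's public key to each node's authorized_keys can be sketched as a small function that appends the key only if it is not already there and then tightens the permissions. `install_pubkey` is a hypothetical helper name, and the use of modes 700 and 600 is an assumption on my part: it is stricter than the "no group write" requirement described above. On systems that ship it, `ssh-copy-id` performs a similar job from the m1 side.

```shell
#!/bin/sh
# Sketch: append a public key to an authorized_keys file if it is not present.
# install_pubkey is a hypothetical helper name.
install_pubkey() {
    pubkey_file=$1    # e.g. a copy of m1's ~/.ssh/id_rsa.pub
    auth_file=$2      # e.g. ~/.ssh/authorized_keys on the target node
    ssh_dir=$(dirname "$auth_file")
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$auth_file"
    # -F fixed string, -x whole-line match, -f read the pattern from the key file.
    grep -qxFf "$pubkey_file" "$auth_file" || cat "$pubkey_file" >> "$auth_file"
    # Assumption: 600 is used here; the text only requires no group write bit.
    chmod 600 "$auth_file"
}

# Example, after copying m1's id_rsa.pub to /tmp/m1.pub on s1:
#   install_pubkey /tmp/m1.pub ~/.ssh/authorized_keys
```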

Hadoop Configuration Files

The configuration files live in /home/shirdrn/cloud/programs/hadoop-2.2.0/etc/hadoop; edit the corresponding files there.

  • Contents of core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://m1:9000</value>
                <description>The name of the default file system. A URI whose scheme
                        and authority determine the FileSystem implementation. The uri's
                        scheme determines the config property (fs.SCHEME.impl) naming the
                        FileSystem implementation class. The uri's authority is used to
                        determine the host, port, etc. for a filesystem.</description>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/shirdrn/cloud/storage/hadoop-2.2.0/tmp/hadoop-${user.name}</value>
                <description>A base for other temporary directories.</description>
        </property>
</configuration>
  • Contents of hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/name</value>
                <description>Path on the local filesystem where the NameNode stores
                        the namespace and transactions logs persistently.</description>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/data</value>
                <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
        </property>
        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        </property>
</configuration>
  • Contents of yarn-site.xml
<?xml version="1.0"?>

<configuration>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>m1:8031</value>
                <description>host is the hostname of the resource manager and
                        port is the port on which the NodeManagers contact the Resource Manager.
                </description>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>m1:8030</value>
                <description>host is the hostname of the resourcemanager and port is
                        the port
                        on which the Applications in the cluster talk to the Resource Manager.
                </description>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.class</name>
                <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
                <description>In case you do not want to use the default scheduler</description>
        </property>
        <property>
                <name>yarn.resourcemanager.address</name>
                <value>m1:8032</value>
                <description>the host is the hostname of the ResourceManager and the
                        port is the port on
                        which the clients can talk to the Resource Manager.
                </description>
        </property>
        <property>
                <name>yarn.nodemanager.local-dirs</name>
                <value>${hadoop.tmp.dir}/nodemanager/local</value>
                <description>the local directories used by the nodemanager</description>
        </property>
        <property>
                <name>yarn.nodemanager.address</name>
                <value>0.0.0.0:8034</value>
                <description>the nodemanagers bind to this port</description>
        </property>
        <property>
                <name>yarn.nodemanager.resource.cpu-vcores</name>
                <value>1</value>
                <description></description>
        </property>
        <property>
                <name>yarn.nodemanager.resource.memory-mb</name>
                <value>2048</value>
                <description>Defines total available resources on the NodeManager to be made available to running containers</description>
        </property>
        <property>
                <name>yarn.nodemanager.remote-app-log-dir</name>
                <value>${hadoop.tmp.dir}/nodemanager/remote</value>
                <description>directory on hdfs where the application logs are moved to</description>
        </property>
        <property>
                <name>yarn.nodemanager.log-dirs</name>
                <value>${hadoop.tmp.dir}/nodemanager/logs</value>
                <description>the directories used by Nodemanagers as log directories</description>
        </property>
        <property>
                <name>yarn.application.classpath</name>
                <value>$HADOOP_HOME,$HADOOP_HOME/share/hadoop/common/*,
               $HADOOP_HOME/share/hadoop/common/lib/*,
               $HADOOP_HOME/share/hadoop/hdfs/*,$HADOOP_HOME/share/hadoop/hdfs/lib/*,
               $HADOOP_HOME/share/hadoop/yarn/*,$HADOOP_HOME/share/hadoop/yarn/lib/*,
               $HADOOP_HOME/share/hadoop/mapreduce/*,$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>
                <description>Classpath for typical applications.</description>
        </property>

        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
                <description>shuffle service that needs to be set for Map Reduce to run</description>
        </property>
        <property>
                <!-- the key must match the aux-service name mapreduce_shuffle set above -->
                <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
                <name>yarn.scheduler.minimum-allocation-mb</name>
                <value>256</value>
        </property>
        <property>
                <name>yarn.scheduler.maximum-allocation-mb</name>
                <value>6144</value>
        </property>
        <property>
                <name>yarn.scheduler.minimum-allocation-vcores</name>
                <value>1</value>
        </property>
        <!-- the source listing ends here; any further properties are not shown -->
</configuration>
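
Since the same files are copied to every node, it can be useful to read a single property back out of a *-site.xml file and compare it across machines. The function below is a rough sketch, not a real XML parser: `get_property` is a hypothetical helper, and it assumes each `<name>` and its `<value>` sit on their own single lines, as they do in the listings above (a multi-line value such as yarn.application.classpath would only yield its first line). Once Hadoop is on the PATH, `hdfs getconf -confKey fs.defaultFS` is the more reliable way to see the value Hadoop actually resolves.

```shell
#!/bin/sh
# Sketch: print the value of one property from a Hadoop *-site.xml file.
# get_property is a hypothetical helper; it assumes <name> and <value>
# each fit on a single line and is not a general XML parser.
get_property() {
    file=$1; key=$2
    awk -v key="$key" '
        /<name>/ {
            # Remember the most recent property name.
            name = $0
            sub(/.*<name>/, "", name)
            sub(/<\/name>.*/, "", name)
        }
        /<value>/ {
            if (name == key) {
                val = $0
                sub(/.*<value>/, "", val)
                sub(/<\/value>.*/, "", val)
                print val
                exit
            }
        }' "$file"
}

# Example:
#   get_property "$HADOOP_HOME/etc/hadoop/core-site.xml" fs.defaultFS
```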
