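Hadoop, an open-source distributed computing framework, is becoming increasingly popular in China: large companies such as Taobao and Baidu use it at scale. I had picked up a little Hadoop earlier, so today I'll give a rough walkthrough of installing Hadoop in distributed (cluster) mode.
For more detailed installation help and configuration parameters, see the official English documentation: http://hadoop.apache.org/common/docs/r0.21.0/cluster_setup.html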
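- Preparation before installation
  - Java JDK 1.6.x
  - ssh and sshd, with passwordless SSH access set up between the servers
  - Download Hadoop: http://labs.renren.com/apache-mirror//hadoop/core/hadoop-0.21.0/hadoop-0.21.0.tar.gz
    - The package must be downloaded and extracted on every server in the Hadoop cluster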
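- We plan to install the whole cluster on 4 servers named myna[5-8]. myna5 is the NameNode and myna6 is the JobTracker, so myna[5-6] are the masters. The remaining two servers, myna[7-8], are the slaves, which run the DataNodes and TaskTrackers.
- Configure the environment variables on all four servers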
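- Configure myna5 (NameNode) and myna6 (JobTracker):
  - ~/hadoop-0.21.0/conf/core-site.xml:
  - ~/hadoop-0.21.0/conf/hdfs-site.xml:
  - ~/hadoop-0.21.0/conf/mapred-site.xml:
  - Configure ~/hadoop-0.21.0/conf/masters on myna5 only. This file lists the host(s) that run the SecondaryNameNode and is read by start-dfs.sh, which runs on myna5.
  - Configure ~/hadoop-0.21.0/conf/slaves on both myna5 and myna6. start-dfs.sh (on myna5) starts a DataNode, and start-mapred.sh (on myna6) starts a TaskTracker, on every host listed there.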
- Configure myna7 and myna8 (slaves):
  - ~/hadoop-0.21.0/conf/core-site.xml:
  - ~/hadoop-0.21.0/conf/hdfs-site.xml:
  - ~/hadoop-0.21.0/conf/mapred-site.xml:
- Edit ~/hadoop-0.21.0/conf/hadoop-env.sh on myna[5-8] and add the JAVA_HOME environment variable
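- The Hadoop cluster is now configured. Start Hadoop:
  - Start HDFS on myna5 (on a fresh install, format the NameNode first)
  - Start MapReduce on myna6
- Once the daemons are running, you can monitor them through their web UIs:
  - HDFS (NameNode): http://myna5:50070
  - MapReduce (JobTracker): http://myna6:50030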
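The preparation steps above (passwordless SSH, then downloading and unpacking the package on every server) can be driven from one machine. The following is a minimal sketch and is not part of Hadoop. The function names are hypothetical. It assumes the same user account on myna5 through myna8, wget installed on every host, and that the mirror URL above still works. With DRY_RUN=1 it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the same user on myna5..myna8, wget on every host,
# and that the mirror URL from the article is still reachable.
# Set DRY_RUN=1 to print each command instead of running it.
HOSTS="myna5 myna6 myna7 myna8"
TARBALL_URL="http://labs.renren.com/apache-mirror//hadoop/core/hadoop-0.21.0/hadoop-0.21.0.tar.gz"

# Run one command, or just print it in dry-run mode.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Install this machine's public key on every host (asks each password once).
copy_ssh_key_everywhere() {
  local host
  for host in $HOSTS; do
    run ssh-copy-id "$host"
  done
}

# Download and unpack Hadoop into the home directory of every host,
# which yields the ~/hadoop-0.21.0 directory used throughout the article.
install_hadoop_everywhere() {
  local host
  for host in $HOSTS; do
    run ssh "$host" "cd ~ && wget -q '$TARBALL_URL' && tar xzf hadoop-0.21.0.tar.gz"
  done
}
```

If ~/.ssh/id_rsa.pub does not exist yet, create it first with `ssh-keygen -t rsa`. Do the key step on both myna5 and myna6, because start-dfs.sh (run on myna5) and start-mapred.sh (run on myna6) both log in to other hosts over ssh.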
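# Environment variables on myna[5-8]. JAVA_HOME is the JDK installation directory (the one containing bin/java), not the java binary itself.
export JAVA_HOME=/path/to/jdk
export HADOOP_HOME=~/hadoop-0.21.0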
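JAVA_HOME should point to the JDK installation directory, the one that contains bin/java, rather than to the java binary itself. The following is a quick sanity check for both variables, written as a hypothetical helper function.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that JAVA_HOME and HADOOP_HOME name directories
# that contain the expected launchers. Prints "ok" on success; otherwise
# reports the first problem on stderr and returns 1.
check_env() {
  if [ ! -x "${JAVA_HOME:-}/bin/java" ]; then
    echo "JAVA_HOME=${JAVA_HOME:-<unset>} does not contain bin/java" >&2
    return 1
  fi
  if [ ! -x "${HADOOP_HOME:-}/bin/hadoop" ]; then
    echo "HADOOP_HOME=${HADOOP_HOME:-<unset>} does not contain bin/hadoop" >&2
    return 1
  fi
  echo "ok"
}
```

Running `check_env` on each server after setting the variables catches the common mistake of pointing JAVA_HOME at .../bin/java.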
<!-- myna5 and myna6: ~/hadoop-0.21.0/conf/core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://myna5:54320</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
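The XML snippets list only `<property>` elements. In the real *-site.xml files they sit inside a single `<configuration>` root element. To avoid hand-editing three files on four machines, a small generator like the hypothetical write_site_xml below can produce complete files. It is a sketch: it does not escape XML special characters such as & or <, which none of the values in this article contain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a complete Hadoop *-site.xml from name=value
# pairs. The name is everything before the first "=", the value everything
# after it. <description> elements are omitted; Hadoop only needs name/value.
# XML special characters in values are NOT escaped.
write_site_xml() {
  local out="$1" pair
  shift
  {
    printf '<?xml version="1.0"?>\n'
    printf '<configuration>\n'
    for pair in "$@"; do
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
        "${pair%%=*}" "${pair#*=}"
    done
    printf '</configuration>\n'
  } > "$out"
}
```

For example: `write_site_xml ~/hadoop-0.21.0/conf/core-site.xml fs.default.name=hdfs://myna5:54320`. Pass values containing `${user.name}` in single quotes, e.g. `'hadoop.tmp.dir=/home/hrj/hadooptmp/hadoop-${user.name}'`, so that the shell leaves the placeholder for Hadoop to expand.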
<!-- myna5 and myna6: ~/hadoop-0.21.0/conf/hdfs-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hrj/hadooptmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>dfs.upgrade.permission</name>
  <value>777</value>
</property>
<property>
  <name>dfs.umask</name>
  <value>022</value>
</property>
<!-- myna5 and myna6: ~/hadoop-0.21.0/conf/mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>myna6:54321</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
  <description>Should the outputs of the maps be compressed before being
  sent across the network. Uses SequenceFile compression.
  </description>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
myna6
myna7
myna8
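The masters and slaves files are plain host lists, one host per line. The start scripts reach every listed host over ssh, so it is worth confirming that each one is reachable without a password before starting the cluster. The sketch below uses hypothetical helper names. It drops blank lines and #-comments, which approximates how Hadoop's slaves.sh reads these files. It relies on ssh's BatchMode=yes option, so a missing key makes ssh fail instead of prompting for a password.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. read_hosts prints one host per line, without
# comments or blank lines. check_ssh_hosts tries a no-op command on each host
# with BatchMode=yes (fail rather than ask for a password) and returns 1 if
# any host fails. DRY_RUN=1 prints the ssh commands instead of running them.
read_hosts() {
  sed 's/#.*$//; /^[[:space:]]*$/d' "$1"
}

check_ssh_hosts() {
  local file="$1" host failed=0
  for host in $(read_hosts "$file"); do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh -n -o BatchMode=yes $host true"
    elif ! ssh -n -o BatchMode=yes "$host" true; then
      echo "cannot ssh to $host without a password" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Run it on myna5 against conf/masters and conf/slaves, and on myna6 against conf/slaves, e.g. `check_ssh_hosts ~/hadoop-0.21.0/conf/slaves`.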
<!-- myna7 and myna8: ~/hadoop-0.21.0/conf/core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://myna5:54320</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
<!-- myna7 and myna8: ~/hadoop-0.21.0/conf/hdfs-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/disk/hadooptmp/hadoop-${user.name}</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoopdata,/disk/hadoopdata</value>
</property>
<property>
  <name>dfs.upgrade.permission</name>
  <value>777</value>
</property>
<property>
  <name>dfs.umask</name>
  <value>022</value>
</property>
<!-- myna7 and myna8: ~/hadoop-0.21.0/conf/mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>myna6:54321</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
  <description>Should the outputs of the maps be compressed before being
  sent across the network. Uses SequenceFile compression.
  </description>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
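On the slaves, mapred-site.xml allows up to 4 map tasks and 2 reduce tasks at the same time, and each task JVM gets -Xmx1024m. The task heaps alone can therefore reach (4 + 2) × 1024 MB = 6144 MB per slave. The DataNode and TaskTracker daemons need memory on top of that: by default 1000 MB of heap each, unless HADOOP_HEAPSIZE in hadoop-env.sh is changed. A tiny helper, hypothetical and meant only for sizing, makes the arithmetic explicit.

```shell
#!/usr/bin/env bash
# Hypothetical sizing helper: upper bound on the heap that task JVMs on one
# TaskTracker can use at once, i.e. (map slots + reduce slots) * per-task -Xmx.
# JVM overhead beyond the heap and the daemons' own heaps are NOT included.
max_task_heap_mb() {
  local map_slots="$1" reduce_slots="$2" xmx_mb="$3"
  echo $(( (map_slots + reduce_slots) * xmx_mb ))
}

# Values from mapred-site.xml on myna7/myna8: 4 map slots, 2 reduce slots, -Xmx1024m
max_task_heap_mb 4 2 1024   # prints 6144
```

If the slaves have noticeably less RAM than this figure plus the daemon heaps, lowering the slot counts or the -Xmx value is the usual adjustment.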
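# ~/hadoop-0.21.0/conf/hadoop-env.sh on myna[5-8]: JAVA_HOME is the JDK installation directory, not the bin/java binary
export JAVA_HOME=/path/to/jdk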
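hadoop-env.sh ships with a commented-out JAVA_HOME line, and editing it by hand on four machines is easy to get wrong. The sketch below (hypothetical set_java_home) removes any existing uncommented `export JAVA_HOME=` line and appends a new one. Running it repeatedly therefore leaves exactly one active setting, and commented lines are left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make an env file export the given JAVA_HOME exactly once.
# Existing uncommented "export JAVA_HOME=" lines are dropped, then one new line
# is appended. Commented-out lines are kept as they are.
set_java_home() {
  local env_file="$1" jdk_dir="$2" tmp
  tmp="$(mktemp)" || return 1
  # Keep every line except active JAVA_HOME exports.
  grep -v '^[[:space:]]*export JAVA_HOME=' "$env_file" > "$tmp"
  printf 'export JAVA_HOME=%s\n' "$jdk_dir" >> "$tmp"
  # Overwrite in place so the original file's permissions are preserved.
  cat "$tmp" > "$env_file"
  rm -f "$tmp"
}
```

For example, on each of myna[5-8]: `set_java_home ~/hadoop-0.21.0/conf/hadoop-env.sh /path/to/jdk`, with the real JDK directory in place of /path/to/jdk.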
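# On myna5: format HDFS (fresh install only; formatting erases existing HDFS metadata), then start HDFS
hrj$ ~/hadoop-0.21.0/bin/hadoop namenode -format
hrj$ ~/hadoop-0.21.0/bin/start-dfs.sh
# On myna6: start MapReduce
hrj$ ~/hadoop-0.21.0/bin/start-mapred.sh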
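Besides the web UIs, the JDK's jps tool lists the Java processes on a host, which is a quick way to see whether every daemon actually started. With the layout in this article one would expect:

- myna5: NameNode
- myna6: JobTracker, plus SecondaryNameNode, assuming conf/masters lists myna6 as above
- myna7 and myna8: DataNode and TaskTracker

The hypothetical helper below reads jps output on stdin and prints whichever expected daemons are missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output ("<pid> <MainClass>" per line) from
# stdin and print each expected daemon name that does not appear.
# Returns 1 if anything is missing, 0 otherwise.
missing_daemons() {
  local jps_out daemon missing=0
  jps_out="$(cat)"
  for daemon in "$@"; do
    # Compare against the class-name column only, matching whole names.
    if ! printf '%s\n' "$jps_out" | awk '{print $2}' | grep -qx "$daemon"; then
      echo "$daemon"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `ssh myna7 /path/to/jdk/bin/jps | missing_daemons DataNode TaskTracker`, with the real JDK directory in place of /path/to/jdk.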