Hadoop Distributed Cluster Setup (Part 2): Fully Distributed Mode

1. Two hosts

host1(NameNode, DataNode, SecondaryNameNode)

host2(DataNode)

vi /etc/hostname (set the hostname on each machine)

vi /etc/hosts (map each hostname to its IP address, on every machine)
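
For illustration, the /etc/hosts entries might look like this (the IP addresses below are placeholders; use the real addresses of your machines):

192.168.1.101 host1
192.168.1.102 host2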


2. Change the permissions on the temporary directory

sudo chmod 1777 /tmp (1777 keeps /tmp world-writable while the sticky bit stops users from deleting each other's files)


3. Configure SSH

In the previous part we added our public key to our own authorized_keys, so each host can log in to itself without a password. Now we need to add host1's public key to the authorized_keys of the other machines so that host1 can log in to them without a password. Besides host1 we only have host2 here, so we just add host1's public key to host2's authorized_keys:

ssh-copy-id -i .ssh/id_rsa.pub hadoop@host2

(You can also specify a port: ssh-copy-id -i .ssh/id_rsa.pub "-p 22 hadoop@host2")

Now you can run

ssh host2

to log in from host1 to host2 without a password.
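
If ssh-copy-id is not available, the same thing can be done by hand (a sketch, assuming the key pair from the previous part is at ~/.ssh/id_rsa.pub):

cat ~/.ssh/id_rsa.pub | ssh hadoop@host2 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'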


4. Download Hadoop

I downloaded hadoop-2.4.0-x64.tar.gz, a 64-bit build compiled by a third party, because my system is 64-bit and the official 32-bit Hadoop build fails to load its native libraries there. After downloading, just extract the archive and it is ready to use; the extracted directory is named hadoop-2.4.0.
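
For example, assuming the archive was downloaded to /home/hadoop, extracting it in place is enough:

cd /home/hadoop
tar -zxvf hadoop-2.4.0-x64.tar.gz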


5. Configure Hadoop runtime parameters

Configure slaves (lists the hosts that run the DataNode and TaskTracker):

host1

host2

(There is no need to configure a masters file; Hadoop identifies the NameNode from the fs.default.name setting in core-site.xml, called fs.defaultFS in newer naming.)
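
All of the configuration files in this step live under etc/hadoop/ inside the Hadoop directory; for example, assuming Hadoop was extracted to /home/hadoop/hadoop-2.4.0 as in step 4, the slaves file is edited with:

vi /home/hadoop/hadoop-2.4.0/etc/hadoop/slaves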


Configure core-site.xml:

<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://host1:9000/</value>
                <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
        </property>
</configuration>
Configure hdfs-site.xml:

<configuration>
        <property>
                <name>dfs.name.dir</name>
                <value>/home/hadoop/dfs/filesystem/name</value>
                <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
        </property>
        <property>
                <name>dfs.data.dir</name>
                <value>/home/hadoop/dfs/filesystem/data</value>
                <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
</configuration>
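
The dfs.name.dir and dfs.data.dir paths above can be created ahead of time on each node so that the hadoop user owns them (Hadoop will also populate them when the NameNode is formatted):

mkdir -p /home/hadoop/dfs/filesystem/name /home/hadoop/dfs/filesystem/data
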
Configure mapred-site.xml:

<configuration>
        <property>
                <name>mapred.job.tracker</name>
                <value>host1:9001</value>
                <description>The host and port that the MapReduce job tracker runs
                at.  If "local", then jobs are run in-process as a single map
                and reduce task.
                </description>
        </property>
</configuration>
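
In the hadoop-2.4.0 distribution this file usually does not exist out of the box; assuming the layout from step 4, it can be created from the bundled template before editing:

cp /home/hadoop/hadoop-2.4.0/etc/hadoop/mapred-site.xml.template /home/hadoop/hadoop-2.4.0/etc/hadoop/mapred-site.xml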

6. Copy the Hadoop files to the other nodes

scp -r /home/hadoop/hadoop-2.4.0 hadoop@host2:/home/hadoop/hadoop-2.4.0

At this point we can format the file system and start the cluster.
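
A minimal sketch of those last two steps, run on host1 from the Hadoop directory (assuming the paths used above, and that JAVA_HOME is set in etc/hadoop/hadoop-env.sh):

cd /home/hadoop/hadoop-2.4.0
bin/hdfs namenode -format
sbin/start-dfs.sh

After start-dfs.sh finishes, jps on host1 should list NameNode, DataNode and SecondaryNameNode, and jps on host2 should list DataNode.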

(Reference: Nutch-related framework video tutorial)
