Hadoop (0.21.0) Distributed Deployment Notes

 

1.      Cluster environment

Operating system: CentOS release 5.5 x86_64

IP assignment:

/etc/hosts

 

 

192.168.1.100  master
192.168.1.101  slave1
192.168.1.102  slave2

……
 

2.      Configure SSH

 

 ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa
 cp  /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
 scp /root/.ssh/id_rsa.pub root@slave1:/root/.ssh/authorized_keys
 scp /root/.ssh/id_rsa.pub root@slave2:/root/.ssh/authorized_keys
 ……
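
Note that the scp commands above replace any authorized_keys file that already exists on a slave. As a minimal sketch of a non-destructive alternative, the helper below appends a public key only when that exact line is not yet present. The name add_key is hypothetical and not part of OpenSSH; the paths are passed in as arguments.

```shell
#!/bin/sh
# add_key PUBKEY_FILE AUTH_FILE  (hypothetical helper, not part of OpenSSH)
# Appends the public key to AUTH_FILE unless the exact line is already there.
add_key() {
    pubkey_file=$1
    auth_file=$2
    # Make sure the target directory and file exist with safe permissions.
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    key=$(cat "$pubkey_file")
    # -x matches whole lines, -F treats the key as a fixed string.
    if ! grep -qxF -- "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
}
```

Running it twice with the same key leaves a single entry. Where OpenSSH's ssh-copy-id is installed, `ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave1` does a similar append from the master side.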

 

3.      Download Hadoop

 

http://www.apache.org/dyn/closer.cgi/hadoop/core/

 

4.      Configure Hadoop (namenode)

$HADOOP_HOME/conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
        <description>
            The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.
        </description>
    </property>
</configuration>

$HADOOP_HOME/conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
        <description>The host and port that the MapReduce job tracker runs at.</description>
    </property>
</configuration>

$HADOOP_HOME/conf/hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
        <description>
            The actual number of replications can be specified when the file is created.
        </description>
    </property>
</configuration>
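
Before copying the configuration to the other nodes, it can help to confirm that each file carries the value you expect. The sketch below is a rough check only: get_property is a hypothetical helper, and it assumes each <name> and <value> element sits on a single line with the name first, as in the files above. It is not a general XML parser.

```shell
#!/bin/sh
# get_property FILE NAME  (hypothetical helper)
# Prints the <value> that follows the matching <name> in a Hadoop site file.
# Assumes one element per line, as in the listings above.
get_property() {
    awk -v want="$2" '
        /<name>/ {
            name = $0
            sub(/.*<name>/, "", name)
            sub(/<\/name>.*/, "", name)
        }
        /<value>/ {
            if (name == want) {
                v = $0
                sub(/.*<value>/, "", v)
                sub(/<\/value>.*/, "", v)
                print v
                exit
            }
        }
    ' "$1"
}
```

For example, `get_property $HADOOP_HOME/conf/core-site.xml fs.default.name` should print hdfs://master:9000 for the file shown above, and prints nothing when the property is absent.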

$HADOOP_HOME/conf/masters

master 

$HADOOP_HOME/conf/slaves

slave1
slave2
……
 

5.      Configure Hadoop (datanode)

 

 cd /usr/local/
 tar -zcvf hadoop-0.21.0.tar.gz hadoop-0.21.0/
 scp hadoop-0.21.0.tar.gz root@slave1:/usr/local/
 scp hadoop-0.21.0.tar.gz root@slave2:/usr/local/
……
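
The commands above copy the archive but do not unpack it on the slaves. As a sketch of one way to handle every host in the slaves file, the helper below prints the copy and unpack commands without running them, so they can be reviewed first. plan_distribution is a hypothetical name, and skipping blank lines and lines starting with # is this helper's own convention.

```shell
#!/bin/sh
# plan_distribution SLAVES_FILE TARBALL DEST_DIR  (hypothetical helper)
# Prints, but does not run, the copy and unpack commands for each host.
plan_distribution() {
    slaves_file=$1
    tarball=$2
    dest_dir=$3
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip blank lines and comment lines.
        case $host in
            ''|'#'*) continue ;;
        esac
        printf 'scp %s root@%s:%s/\n' "$tarball" "$host" "$dest_dir"
        printf 'ssh root@%s "tar -zxf %s/%s -C %s"\n' \
            "$host" "$dest_dir" "$(basename "$tarball")" "$dest_dir"
    done < "$slaves_file"
}
```

Called as `plan_distribution $HADOOP_HOME/conf/slaves hadoop-0.21.0.tar.gz /usr/local`, it emits two lines per slave. Piping the output to `sh` would execute them once the list looks right.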
 

 

6.      Start Hadoop

 

sh $HADOOP_HOME/bin/hadoop namenode -format
sh $HADOOP_HOME/bin/start-dfs.sh
sh $HADOOP_HOME/bin/start-mapred.sh
 

       Check the running state

 

[root@master ~] jps
6584 Jps
5827 SecondaryNameNode
5618 NameNode
5938 JobTracker

[root@slave1 ~] jps
3375 DataNode
3496 TaskTracker
3843 Jps

[root@slave2 ~] jps
1838 DataNode
3160 Jps
1960 TaskTracker

……
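
Reading the jps listings by eye gets tedious with more nodes. A minimal sketch of an automated check follows: it takes jps output as text plus the daemon names expected on that node, and reports any that are missing. check_daemons is a hypothetical helper, not a Hadoop tool.

```shell
#!/bin/sh
# check_daemons "JPS_OUTPUT" NAME...  (hypothetical helper)
# Prints each expected daemon missing from the given jps output and
# returns 0 only when all of them are present.
check_daemons() {
    jps_output=$1
    shift
    missing=0
    for name in "$@"; do
        # jps prints "<pid> <class name>", so compare against the second field.
        if ! printf '%s\n' "$jps_output" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

On the master this could be run as `check_daemons "$(jps)" NameNode SecondaryNameNode JobTracker`, and on each slave as `check_daemons "$(jps)" DataNode TaskTracker`, matching the processes listed above.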
 

7.      Test Hadoop

Create a test file

/root/test.txt

tom1
tom2
tom3
tom4
tom1
tom2
tom3
tom1
tom2
tom1
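
One way to produce this file without an editor is sketched below. make_test_file is a hypothetical helper that writes the ten words one per line to whatever path it is given.

```shell
#!/bin/sh
# make_test_file PATH  (hypothetical helper)
# Writes the ten sample words, one per line, to PATH.
make_test_file() {
    printf '%s\n' tom1 tom2 tom3 tom4 tom1 tom2 tom3 tom1 tom2 tom1 > "$1"
}
```

`make_test_file /root/test.txt` yields ten lines of five bytes each, 50 bytes in total.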

Create the input directory

hadoop fs -mkdir input

Upload the test file to this directory

hadoop fs -put /root/test.txt input

Check that the upload succeeded

hadoop fs -ls input

-rw-r--r--   3 root supergroup         50 2010-12-22 02:08 /user/root/input/test.txt

Run the word count program and write the results to the specified directory

hadoop jar hadoop-mapred-examples-0.21.0.jar wordcount input output
……
10/12/22 02:12:02 INFO mapreduce.Job:  map 0% reduce 0%
10/12/22 02:12:09 INFO mapreduce.Job:  map 100% reduce 0%
10/12/22 02:12:15 INFO mapreduce.Job:  map 100% reduce 100%
……

View the result files

hadoop fs -ls output

-rw-r--r--   3 root supergroup          0 2010-12-22 02:12 /user/root/output/_SUCCESS
-rw-r--r--   3 root supergroup         28 2010-12-22 02:12 /user/root/output/part-r-00000

hadoop fs -cat output/part-r-00000

tom1       4
tom2       3
tom3       2
tom4       1
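
To cross-check these counts without the cluster, the same tally can be computed locally. The sketch below is a plain Unix pipeline, not the Hadoop example's implementation: it assumes words are separated by whitespace and prints word, tab, count, sorted by byte order. local_wordcount is a hypothetical helper.

```shell
#!/bin/sh
# local_wordcount FILE  (hypothetical helper)
# Splits FILE on whitespace, counts each word, and prints "word<TAB>count"
# sorted by word in byte order.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" |
        sed '/^$/d' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For a comparison, save the local result with `local_wordcount /root/test.txt > expected.txt` and then run `hadoop fs -cat output/part-r-00000 | diff - expected.txt`. The 28-byte size of part-r-00000 in the listing above fits four lines of a word, one separator byte, a digit, and a newline.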

Retrieve the output file

hadoop fs -get output/part-r-00000 /root/output.txt
