1. Cluster Environment
Operating system: CentOS release 5.5 x86_64
IP assignments:
/etc/hosts
192.168.1.100 master
192.168.1.101 slave1
192.168.1.102 slave2
……
2. Configure SSH
ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa
cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
scp /root/.ssh/id_rsa.pub root@slave1:/root/.ssh/authorized_keys
scp /root/.ssh/id_rsa.pub root@slave2:/root/.ssh/authorized_keys
……
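One caveat with the scp commands above: copying id_rsa.pub directly over authorized_keys replaces any keys already present on the slave. Appending instead (e.g. `cat >> authorized_keys` over ssh) preserves them. A minimal local sketch of the append behaviour, using a temp directory in place of /root/.ssh and placeholder key material:

```shell
# Local sketch: append the public key instead of overwriting.
# The temp dir stands in for /root/.ssh; key strings are placeholders.
sshdir=$(mktemp -d)
echo "ssh-rsa AAAAB3Placeholder root@master" > "$sshdir/id_rsa.pub"
# Pretend the slave already has one authorized key that must survive:
echo "ssh-rsa AAAAB3Existing admin@elsewhere" > "$sshdir/authorized_keys"
cat "$sshdir/id_rsa.pub" >> "$sshdir/authorized_keys"
chmod 600 "$sshdir/authorized_keys"
wc -l < "$sshdir/authorized_keys"   # 2 — both keys present
```

On a real cluster the same idea is `cat /root/.ssh/id_rsa.pub | ssh root@slave1 'cat >> /root/.ssh/authorized_keys'`.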
3. Download Hadoop
http://www.apache.org/dyn/closer.cgi/hadoop/core/
4. Configure Hadoop (namenode)
$HADOOP_HOME/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
    <description>
      The name of the default file system. A URI whose scheme and authority
      determine the FileSystem implementation.
    </description>
  </property>
</configuration>
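One optional addition to core-site.xml that is not in the original configuration above: by default HDFS keeps its data under hadoop.tmp.dir, which points into /tmp and may be wiped on reboot. Many setups therefore pin it to a persistent path. A sketch of the extra property, with the path itself being an assumption for this cluster:

```xml
<!-- Optional: persistent base directory for HDFS data.
     /usr/local/hadoop-0.21.0/tmp is an assumed path; create it first. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop-0.21.0/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
```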
$HADOOP_HOME/conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
    <description>
      The host and port that the MapReduce job tracker runs at.
    </description>
  </property>
</configuration>
$HADOOP_HOME/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>
      The actual number of replications can be specified when the file is
      created.
    </description>
  </property>
</configuration>
$HADOOP_HOME/conf/masters
master
$HADOOP_HOME/conf/slaves
slave1
slave2
……
5. Configure Hadoop (datanode)
cd /usr/local/
tar -zcvf hadoop-0.21.0.tar.gz hadoop-0.21.0/
scp hadoop-0.21.0.tar.gz root@slave1:/usr/local/
scp hadoop-0.21.0.tar.gz root@slave2:/usr/local/
……
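Note that after copying, the archive still has to be unpacked on each slave (e.g. `ssh root@slave1 'tar -zxvf /usr/local/hadoop-0.21.0.tar.gz -C /usr/local/'`). The pack/copy/unpack round trip can be sketched locally, with temp directories standing in for /usr/local on the master and a slave, and `cp` standing in for scp:

```shell
# Local sketch of the distribution round trip (no real scp/ssh involved).
master=$(mktemp -d); slave=$(mktemp -d)
mkdir -p "$master/hadoop-0.21.0/conf"
echo master > "$master/hadoop-0.21.0/conf/masters"
tar -zcf "$master/hadoop-0.21.0.tar.gz" -C "$master" hadoop-0.21.0/
cp "$master/hadoop-0.21.0.tar.gz" "$slave/"         # stands in for scp
tar -zxf "$slave/hadoop-0.21.0.tar.gz" -C "$slave"  # run on the slave after copying
cat "$slave/hadoop-0.21.0/conf/masters"             # config arrived intact: master
```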
6. Start Hadoop
sh $HADOOP_HOME/bin/hadoop namenode -format
sh $HADOOP_HOME/bin/start-dfs.sh
sh $HADOOP_HOME/bin/start-mapred.sh
Check the running status:
[root@master ~] jps
6584 Jps
5827 SecondaryNameNode
5618 NameNode
5938 JobTracker

[root@slave1 ~] jps
3375 DataNode
3496 TaskTracker
3843 Jps

[root@slave2 ~] jps
1838 DataNode
3160 Jps
1960 TaskTracker
……
7. Test Hadoop
Create a test file:
/root/test.txt
tom1 tom2 tom3 tom4 tom1 tom2 tom3 tom1 tom2 tom1
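The counts that wordcount should produce for this file can be reproduced locally with coreutils as a sanity check, no Hadoop required:

```shell
# Word frequencies of test.txt via coreutils; the same file content is
# inlined here so the pipeline is self-contained.
printf 'tom1 tom2 tom3 tom4 tom1 tom2 tom3 tom1 tom2 tom1\n' \
  | tr ' ' '\n' | sort | uniq -c \
  | awk '{print $2"\t"$1}'
# Prints:
# tom1	4
# tom2	3
# tom3	2
# tom4	1
```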
Create the input directory:
hadoop fs -mkdir input
Upload the test file to this directory:
hadoop fs -put /root/test.txt input
Verify that the upload succeeded:
hadoop fs -ls input
-rw-r--r-- 3 root supergroup 50 2010-12-22 02:08 /user/root/input/test.txt
Run the word-count example and write the results to the specified output directory:
hadoop jar hadoop-mapred-examples-0.21.0.jar wordcount input output
……
10/12/22 02:12:02 INFO mapreduce.Job: map 0% reduce 0%
10/12/22 02:12:09 INFO mapreduce.Job: map 100% reduce 0%
10/12/22 02:12:15 INFO mapreduce.Job: map 100% reduce 100%
……
View the result files:
hadoop fs -ls output
-rw-r--r-- 3 root supergroup  0 2010-12-22 02:12 /user/root/output/_SUCCESS
-rw-r--r-- 3 root supergroup 28 2010-12-22 02:12 /user/root/output/part-r-00000

hadoop fs -cat output/part-r-00000
tom1 4
tom2 3
tom3 2
tom4 1
Retrieve the output file:
hadoop fs -get output/part-r-00000 /root/output.txt