root@nodeA:~# sudo adduser zyx
Adding user `zyx' ...
Adding new group `zyx' (1001) ...
Adding new user `zyx' (1001) with group `zyx' ...
Creating home directory `/home/zyx' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for zyx
Enter the new value, or press ENTER for the default
Full Name []: ^Cadduser: `/usr/bin/chfn zyx' exited from signal 2. Exiting.
root@nodeA:~#
root@nodeA:~# sudo usermod -G admin -a zyx
root@nodeA:~#
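Before continuing, the new account can be checked with id; this verification step is not part of the original transcript:
root@nodeA:~# id zyx   # should show uid=1001(zyx), gid=1001(zyx), and the supplementary group admin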
(1) Set up passwordless SSH login from the namenode to itself
zyx@nodeA:~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/home/zyx/.ssh'.
Your identification has been saved in /home/zyx/.ssh/id_dsa.
Your public key has been saved in /home/zyx/.ssh/id_dsa.pub.
The key fingerprint is:
65:2e:e0:df:2e:61:a5:19:6a:ab:0e:38:45:a9:6a:2b zyx@nodeA
The key's randomart image is:
+--[ DSA 1024]----+
| |
| . |
| o . o |
| o . ..+. |
|. . ..S=. |
|.o o.=o |
|+.. . o... |
|E... . .. |
|.. .o. .. |
+-----------------+
zyx@nodeA:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
zyx@nodeA:~$
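Passwordless login to the local machine can now be verified; this quick check is not in the original transcript:
zyx@nodeA:~$ ssh localhost   # should log in without asking for a password
(If a password is still requested, chmod 600 ~/.ssh/authorized_keys usually fixes it.)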
(2) Enable passwordless login from the namenode to the other datanodes
hadoop@nodeB:~$ scp hadoop@nodea:/home/hadoop/.ssh/id_dsa.pub /home/hadoop
hadoop@nodea's password:
id_dsa.pub 100% 602 0.6KB/s 00:00
hadoop@nodeB:~$ cat id_dsa.pub >> .ssh/authorized_keys
hadoop@nodeB:~$ sudo ufw disable
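At this point the namenode should be able to reach the datanode without a password; a quick check (not in the original transcript; substitute the real datanode hostname):
hadoop@nodeA:~$ ssh nodeB   # should log in without a password prompt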
Alternatively, files can be transferred with the F-Secure SSH File Transfer Trial tool by simple drag and drop.
(1) Install the JDK
zyx@nodeA:~$ ls
Examples jdk
zyx@nodeA:~$ cd jdk
zyx@nodeA:~/jdk$ ls
jdk-6u20-linux-i586.bin
zyx@nodeA:~/jdk$ chmod a+x jdk*
zyx@nodeA:~/jdk$ ./jdk*
The license agreement is displayed next; accept it with yes and press Enter, and the installation completes.
zyx@nodeA:~/jdk$ ls
jdk1.6.0_20 jdk-6u20-linux-i586.bin
(2) Configure environment variables
Open .bashrc as root (root@nodeA:/home/zyx# vi .bashrc) and append the following lines at the end:
export JAVA_HOME=/home/zyx/jdk/jdk1.6.0_20
export JRE_HOME=/home/zyx/jdk/jdk1.6.0_20/jre
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH:$HOME/bin
Download URL:
http://labs.renren.com/apache-mirror/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
Put hadoop-0.20.2.tar.gz under /home/zyx/hadoop, then extract it:
zyx@nodeB:~/hadoop$ tar -zvxf hadoop-0.20.2.tar.gz
Set the environment variables by adding the following to /home/zyx/.bashrc:
zyx@nodeA:~$ vi .bashrc
export HADOOP_HOME=/home/zyx/hadoop/hadoop-0.20.2
export PATH=$HADOOP_HOME/bin:$PATH
export JAVA_HOME=/home/zyx/jdk/jdk1.6.0_20
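After re-reading .bashrc, both the JDK and Hadoop settings can be verified; these checks are not in the original notes:
zyx@nodeA:~$ source ~/.bashrc
zyx@nodeA:~$ java -version    # should report java version "1.6.0_20"
zyx@nodeA:~$ hadoop version   # should report Hadoop 0.20.2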
Check the three configuration files under conf (their contents were not captured in these notes):
zyx@nodeC:~/hadoop-0.20.2/conf$ more core-site.xml
zyx@nodeC:~/hadoop-0.20.2/conf$ more hdfs-site.xml
zyx@nodeC:~/hadoop-0.20.2/conf$ more mapred-site.xml
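For reference, a minimal configuration for a Hadoop 0.20.2 cluster typically looks like the following; the addresses hdfs://nodeC:9000 and nodeC:9001 are assumptions, so substitute the real master hostname and ports. dfs.replication is set to 1 to match the single datanode reported below.
core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>   <!-- URI of the HDFS namenode (assumed host/port) -->
    <value>hdfs://nodeC:9000</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>   <!-- one copy per block; single-datanode cluster -->
    <value>1</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>   <!-- host:port of the jobtracker (assumed) -->
    <value>nodeC:9001</value>
  </property>
</configuration>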
(0) Format the namenode:
zyx@nodeC:~/hadoop-0.20.2/bin$ hadoop namenode -format
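The notes jump straight to jps, so the daemons must have been started in between; with 0.20.2 this is normally done with the bundled script (a filled-in step, not shown in the original transcript):
zyx@nodeC:~/hadoop-0.20.2/bin$ start-all.sh   # starts namenode, datanode, secondarynamenode, jobtracker and tasktracker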
(1) Check the running processes with jps:
zyx@nodeC:~/hadoop-0.20.2/bin$ jps
31030 NameNode
31488 TaskTracker
31283 SecondaryNameNode
31372 JobTracker
31145 DataNode
31599 Jps
(2) Check the cluster status:
zyx@nodeC:~/hadoop-0.20.2/bin$ hadoop dfsadmin -report
Configured Capacity: 304716488704 (283.79 GB)
Present Capacity: 270065557519 (251.52 GB)
DFS Remaining: 270065532928 (251.52 GB)
DFS Used: 24591 (24.01 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Name: 192.168.1.103:50010
Decommission Status : Normal
Configured Capacity: 304716488704 (283.79 GB)
DFS Used: 24591 (24.01 KB)
Non DFS Used: 34650931185 (32.27 GB)
DFS Remaining: 270065532928(251.52 GB)
DFS Used%: 0%
DFS Remaining%: 88.63%
Last contact: Fri Apr 23 15:39:10 CST 2010
(3) Stop the daemons:
zyx@nodeC:~/hadoop-0.20.2/bin$ stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
Copy the local input files into HDFS and list them:
zyx@nodeC:~$ hadoop dfs -copyFromLocal /home/zyx/file0* input
zyx@nodeC:~$ hadoop dfs -ls
Found 1 items
drwxr-xr-x - zyx supergroup 0 2010-04-23 16:40 /user/zyx/input
zyx@nodeC:~$ hadoop dfs -ls input
Found 2 items
-rw-r--r-- 1 zyx supergroup 0 2010-04-23 16:40 /user/zyx/input/file01
-rw-r--r-- 1 zyx supergroup 0 2010-04-23 16:40 /user/zyx/input/file02
zyx@nodeC:~/hadoop-0.20.2$ hadoop jar hadoop-0.20.2-examples.jar wordcount input output
10/04/24 09:25:10 INFO input.FileInputFormat: Total input paths to process : 2
10/04/24 09:25:11 INFO mapred.JobClient: Running job: job_201004240840_0001
10/04/24 09:25:12 INFO mapred.JobClient: map 0% reduce 0%
10/04/24 09:25:22 INFO mapred.JobClient: map 100% reduce 0%
10/04/24 09:25:34 INFO mapred.JobClient: map 100% reduce 100%
10/04/24 09:25:36 INFO mapred.JobClient: Job complete: job_201004240840_0001
10/04/24 09:25:36 INFO mapred.JobClient: Counters: 17
10/04/24 09:25:36 INFO mapred.JobClient: Job Counters
10/04/24 09:25:36 INFO mapred.JobClient: Launched reduce tasks=1
10/04/24 09:25:36 INFO mapred.JobClient: Launched map tasks=2
10/04/24 09:25:36 INFO mapred.JobClient: Data-local map tasks=2
10/04/24 09:25:36 INFO mapred.JobClient: FileSystemCounters
10/04/24 09:25:36 INFO mapred.JobClient: FILE_BYTES_READ=79
10/04/24 09:25:36 INFO mapred.JobClient: HDFS_BYTES_READ=50
10/04/24 09:25:36 INFO mapred.JobClient: FILE_BYTES_WRITTEN=228
10/04/24 09:25:36 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=41
10/04/24 09:25:36 INFO mapred.JobClient: Map-Reduce Framework
10/04/24 09:25:36 INFO mapred.JobClient: Reduce input groups=5
10/04/24 09:25:36 INFO mapred.JobClient: Combine output records=6
10/04/24 09:25:36 INFO mapred.JobClient: Map input records=2
10/04/24 09:25:36 INFO mapred.JobClient: Reduce shuffle bytes=85
10/04/24 09:25:36 INFO mapred.JobClient: Reduce output records=5
10/04/24 09:25:36 INFO mapred.JobClient: Spilled Records=12
10/04/24 09:25:36 INFO mapred.JobClient: Map output bytes=82
10/04/24 09:25:36 INFO mapred.JobClient: Combine input records=8
10/04/24 09:25:36 INFO mapred.JobClient: Map output records=8
10/04/24 09:25:36 INFO mapred.JobClient: Reduce input records=6
zyx@nodeC:~/hadoop-0.20.2$ hadoop fs -cat output/part-r-00000
Goodbye 1
Hadoop 2
Hello 2
bye 1
cuijj 2
11. Compiling a .java program against Hadoop:
root@nodeC:/home/zyx/hadoop-0.20.2# javac -classpath /home/zyx/hadoop-0.20.2/hadoop-0.20.2-core.jar:/home/zyx/hadoop-0.20.2/lib/commons-cli-1.2.jar -d /home/zyx/wordcount_class /home/zyx/hadoop-0.20.2/src/examples/org/apache/hadoop/examples/WordCount.java
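Note that javac does not create the directory given to -d; it must exist before compiling (a prerequisite not shown in the original transcript):
root@nodeC:/home/zyx/hadoop-0.20.2# mkdir -p /home/zyx/wordcount_class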
12. Packaging the .class files into a .jar:
root@nodeC:/home/zyx/wordcount_class/org/apache/hadoop/examples# jar -cvf /home/zyx/wordcount.jar /home/zyx/wordcount_class/ .
added manifest
adding: home/zyx/wordcount_class/(in = 0) (out= 0)(stored 0%)
adding: home/zyx/wordcount_class/org/(in = 0) (out= 0)(stored 0%)
adding: home/zyx/wordcount_class/org/apache/(in = 0) (out= 0)(stored 0%)
adding: home/zyx/wordcount_class/org/apache/hadoop/(in = 0) (out= 0)(stored 0%)
adding: home/zyx/wordcount_class/org/apache/hadoop/examples/(in = 0) (out= 0)(stored 0%)
adding: home/zyx/wordcount_class/org/apache/hadoop/examples/WordCount.class(in = 1911) (out= 996)(deflated 47%)
adding: home/zyx/wordcount_class/org/apache/hadoop/examples/WordCount$TokenizerMapper.class(in = 1790) (out= 765)(deflated 57%)
adding: home/zyx/wordcount_class/org/apache/hadoop/examples/WordCount$IntSumReducer.class(in = 1789) (out= 746)(deflated 58%)
adding: WordCount.class(in = 1911) (out= 996)(deflated 47%)
adding: WordCount$TokenizerMapper.class(in = 1790) (out= 765)(deflated 57%)
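As the listing shows, the classes were added twice: once under the absolute home/zyx/wordcount_class/... prefix and once relative to the current directory. Passing the class directory with -C, so that jar changes into it before adding files, avoids embedding the absolute path (a cleaner variant of the command above):
$ jar -cvf /home/zyx/wordcount.jar -C /home/zyx/wordcount_class .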
Before diving into the details, let's walk through an example Map/Reduce application to get a feel for how it works.
WordCount is a simple application that counts the number of occurrences of each word in a given input set.
It works with all three Hadoop installation modes: standalone, pseudo-distributed, and fully distributed.
WordCount.java

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
Assuming the environment variable HADOOP_HOME points to the Hadoop installation root and HADOOP_VERSION to the installed version, compile WordCount.java and create the jar as follows:
$ mkdir wordcount_classes
$ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d wordcount_classes WordCount.java
$ jar -cvf /usr/joe/wordcount.jar -C wordcount_classes/ .
Assuming that /usr/joe/wordcount/input and /usr/joe/wordcount/output are the input and output directories in HDFS, use the sample text files as input:
$ bin/hadoop dfs -ls /usr/joe/wordcount/input/
/usr/joe/wordcount/input/file01
/usr/joe/wordcount/input/file02
$ bin/hadoop dfs -cat /usr/joe/wordcount/input/file01
Hello World Bye World
$ bin/hadoop dfs -cat /usr/joe/wordcount/input/file02
Hello Hadoop Goodbye Hadoop
Run the application:
$ bin/hadoop jar /usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input /usr/joe/wordcount/output
The output is:
$ bin/hadoop dfs -cat /usr/joe/wordcount/output/part-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
Applications can use the -files option to specify a comma-separated list of paths that will be present in the current working directory of each task. The -libjars option adds jars to the classpath of the maps and reduces. The -archives option passes archive files as arguments; these are unpacked on the task nodes, and a symlink named after each archive is created in the task's current working directory, pointing to the unpacked directory. More details on the command-line options are available in the Commands manual.
Running the wordcount example with -libjars and -files:
hadoop jar hadoop-examples.jar wordcount -files cachefile.txt -libjars mylib.jar input output
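An -archives run follows the same pattern: the archive is unpacked on each task node and exposed through a symlink named after the file (a sketch; mytar.tgz is an assumed archive name):
hadoop jar hadoop-examples.jar wordcount -files cachefile.txt -archives mytar.tgz input output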