Running test code on a Hadoop cluster (the weather data example from Hadoop: The Definitive Guide)

Today I got the weather data example from Hadoop: The Definitive Guide running on a Hadoop cluster; here are my notes.

I could never find, on Baidu or Google, a concrete step-by-step description of how to run one's own MapReduce job on a cluster. After a fair amount of blind trial and error it finally worked, which feels pretty good...

1 Prepare the weather data (a simplified version of the data from the Definitive Guide: characters 5-9 hold the year, characters 15-19 the temperature; see the parsing snippet after the sample records)

aaaaa1990aaaaaa0039a
bbbbb1991bbbbbb0040a
ccccc1992cccccc0040c
ddddd1993dddddd0043d
eeeee1994eeeeee0041e
aaaaa1990aaaaaa0031a
bbbbb1991bbbbbb0020a
ccccc1992cccccc0030c
ddddd1993dddddd0033d
eeeee1994eeeeee0031e
aaaaa1990aaaaaa0041a
bbbbb1991bbbbbb0040a
ccccc1992cccccc0040c
ddddd1993dddddd0043d
eeeee1994eeeeee0041e
aaaaa1990aaaaaa0044a
bbbbb1991bbbbbb0045a
ccccc1992cccccc0041c
ddddd1993dddddd0023d
eeeee1994eeeeee0041e
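
To sanity-check those fixed-width offsets before writing the job, here is a minimal throwaway snippet (ParseCheck is a hypothetical helper, not part of the job; the indices match the substring(5, 9) / substring(15, 19) calls in the mapper below):

public class ParseCheck {
    public static void main(String[] args) {
        String line = "aaaaa1990aaaaaa0039a";                        // first sample record above
        String year = line.substring(5, 9);                          // zero-based chars 5..8 -> "1990"
        int temperature = Integer.parseInt(line.substring(15, 19));  // chars 15..18 -> "0039" -> 39
        System.out.println(year + "\t" + temperature);               // prints: 1990    39
    }
}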

2 Write the map and reduce functions and the driver (Job)

Keeping it simple, as follows:

package hadoop.test;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    static class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final int MISSING = 9999;

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            // Fixed-width layout of the home-made data (a simplified version of
            // the book's weather records): chars 5-8 are the year, 15-18 the temperature
            String year = line.substring(5, 9);
            int airTemperature = Integer.parseInt(line.substring(15, 19));

            if (airTemperature != MISSING) {
                context.write(new Text(year), new IntWritable(airTemperature));
            }
        }
    }

    static class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Keep the running maximum over all temperatures seen for this year
            int maxValue = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
            }
            context.write(key, new IntWritable(maxValue));
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }

        try {
            Job job = new Job();
            job.setJarByClass(MaxTemperature.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(MaxTemperatureMapper.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
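
One optional tweak, also from the book: because taking a maximum is associative and commutative, the reducer can double as a combiner and cut the data shuffled from map to reduce. A single extra line in main(), next to the other job.set* calls (a sketch; not needed for this run):

job.setCombinerClass(MaxTemperatureReducer.class);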

3 Package the code from step 2 as HadoopTest.jar and put it in some local directory, e.g. /home/hadoop/Documents/

Then run export HADOOP_CLASSPATH=/home/hadoop/Documents/

(When packaging, pick a main class; without one, execution seems to fail. Eclipse's export dialog has a MainClass option for this.

Otherwise, when running the hadoop jar command you have to give the fully qualified main class name after ***.jar,

e.g. hadoop jar /home/hadoop/Documents/HadoopTest.jar hadoop.test.MaxTemperature /user/hadoop/temperature output)
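
If you are not packaging from Eclipse, the jar can also be built on the command line (a sketch, assuming the compiled .class files are under bin/):

jar cvfe /home/hadoop/Documents/HadoopTest.jar hadoop.test.MaxTemperature -C bin .

The e option writes the Main-Class entry into the jar's manifest, which is what Eclipse's MainClass selection does for you.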

4 Upload the data to be analyzed to HDFS

hadoop dfs -put /home/hadoop/Documents/temperature ./temperature
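
To confirm the upload (an optional check; both paths are relative to the HDFS home directory, /user/hadoop here):

hadoop dfs -ls ./temperature
hadoop dfs -cat ./temperature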

5 Run the job

hadoop jar /home/hadoop/Documents/HadoopTest.jar /user/hadoop/temperature output

This differs slightly from the command in the book, but the book runs it in local mode there. As for export HADOOP_CLASSPATH=/home/hadoop/Documents/: in the book that variable is set so the driver class can be launched directly (hadoop MaxTemperature ...) without a jar; with hadoop jar the jar itself is added to the classpath, so it may well be unnecessary here. Also, plain hadoop jar HadoopTest.jar /user/hadoop/temperature output did not work for me; presumably that is because hadoop jar looks the jar up on the local filesystem, so a relative path only resolves when the command is run from the directory containing the jar. Something to dig into further; leaving it here for now.

Here HadoopTest.jar sits on the local filesystem, while the input file temperature and the generated output live on HDFS; output is a directory (and it must not already exist, or the job will fail on startup).

hadoop@hadoop1:~$ hadoop dfs -cat ./output/part-r-00000
1990 44
1991 45
1992 41
1993 43
1994 41
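
To copy the result back to the local filesystem (optional; the local file name is just an example):

hadoop dfs -get ./output/part-r-00000 /home/hadoop/Documents/max_temperature.txt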
