Hadoop MapReduce: A Beginner's Algorithm Summary


 

Prerequisites:

1. Familiarity with the Hadoop HDFS file system (basic operations such as file upload and download)

2. Understanding of the principles and workflow of Hadoop MapReduce

3. An Eclipse-based Hadoop development environment

4. A Hadoop system set up in a virtual machine

 

Algorithm:

Lesson 139: Hands-on MapReduce weather data analysis: case study, coding practice, best practices

[Data file contents]

0067011990999991950051507004888888889999999N9+00001+9999999999999999999999

0067011990999991950051507004888888889999999N9+00001+9999999999999999999999

0067011990999991950051512004888888889999999N9+00221+9999999999999999999999

0067011990999991950051518004888888889999999N9-00111+9999999999999999999999

0067011990999991949032412004888888889999999N9+01111+9999999999999999999999

0067011990999991950032418004888888880500001N9+00001+9999999999999999999999

0067011990999991950051507004888888880500001N9+00781+9999999999999999999999

Characters 15-19 hold the year, e.g. 1950 or 1949;

Characters 45-50 hold the signed temperature, e.g. the -00111 and +00001 fields in the records above;

Position 50 (the last digit of that field) is a quality flag and can only be one of the digits 0, 1, 4, 5, 9.
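The fixed-width layout above can be checked with a small plain-Java sketch (no Hadoop dependency; the class and method names here are my own illustration). It applies the same substring offsets used later in the mapper:

```java
public class RecordParseDemo {
    static final int MISSING = 9999; // sentinel used for a missing reading

    // Returns "year<TAB>temperature" for a valid record, or null if the
    // record is filtered out by the quality flag or the MISSING sentinel.
    static String parse(String data) {
        String year = data.substring(15, 19);          // characters 15-18: year
        int temperature = (data.charAt(45) == '+')
                ? Integer.parseInt(data.substring(46, 50))  // skip the '+' sign
                : Integer.parseInt(data.substring(45, 50)); // keep the '-' sign
        String validDataFlag = data.substring(50, 51); // quality flag at position 50
        if (temperature == MISSING || !validDataFlag.matches("[01459]")) {
            return null;
        }
        return year + "\t" + temperature;
    }

    public static void main(String[] args) {
        // Fourth sample record from the data file above
        System.out.println(parse(
            "0067011990999991950051518004888888889999999N9-00111+9999999999999999999999"));
        // prints: 1950	-11
    }
}
```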

[Design approach]

1. Given a weather log file, how should we design the key/value pairs? How do we clean the data to extract what we need: the year and the temperature?

 

Map phase:

- Each input line is a record such as 0067011990999991950051507004888888889999999N9+00001+9999999999999999999999.

- In the Map phase, the Hadoop MapReduce framework automatically hands each line to the mapper as the value (the key is the byte offset of the line within the file).

- We copy the value into a variable named data, extract the year from characters 15-19 and the temperature from characters 45-50 with substring, and filter out records that do not satisfy the rules:

if (temperature != MISSING && validDataFlag.matches("[01459]")) { // regex check: clean the data
    context.write(new Text(year), new IntWritable(temperature)); // emit (year, temperature) pairs
}

- The Map phase outputs the year as the key and the temperature as the value.

 

Reduce phase:

- In the Reduce phase, the per-year temperatures gathered from the Map output arrive as a collection; a full year can contribute up to 365 readings. The reducer therefore reads values of type Iterable<IntWritable>, with the year as the key.

- Iterate over the temperature collection and find the minimum:

for (IntWritable item : data) {
    coldestTemperature = Math.min(coldestTemperature, item.get());
} // walk the temperature collection, keeping the smallest value

- The Reduce phase outputs the year as the key and the minimum temperature (i.e. the coldest reading) as the value.
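Outside Hadoop, the shuffle-plus-reduce step described above can be simulated in plain Java (class and method names here are illustrative, not part of the original listing):

```java
import java.util.*;

public class ColdestYearDemo {
    // For each year, scan its temperature collection and keep the minimum,
    // exactly as the reducer does with Iterable<IntWritable>.
    static Map<String, Integer> coldest(Map<String, List<Integer>> byYear) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : byYear.entrySet()) {
            int coldestTemperature = Integer.MAX_VALUE;
            for (int t : entry.getValue()) {
                coldestTemperature = Math.min(coldestTemperature, t);
            }
            result.put(entry.getKey(), coldestTemperature);
        }
        return result;
    }

    public static void main(String[] args) {
        // Grouped map output for the sample data file above
        Map<String, List<Integer>> byYear = new HashMap<>();
        byYear.put("1950", Arrays.asList(0, 0, 22, -11, 0, 78));
        byYear.put("1949", Arrays.asList(111));
        System.out.println(coldest(byYear)); // prints: {1949=111, 1950=-11}
    }
}
```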

[Run results]

1949 111
1950 -11

 


[Key code: TemperatureComputation]

 

public static class TemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999; // sentinel for a missing temperature reading

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String data = value.toString();
        String year = data.substring(15, 19);              // characters 15-18: year

        int temperature;
        if ('+' == data.charAt(45)) {
            temperature = Integer.parseInt(data.substring(46, 50)); // skip the '+' sign
        } else {
            temperature = Integer.parseInt(data.substring(45, 50)); // keep the '-' sign
        }

        String validDataFlag = data.substring(50, 51);     // quality flag at position 50

        if (temperature != MISSING && validDataFlag.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(temperature));
        }
    }
}

        

        

public static class TemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text year, Iterable<IntWritable> data, Context context)
            throws IOException, InterruptedException {
        int coldestTemperature = Integer.MAX_VALUE;

        for (IntWritable item : data) {
            coldestTemperature = Math.min(coldestTemperature, item.get());
        }

        context.write(year, new IntWritable(coldestTemperature));
    }
}
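The listing above shows only the mapper and reducer; to run the job you also need a driver that configures and submits it. A minimal sketch, assuming the standard org.apache.hadoop.mapreduce API and hadoop-client on the classpath (class names are illustrative and input/output paths come from the command line):

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TemperatureComputation {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "coldest temperature per year");
        job.setJarByClass(TemperatureComputation.class);
        job.setMapperClass(TemperatureMapper.class);   // the mapper shown above
        job.setReducerClass(TemperatureReducer.class); // the reducer shown above
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar, a run might look like hadoop jar temperature.jar TemperatureComputation <input path> <output path> (jar name and paths are placeholders).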

        

 
