Implementing a Custom Value Type for MapReduce

1. When writing MapReduce programs, Hadoop's built-in data types sometimes cannot satisfy the requirements, or a custom data type optimized for the use case may perform better.

    In that case you can define a custom Writable type by implementing the org.apache.hadoop.io.Writable interface, and use it as the value type of a MapReduce computation.
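    The interface itself declares only two methods: one to serialize the object's fields and one to deserialize them. For reference, this is the full contract a custom value type must fulfil:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public interface Writable {
    /** Serialize this object's fields to out. */
    void write(DataOutput out) throws IOException;

    /** Deserialize this object's fields from in, overwriting any existing state. */
    void readFields(DataInput in) throws IOException;
}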

2. The source of the org.apache.hadoop.io.Writable interface includes a concrete example implementation:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class MyWritable implements Writable {
        // Some data
        private int counter;
        private long timestamp;

        @Override
        public void write(DataOutput out) throws IOException {
          out.writeInt(counter);
          out.writeLong(timestamp);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
          counter = in.readInt();
          timestamp = in.readLong();
        }

        // convenience factory: deserialize a new instance from the stream
        public static MyWritable read(DataInput in) throws IOException {
          MyWritable w = new MyWritable();
          w.readFields(in);
          return w;
        }
}
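The key contract here: whatever write() emits, readFields() must consume in exactly the same order and with the same types. A minimal round-trip sketch to make that concrete (MyWritableRoundTrip is a hypothetical test class, not part of the original post):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class MyWritableRoundTrip {
    public static void main(String[] args) throws IOException {
        MyWritable original = new MyWritable();

        // Serialize: write() appends counter, then timestamp, to the stream.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        // Deserialize: read() creates a new instance and calls readFields(),
        // which consumes the two fields back in the same order.
        MyWritable copy = MyWritable.read(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
        // copy now holds the same counter/timestamp values as original
    }
}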
3. When implementing your own Writable type, also pay attention to the following points:
    3.1 If you add a custom constructor to your Writable class, be sure to keep the default no-argument constructor (Hadoop instantiates Writable classes reflectively and needs it).
    3.2 If you serialize instances of your custom Writable type with TextOutputFormat, make sure the type has a meaningful toString() implementation.
    3.3 When reading input data, Hadoop may reuse a single instance of the Writable class. When populating fields in readFields(), you must not rely on the object's existing state (see the sketch after this list).
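Point 3.3 is easiest to get wrong when a field holds a collection. A minimal sketch (TagListWritable is a hypothetical class invented purely for illustration): if readFields() merely appends, stale entries from the previously deserialized record leak into the current one, because Hadoop hands you the same instance again.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Writable;

public class TagListWritable implements Writable {
    private final List<String> tags = new ArrayList<>();

    public TagListWritable() {            // 3.1: keep the no-arg constructor
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(tags.size());
        for (String tag : tags) {
            out.writeUTF(tag);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        tags.clear();                     // 3.3: reset ALL state before refilling;
        int n = in.readInt();             // without this, a reused instance keeps
        for (int i = 0; i < n; i++) {     // tags from earlier records
            tags.add(in.readUTF());
        }
    }

    @Override
    public String toString() {            // 3.2: meaningful output for TextOutputFormat
        return String.join(",", tags);
    }
}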
4. Below we get a feel for custom Writable types through a concrete example: using a custom type to process mobile-phone internet-access logs.
   4.1 The data file is HTTP_20130313143750.dat (it can be downloaded online).

   4.2 Sample record (the fields are tab-separated in the file; shown here with spaces): 1363157985066     13726230503    00-FD-07-A4-72-B8:CMCC    120.196.100.82    i02.c.aliimg.com        24    27    2481    24681    200

   4.3 Record schema:

        [Figure 1: field layout of the log record]

    4.4 We mainly extract the phone number, the upstream packet count, the downstream packet count, the total upstream traffic, and the total downstream traffic. (Both sending a request and receiving its response generate packets and traffic.)

5. Implementation of the MapReduce program.

   5.1 The custom data type DataWritable.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class DataWritable implements Writable {
	// upstream
	private int upPackNum;
	private int upPayLoad;

	// downstream
	private int downPackNum;
	private int downPayLoad;

	public DataWritable() {
		// keep the no-arg constructor: Hadoop instantiates Writables reflectively
	}

	public void set(int upPackNum, int upPayLoad, int downPackNum,
			int downPayLoad) {
		this.upPackNum = upPackNum;
		this.upPayLoad = upPayLoad;
		this.downPackNum = downPackNum;
		this.downPayLoad = downPayLoad;
	}

	public int getUpPackNum() {
		return upPackNum;
	}

	public int getUpPayLoad() {
		return upPayLoad;
	}

	public int getDownPackNum() {
		return downPackNum;
	}

	public int getDownPayLoad() {
		return downPayLoad;
	}

	@Override
	public void readFields(DataInput in) throws IOException {
		// read fields back in exactly the order write() emitted them
		this.upPackNum = in.readInt();
		this.upPayLoad = in.readInt();
		this.downPackNum = in.readInt();
		this.downPayLoad = in.readInt();
	}

	@Override
	public void write(DataOutput out) throws IOException {
		out.writeInt(upPackNum);
		out.writeInt(upPayLoad);
		out.writeInt(downPackNum);
		out.writeInt(downPayLoad);
	}

	@Override
	public String toString() {
		// tab-separated so TextOutputFormat produces readable columns
		return upPackNum + "\t" + upPayLoad + "\t" + downPackNum + "\t" + downPayLoad;
	}
}
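A quick standalone check of the type's behavior (the values here are made up purely for illustration):

public class DataWritableDemo {
	public static void main(String[] args) {
		DataWritable data = new DataWritable();
		// arguments are upPackNum, upPayLoad, downPackNum, downPayLoad
		data.set(27, 24681, 2481, 200);
		System.out.println(data); // prints: 27	24681	2481	200 (tab-separated via toString)
	}
}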
    5.2 The Mapper.
static class DataTotalMapper extends
		Mapper<LongWritable, Text, Text, DataWritable> {
	private Text mapOutputKey = new Text();
	private DataWritable dataWritable = new DataWritable();

	@Override
	public void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		String lineValue = value.toString();
		// split
		String[] strs = lineValue.split("\t");
		// get data
		String phoneNum = strs[1];
		int upPackNum = Integer.valueOf(strs[6]);
		int downPackNum = Integer.valueOf(strs[7]);
		int upPayLoad = Integer.valueOf(strs[8]);
		int downPayLoad = Integer.valueOf(strs[9]);
		// skip records whose key is not an 11-digit mobile number
		// (the original guarded only the set() call, so non-phone records
		// were still emitted under the previous key)
		if (phoneNum.length() != 11) {
			return;
		}
		// set map output key / value
		mapOutputKey.set(phoneNum);
		dataWritable.set(upPackNum, upPayLoad, downPackNum, downPayLoad);
		context.write(mapOutputKey, dataWritable);
	}
}
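Note that mapOutputKey and dataWritable are allocated once and reused for every call to map(). This is safe because context.write() serializes the key and value as soon as it is called, and it avoids creating two new objects per input record.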
    5.3 The Reducer.
static class DataTotalReducer extends
		Reducer<Text, DataWritable, Text, DataWritable> {
	private DataWritable dataWritable = new DataWritable();

	@Override
	public void reduce(Text key, Iterable<DataWritable> values,
			Context context) throws IOException, InterruptedException {
		// sum packet counts and traffic across all records for this phone number
		int upPackNum = 0;
		int downPackNum = 0;
		int upPayLoad = 0;
		int downPayLoad = 0;
		for (DataWritable data : values) {
			upPackNum += data.getUpPackNum();
			downPackNum += data.getDownPackNum();
			upPayLoad += data.getUpPayLoad();
			downPayLoad += data.getDownPayLoad();
		}
		dataWritable.set(upPackNum, upPayLoad, downPackNum, downPayLoad);
		context.write(key, dataWritable);
	}
}
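Because the aggregation is a plain sum (associative and commutative), and the reducer's input and output types are both (Text, DataWritable), the same class could also serve as a combiner to pre-aggregate on the map side and cut shuffle traffic. A possible one-line addition to the driver below (not part of the original setup):

job.setCombinerClass(DataTotalReducer.class); // local pre-aggregation before the shuffle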
     5.4 The driver (main).
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DataTotalPhone {
	static final String INPUT_PATH = "hdfs://192.168.56.171:9000/DataPhone/HTTP_20130313143750.dat";
	static final String OUT_PATH = "hdfs://192.168.56.171:9000/DataPhone/out";

	public static void main(String[] args) throws ClassNotFoundException,
			IOException, InterruptedException {
		Configuration conf = new Configuration();
		// delete the output directory if it already exists, so the job can rerun
		final FileSystem fileSystem = FileSystem.get(URI.create(INPUT_PATH), conf);
		final Path outPath = new Path(OUT_PATH);
		if (fileSystem.exists(outPath)) {
			fileSystem.delete(outPath, true);
		}
		// create job
		Job job = new Job(conf, DataTotalPhone.class.getSimpleName());
		// set job
		job.setJarByClass(DataTotalPhone.class);
		// 1) input (paths come from the constants above; the original mixed
		//    these constants with args[0]/args[1])
		FileInputFormat.addInputPath(job, new Path(INPUT_PATH));
		// 2) map
		job.setMapperClass(DataTotalMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(DataWritable.class);
		// 3) reduce
		job.setReducerClass(DataTotalReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(DataWritable.class);
		// 4) output
		FileOutputFormat.setOutputPath(job, outPath);
		// main() is void, so report success via the exit code instead of return
		boolean isSuccess = job.waitForCompletion(true);
		System.exit(isSuccess ? 0 : 1);
	}
}
6. Output after running the program: one tab-separated line per phone number, in the form phoneNumber, upPackNum, upPayLoad, downPackNum, downPayLoad.

   [Figure 2: job output]

 
