[Experiment] Hadoop example: online user analysis

A simple business scenario and example, adapted from the WordCount example.

Business scenario:
Each user generates online events, and each event is logged. The task is to find the users who were online during a given period, along with each user's event count.
Note: the log fields are assumed to be comma-separated, with the 5th field holding the user identifier.
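As an illustration of that field layout (the field names and values here are invented for the example, not taken from real data), extracting the 5th comma-separated field from a log line looks like this:

```java
public class FieldExtractDemo {
	public static void main(String[] args) {
		// Hypothetical log line: timestamp, event type, IP, session, user ID.
		String line = "2013-08-30 15:00:01,login,10.0.0.8,sess-77,user-1024";
		String[] fields = line.split(",");
		String userId = fields[4]; // 5th field (index 4) is the user identifier
		System.out.println(userId); // prints user-1024
	}
}
```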
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ActiveUserMapper extends Mapper<Object, Text, Text, IntWritable> {

	private final static IntWritable one = new IntWritable(1);
	private Text user = new Text();

	// Emit (userId, 1) for every log line; the user id is the
	// 5th comma-separated field (index 4).
	@Override
	protected void map(Object key, Text value, Context context)
			throws IOException, InterruptedException {
		StringTokenizer itr = new StringTokenizer(value.toString(), ",");
		int index = 0;
		while (itr.hasMoreTokens()) {
			if (index == 4) {
				user.set(itr.nextToken());
				context.write(user, one);
				break;
			} else {
				itr.nextToken();
			}
			index++;
		}
	}
}
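One caveat with the mapper above: StringTokenizer treats consecutive delimiters as a single one, so a line with an empty field silently shifts the field index, and the wrong token can end up being emitted as the user id. If the logs may contain empty fields, String.split is the safer choice. A small demonstration (sample line invented for illustration):

```java
import java.util.StringTokenizer;

public class TokenizerCaveat {
	public static void main(String[] args) {
		// A line with an empty 3rd field.
		String line = "a,b,,d,e";
		// StringTokenizer collapses the double comma: only 4 tokens remain.
		StringTokenizer itr = new StringTokenizer(line, ",");
		int count = 0;
		while (itr.hasMoreTokens()) {
			itr.nextToken();
			count++;
		}
		System.out.println(count); // prints 4
		// String.split preserves the empty field: 5 entries.
		System.out.println(line.split(",").length); // prints 5
	}
}
```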

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ActiveUserReducer extends
		Reducer<Text, IntWritable, Text, IntWritable> {

	private IntWritable events = new IntWritable();

	// Sum the 1s emitted for each user to get the per-user event count.
	@Override
	protected void reduce(Text key, Iterable<IntWritable> values,
			Context context) throws IOException, InterruptedException {
		int sum = 0;
		for (IntWritable val : values) {
			sum += val.get();
		}
		events.set(sum);
		context.write(key, events);
	}
}
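This reducer can double as the combiner (the driver below registers it as both) because integer summation is associative and commutative: pre-summing partial counts on the map side before the shuffle yields the same final totals. A minimal sketch of that equivalence, with made-up event counts:

```java
import java.util.Arrays;
import java.util.List;

public class CombineDemo {
	static int sum(List<Integer> vals) {
		int s = 0;
		for (int v : vals) s += v;
		return s;
	}

	public static void main(String[] args) {
		// Five events for one user, split across two map tasks.
		List<Integer> mapper1 = Arrays.asList(1, 1, 1);
		List<Integer> mapper2 = Arrays.asList(1, 1);
		// Without a combiner: the reducer sums all five 1s.
		int direct = sum(Arrays.asList(1, 1, 1, 1, 1));
		// With a combiner: each map side pre-sums, the reducer sums the partials.
		int combined = sum(Arrays.asList(sum(mapper1), sum(mapper2)));
		System.out.println(direct + " == " + combined); // both are 5
	}
}
```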

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ActiveUserMRDriver extends Configured implements Tool {

	@Override
	public int run(String[] args) throws Exception {
		if (args.length != 2) {
			System.out.printf("Usage %s [generic options] <in> <out>\n", getClass().getName());
			ToolRunner.printGenericCommandUsage(System.out);
			return -1;
		}
		// Use the configuration injected by ToolRunner so that generic
		// options (-D, -fs, ...) take effect, rather than creating a fresh one.
		Configuration conf = getConf();
		conf.set("fs.default.name", "hdfs://node04vm01:9000");
		
		Job job = new Job(conf, "active user analyst");
	    job.setJarByClass(ActiveUserMRDriver.class);
	    job.setMapperClass(ActiveUserMapper.class);
	    job.setCombinerClass(ActiveUserReducer.class);
	    job.setReducerClass(ActiveUserReducer.class);
	    
	    job.setOutputKeyClass(Text.class);
	    job.setOutputValueClass(IntWritable.class);
	    
	    FileInputFormat.setInputPaths(job, new Path(args[0]));
	    FileOutputFormat.setOutputPath(job, new Path(args[1]));

		return job.waitForCompletion(true) ? 0 : 1;
	}
	
	
	public static void main(String[] args) throws Exception {
		int exitCode = ToolRunner.run(new ActiveUserMRDriver(), args);
		System.exit(exitCode);
	}
}


Job report (excerpt):
13/08/30 15:25:50 INFO mapred.JobClient: Job complete: job_local206120026_0001
13/08/30 15:25:50 INFO mapred.JobClient: Counters: 22
13/08/30 15:25:50 INFO mapred.JobClient:   File Output Format Counters
13/08/30 15:25:50 INFO mapred.JobClient:     Bytes Written=40450120
13/08/30 15:25:50 INFO mapred.JobClient:   FileSystemCounters
13/08/30 15:25:50 INFO mapred.JobClient:     FILE_BYTES_READ=907603353
13/08/30 15:25:50 INFO mapred.JobClient:     HDFS_BYTES_READ=4244630128
13/08/30 15:25:50 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1520436699
13/08/30 15:25:50 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=40450120
13/08/30 15:25:50 INFO mapred.JobClient:   File Input Format Counters
13/08/30 15:25:50 INFO mapred.JobClient:     Bytes Read=612273464
13/08/30 15:25:50 INFO mapred.JobClient:   Map-Reduce Framework
13/08/30 15:25:50 INFO mapred.JobClient:     Reduce input groups=2886293
13/08/30 15:25:50 INFO mapred.JobClient:     Map output materialized bytes=103629708
13/08/30 15:25:50 INFO mapred.JobClient:     Combine output records=12122417
13/08/30 15:25:50 INFO mapred.JobClient:     Map input records=8895828
13/08/30 15:25:50 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/08/30 15:25:50 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
13/08/30 15:25:50 INFO mapred.JobClient:     Reduce output records=2886293
13/08/30 15:25:50 INFO mapred.JobClient:     Spilled Records=17879555
13/08/30 15:25:50 INFO mapred.JobClient:     Map output bytes=126802892
13/08/30 15:25:50 INFO mapred.JobClient:     CPU time spent (ms)=0
13/08/30 15:25:50 INFO mapred.JobClient:     Total committed heap usage (bytes)=8510898176
13/08/30 15:25:50 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
13/08/30 15:25:50 INFO mapred.JobClient:     Combine input records=15261107
13/08/30 15:25:50 INFO mapred.JobClient:     Map output records=8895828
13/08/30 15:25:50 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1340
13/08/30 15:25:50 INFO mapred.JobClient:     Reduce input records=5757138
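A few sanity checks follow from these counters: every input log line (Map input records = 8,895,828) produced exactly one (user, 1) pair (Map output records = 8,895,828); the reduce input groups equal the reduce output records (2,886,293), i.e. roughly 2.89 million distinct active users, averaging about 3.08 events per user. The arithmetic, with the numbers copied from the report above:

```java
public class CounterCheck {
	public static void main(String[] args) {
		long mapInput = 8895828L;     // Map input records
		long mapOutput = 8895828L;    // Map output records: one per log line
		long reduceGroups = 2886293L; // Reduce input groups = distinct users
		long reduceOutput = 2886293L; // Reduce output records: one line per user
		// Every input line yields exactly one (user, 1) pair.
		System.out.println(mapInput == mapOutput); // true
		// One output record per distinct active user.
		System.out.println(reduceGroups == reduceOutput); // true
		// Average events per active user over the analyzed window.
		System.out.printf("avg events/user = %.2f%n", (double) mapInput / reduceGroups);
	}
}
```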
