MR并行算法编程过程中遇到问题的思考

1. Reducer 类中 reduce函数外定义的变量是在Reducer机器上属于全局变量的,因此,一台机器上reduce函数均可以对该变量的值做出贡献。如代码:(sum和count数据Reducer机器上的全局变量)‘

	public static class AvgCalReducer extends Reducer
	{
		FloatWritable avg;
		float sum=0;
		int count=0;		
		public void reduce(EntityEntityWritable key,Iterablevalues,Context context) throws IOException, InterruptedException
		{

			System.out.println("reducer starting:");
			for (FloatWritable value:values)
			{
				sum=sum+value.get();
				count++;
				System.out.println(" key = "+key+" value = "+value.get());
			}
			System.out.println("average:"+sum/count);
			System.out.println("this reducer ending...");
			avg=new FloatWritable(sum/count);
			context.write(key, avg);
		}
	}

如果想使sum和count的值仅通过reduce函数进行改变,即只计算同一个key对应value的sum和count,则需要将sum和count放入reduce函数内,如下:

	public static class AvgCalReducer extends Reducer
	{
		FloatWritable avg;
		
		public void reduce(EntityEntityWritable key,Iterablevalues,Context context) throws IOException, InterruptedException
		{
			float sum=0;
			int count=0;
			System.out.println("reducer starting:");
			for (FloatWritable value:values)
			{
				sum=sum+value.get();
				count++;
				System.out.println(" key = "+key+" value = "+value.get());
			}
			System.out.println("average:"+sum/count);
			System.out.println("this reducer ending...");
			avg=new FloatWritable(sum/count);
			context.write(key, avg);
		}
	}

2. 对于顺序组合式MapReduce作业:用两个job举例:

		Configuration conf1=new Configuration();
		Job job1=new Job(conf1,"Job1");
		job1.waitForCompletion(true);

		Configuration conf2=new Configuration();
		Job job2=new Job(conf2,"Job2");
		job2.waitForCompletion(true);

注意我们之前经常写的System.exit(job.waitForCompletion(true)?0:1)在这里不可以使用,比如第一个job处的(job1.waitForCompletion(true)改成System.exit(job.waitForCompletion(true)?0:1),则系统成功完成job1后正常退出系统,没有机会再去运行job2了。


你可能感兴趣的:(Hadoop学习,项目中遇到的小问题)