Exploring How to Generate Parquet Files (Queryable by Both Impala and Hive): Access from an MR Program (Part 3)

1. We have already generated the Parquet files. Can a MapReduce (MR) program read them as well? Yes, it can.

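2. Straight to the code. This is the MapReduce driver class. To keep things simple, the job defines only a Mapper and no Reducer.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
// Parquet 1.7.0+ package name; older releases use parquet.hadoop.example.ExampleInputFormat
import org.apache.parquet.hadoop.example.ExampleInputFormat;

public class BasketParquetWriterApp extends Configured implements Tool {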

	public int run(String[] args) throws Exception {
		
		
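		// Load the configuration. getConf() returns the one prepared by ToolRunner,
		// so generic options such as -D take effect.
		Configuration conf = getConf();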
      
		Job job = Job.getInstance(conf, "BasketParquetWriterApp");

		job.setJarByClass(BasketParquetWriterApp.class);
		job.setMapperClass(BasketParquetWriterMap.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		job.setInputFormatClass(ExampleInputFormat.class);
		job.setOutputFormatClass(TextOutputFormat.class);
		
		FileInputFormat.setInputPaths(job,"/basketparquet/part_20150716103149");
		// Note: the output directory must not exist yet, otherwise job submission fails
		FileOutputFormat.setOutputPath(job,new Path("/tmp"));

		ControlledJob conCtrl = new ControlledJob(conf);
		conCtrl.setJob(job);
		JobControl jobCtrl = new JobControl("jobControl");
		jobCtrl.addJob(conCtrl);


		// Run the JobControl in a separate thread
		Thread t = new Thread(jobCtrl);
		t.start();

		while (true) {
			Thread.sleep(100);
			if (jobCtrl.allFinished()) {
				jobCtrl.stop();
				break;
			}
			if (jobCtrl.getFailedJobList().size() > 0) {
				jobCtrl.stop();
				break;
			}
		}
		// Return a non-zero exit code if any job failed
		return jobCtrl.getFailedJobList().isEmpty() ? 0 : 1;
	}

	public static void main(String[] args) throws Exception {
		try {
			int res = ToolRunner.run(new BasketParquetWriterApp(), args);
			System.exit(res);
		} catch (Exception e) {
			e.printStackTrace();
			System.exit(255);
		}
	}
}
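
The polling loop in the driver follows a general pattern: start the work on a background thread, poll until it has finished, then turn the outcome into an exit code. Below is a minimal sketch of that pattern using only the JDK. `BackgroundWork` and `runAndWait` are hypothetical names that stand in for `JobControl` and the loop in `run()`; no Hadoop classes are involved.

```java
public class PollingExitCodeSketch {

    /** Hypothetical stand-in for JobControl: runs one task and records whether it failed. */
    static final class BackgroundWork implements Runnable {
        private final Runnable task;
        private volatile boolean finished = false;
        private volatile boolean failed = false;

        BackgroundWork(Runnable task) {
            this.task = task;
        }

        @Override
        public void run() {
            try {
                task.run();
            } catch (RuntimeException e) {
                failed = true;   // remember the failure for the caller
            } finally {
                finished = true; // set last, so a finished task has its final status
            }
        }

        boolean allFinished() {
            return finished;
        }

        boolean hasFailures() {
            return failed;
        }
    }

    /** Starts the task in a thread, polls until it is done, and returns 0 on success or 1 on failure. */
    static int runAndWait(Runnable task) {
        BackgroundWork work = new BackgroundWork(task);
        Thread t = new Thread(work);
        t.start();
        try {
            while (!work.allFinished()) {
                Thread.sleep(10);
            }
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 1;
        }
        return work.hasFailures() ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(runAndWait(() -> { }));
        System.out.println(runAndWait(() -> {
            throw new IllegalStateException("simulated failure");
        }));
    }
}
```

The point of the sketch is that the exit code depends on the recorded failure state, so a caller such as a shell script can tell a failed run from a successful one.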

3. The Mapper simply extracts the productid value from each record

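import java.io.IOException;

// Assumed to be Apache Commons Lang; the lang3 StringUtils offers the same isEmpty method
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
// Parquet 1.7.0+ package names; older releases drop the org.apache prefix
import org.apache.parquet.example.data.Group;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;

public class BasketParquetWriterMap extends
		Mapper<Void, Group, Text, IntWritable> {

	// ExampleInputFormat supplies no key (it is always null); each value is one Parquet record
	@Override
	public void map(Void key, Group value, Context context) throws IOException,
			InterruptedException {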
	

		String productid = getGaroupValue(value, "productid");
		if (StringUtils.isEmpty(productid)) {
			return;
		}
		context.write(new Text(productid), new IntWritable(1));
	}

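	/**
	 * Returns the first value of a primitive field as a String.
	 *
	 * @param group     the Parquet record to read from
	 * @param fieldName the name of the field to read
	 * @return the field value as a String, or null if the record has no value for the field
	 */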
	private String getGaroupValue(Group group, String fieldName) {
		if (0 == group.getFieldRepetitionCount(fieldName))
			return null;
		String ret = null;
		PrimitiveTypeName typeName = group.getType().getType(fieldName)
				.asPrimitiveType().getPrimitiveTypeName();
		switch (typeName) {
		case BINARY:
		case FIXED_LEN_BYTE_ARRAY:
		case INT96:
			ret = group.getBinary(fieldName, 0).toStringUsingUTF8();
			break;
		case INT64:
			ret = String.valueOf(group.getLong(fieldName, 0));
			break;
		case INT32:
			ret = String.valueOf(group.getInteger(fieldName, 0));
			break;
		case BOOLEAN:
			ret = String.valueOf(group.getBoolean(fieldName, 0));
			break;
		default:
			// FLOAT and DOUBLE are not handled here
			throw new UnsupportedOperationException("Field " + fieldName
					+ " has unsupported type " + typeName);
		}
		return ret;
	}

	
}

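4. After the job runs successfully, it writes its output files to the output directory (/tmp in the code above).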
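
Because the job uses TextOutputFormat and the Mapper emits (productid, 1) pairs, each output line is a productid, a tab, and the number 1. As a rough sketch of what a follow-up step could do with such lines, the snippet below sums the counts per productid in plain Java. The class name, method name, and sample product ids are made up for illustration; real input would come from the job's part-* output files.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ProductCountSketch {

    /** Sums the counts per productid from lines shaped like "productid<TAB>count". */
    static Map<String, Integer> sumCounts(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            if (parts.length != 2) {
                continue; // skip blank or malformed lines
            }
            totals.merge(parts[0], Integer.parseInt(parts[1].trim()), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the same shape as the job output
        List<String> sample = Arrays.asList("1001\t1", "1002\t1", "1001\t1");
        System.out.println(sumCounts(sample)); // prints {1001=2, 1002=1}
    }
}
```

This is the same aggregation a summing Reducer would perform inside the job; doing it afterwards keeps the MapReduce program itself as simple as the one shown above.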

The code is available on GitHub:
https://github.com/wangxuehui/parqeuet-mr
