首先hadoop是支持并发的Mapper的,所以hbase没有道理不实现并发的Mapper,这个类是org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.
该类简单理解就是重写了Mapper的run方法
/**
* Run the application's maps using a thread pool.
*/
@Override
public void run(Context context) throws IOException, InterruptedException {
outer = context;
int numberOfThreads = getNumberOfThreads(context);
mapClass = getMapperClass(context);
if (LOG.isDebugEnabled()) {
LOG.debug("Configuring multithread runner to use " + numberOfThreads +
" threads");
}
executor = Executors.newFixedThreadPool(numberOfThreads);
for(int i=0; i < numberOfThreads; ++i) {
MapRunner thread = new MapRunner(context);
executor.execute(thread);
}
executor.shutdown();
while (!executor.isTerminated()) {
// wait till all the threads are done
Thread.sleep(1000);
}
}
以上是源代码,引自hbase-0.94.1
同时,该类内部还实现了一个private的class MapRunner,该MapRunner持有一个mapper变量,而这个mapper就是我们要执行的mapper,而这个mapper是怎么设置进去的呢?
/**
* Set the application's mapper class.
* @param <K2> the map output key type
* @param <V2> the map output value type
* @param job the job to modify
* @param cls the class to use as the mapper
*/
public static <K2,V2>
void setMapperClass(Job job,
Class<? extends Mapper<ImmutableBytesWritable, Result,K2,V2>> cls) {
if (MultithreadedTableMapper.class.isAssignableFrom(cls)) {
throw new IllegalArgumentException("Can't have recursive " +
"MultithreadedTableMapper instances.");
}
job.getConfiguration().setClass(MAPPER_CLASS,
cls, Mapper.class);
}
以上是源代码,引自hbase-0.94.1
可以看出,我们要实现并发的Mapper类一定不能是MultithreadedTableMapper 的子类(本人在试验的时候就因为继承了MultithreadedTableMapper 而抛出异常),通过在提交任务之前调用此静态方法,就可以设定我们真实的Mapper类。
同时
/**
* Set the number of threads in the pool for running maps.
* @param job the job to modify
* @param threads the new number of threads
*/
public static void setNumberOfThreads(Job job, int threads) {
job.getConfiguration().setInt(NUMBER_OF_THREADS,
threads);
}
我们还可以调用该方法来设置并发线程的数目,默认的并发数目是10。
此外还要注意,我们使用TableMapReduceUtil来initTableMapperJob中的Mapper class必须是MultithreadedTableMapper。
最后,该类其实还实现了一些其它的内部类和方法来辅助数据的一致性,有兴趣的朋友可以自己看源代码,我这里只抛一个砖。