The following is an example of using HBase as a MapReduce source in a read-only manner. Specifically, there is a Mapper instance but no Reducer, and nothing is emitted from the Mapper. The job would be defined as follows…
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleRead");
job.setJarByClass(MyReadJob.class);     // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs
...

TableMapReduceUtil.initTableMapperJob(
  tableName,        // input HBase table name
  scan,             // Scan instance to control CF and attribute selection
  MyMapper.class,   // mapper
  null,             // mapper output key
  null,             // mapper output value
  job);
job.setOutputFormatClass(NullOutputFormat.class);   // because we aren't emitting anything from mapper

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}
…and the mapper instance would extend TableMapper…
public static class MyMapper extends TableMapper<Text, Text> {

  public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException {
    // process data for the row from the Result instance.
  }
}
The following is an example of using HBase both as a source and as a sink with MapReduce. This example will simply copy data from one table to another.
Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleReadWrite");
job.setJarByClass(MyReadWriteJob.class);    // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
  sourceTable,      // input table
  scan,             // Scan instance to control CF and attribute selection
  MyMapper.class,   // mapper class
  null,             // mapper output key
  null,             // mapper output value
  job);
TableMapReduceUtil.initTableReducerJob(
  targetTable,      // output table
  null,             // reducer class
  job);
job.setNumReduceTasks(0);

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}
An explanation is required of what TableMapReduceUtil is doing, especially with the reducer. TableOutputFormat is being used as the outputFormat class, and several parameters are being set on the config (e.g., TableOutputFormat.OUTPUT_TABLE), as well as setting the reducer output key to ImmutableBytesWritable and the reducer value to Writable. These could be set by the programmer on the job and conf, but TableMapReduceUtil tries to make things easier.
The following is the example mapper, which will create a Put matching the input Result and emit it. Note: this is what the CopyTable utility does.
public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {

  public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
    // this example is just copying the data from the source table...
    context.write(row, resultToPut(row,value));
  }

  private static Put resultToPut(ImmutableBytesWritable key, Result result) throws IOException {
    Put put = new Put(key.get());
    for (KeyValue kv : result.raw()) {
      put.add(kv);
    }
    return put;
  }
}
There isn't actually a reducer step, so TableOutputFormat takes care of sending the Put to the target table.
This is just an example; developers could choose not to use TableOutputFormat and connect to the target table themselves.
TODO: example for MultiTableOutputFormat.
The following example uses HBase as a MapReduce source and sink with a summarization step. This example will count the number of distinct instances of a value in a table and write those summarized counts in another table.
Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleSummary");
job.setJarByClass(MySummaryJob.class);     // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
  sourceTable,        // input table
  scan,               // Scan instance to control CF and attribute selection
  MyMapper.class,     // mapper class
  Text.class,         // mapper output key
  IntWritable.class,  // mapper output value
  job);
TableMapReduceUtil.initTableReducerJob(
  targetTable,            // output table
  MyTableReducer.class,   // reducer class
  job);
job.setNumReduceTasks(1);   // at least one, adjust as required

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}
In this example mapper, a column with a String value is chosen as the value to summarize upon. This value is used as the key to emit from the mapper, and an IntWritable represents an instance counter.
public static class MyMapper extends TableMapper<Text, IntWritable> {

  private final IntWritable ONE = new IntWritable(1);
  private Text text = new Text();

  public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
    String val = new String(value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr1")));
    text.set(val);     // we can only emit Writables...

    context.write(text, ONE);
  }
}
In the reducer, the "ones" are counted (just like any other MR example that does this), and then a Put is emitted.
public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

  public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    int i = 0;
    for (IntWritable val : values) {
      i += val.get();
    }
    Put put = new Put(Bytes.toBytes(key.toString()));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(i));

    context.write(null, put);
  }
}
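The counting performed by the mapper and reducer above can be tried out without a cluster. The following is a minimal plain-Java sketch, not part of the job itself: the class name SummarySketch and the sample values are made up for illustration, and no Hadoop or HBase classes are involved. It only shows the "emit a one per value, then sum per key" logic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java stand-in for the summary job; no Hadoop or HBase classes are used. */
final class SummarySketch {

    /** "Map" emits (value, 1) for each row; "reduce" sums the ones for each key. */
    static Map<String, Integer> countValues(List<String> columnValues) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : columnValues) {
            // merge() stands in for the shuffle plus the reducer's "i += val.get()" loop
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical contents of the cf:attr1 column across five rows
        List<String> attr1 = Arrays.asList("red", "blue", "red", "green", "red");
        System.out.println(countValues(attr1));
    }
}
```

In the real job each entry of the resulting map corresponds to one Put written by MyTableReducer, with the key as the row and the sum stored in cf:count.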
This is very similar to the summary example above, with the exception that this is using HBase as a MapReduce source but HDFS as the sink. The differences are in the job setup and in the reducer. The mapper remains the same.
Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleSummaryToFile");
job.setJarByClass(MySummaryFileJob.class);     // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
  sourceTable,        // input table
  scan,               // Scan instance to control CF and attribute selection
  MyMapper.class,     // mapper class
  Text.class,         // mapper output key
  IntWritable.class,  // mapper output value
  job);
job.setReducerClass(MyReducer.class);    // reducer class
job.setNumReduceTasks(1);    // at least one, adjust as required
FileOutputFormat.setOutputPath(job, new Path("/tmp/mr/mySummaryFile"));  // adjust directories as required

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}
As stated above, the previous Mapper can run unchanged with this example. As for the Reducer, it is a "generic" Reducer instead of extending TableReducer and emitting Puts.
public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    int i = 0;
    for (IntWritable val : values) {
      i += val.get();
    }
    context.write(key, new IntWritable(i));
  }
}
It is also possible to perform summaries without a reducer – if you use HBase as the reducer.
An HBase target table would need to exist for the job summary. The HTable method incrementColumnValue would be used to atomically increment values. From a performance perspective, it might make sense to keep a Map of keys with the amounts to be incremented for each map task, and make one update per key during the cleanup method of the mapper. However, your mileage may vary depending on the number of rows to be processed and the number of unique keys.
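The buffering idea can be sketched in plain Java as follows. This is only an illustration under stated assumptions: LocalIncrementBuffer is a hypothetical helper, and the sink passed to flush is a placeholder for whatever performs the atomic increment against the summary table. In a real mapper, add would be called from map and flush from cleanup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Per-task buffer: count locally while mapping, push one increment per key at the end. */
final class LocalIncrementBuffer {
    private final Map<String, Long> pending = new HashMap<>();

    /** Called once per row, where map() would otherwise issue an increment. */
    void add(String key) {
        pending.merge(key, 1L, Long::sum);
    }

    /**
     * Called once at the end of the task, where cleanup() would run.
     * The sink is a placeholder for an atomic increment on the summary table.
     * Returns the number of updates issued.
     */
    int flush(BiConsumer<String, Long> sink) {
        int calls = 0;
        for (Map.Entry<String, Long> entry : pending.entrySet()) {
            sink.accept(entry.getKey(), entry.getValue());
            calls++;
        }
        pending.clear();
        return calls;
    }

    public static void main(String[] args) {
        LocalIncrementBuffer buffer = new LocalIncrementBuffer();
        for (String value : new String[] {"a", "b", "a", "a"}) {
            buffer.add(value);
        }
        // Four rows, but only one update per distinct key reaches the sink
        int calls = buffer.flush((key, delta) -> System.out.println(key + " += " + delta));
        System.out.println(calls + " updates issued");
    }
}
```

The trade-off is memory: the buffer grows with the number of distinct keys seen by one map task, so it pays off mainly when keys repeat often.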
In the end, the summary results are in HBase.
package mapred;
/**
* Copyright 2009 The Apache Software Foundation
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
/**
* Export an HBase table.
* Writes content to sequence files up in HDFS. Use {@link Import} to read it
* back in again.
*/
public class Export {
private static final Log LOG = LogFactory.getLog(Export.class);
final static String NAME = "export";
/**
* Mapper.
*/
static class Exporter extends TableMapper<ImmutableBytesWritable, Result> {
/**
* @param row The current table row key.
* @param value The columns.
* @param context The current context.
* @throws IOException When something is broken with the data.
* @see org.apache.hadoop.mapreduce.Mapper#map(KEYIN, VALUEIN,
* org.apache.hadoop.mapreduce.Mapper.Context)
*/
@Override
public void map(ImmutableBytesWritable row, Result value,
Context context)
throws IOException {
try {
context.write(row, value);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
/**
* Sets up the actual job.
*
* @param conf The current configuration.
* @param args The command line parameters.
* @return The newly created job.
* @throws IOException When setting up the job fails.
*/
public static Job createSubmittableJob(Configuration conf, String[] args)
throws IOException {
String tableName = args[0];
Path outputDir = new Path(args[1]);
Job job = new Job(conf, NAME + "_" + tableName);
job.setJobName(NAME + "_" + tableName);
job.setJarByClass(Exporter.class);
// TODO: Allow passing filter and subset of rows/columns.
Scan s = new Scan();
// Optional arguments.
int versions = args.length > 2? Integer.parseInt(args[2]): 1;
s.setMaxVersions(versions);
long startTime = args.length > 3? Long.parseLong(args[3]): 0L;
long endTime = args.length > 4? Long.parseLong(args[4]): Long.MAX_VALUE;
s.setTimeRange(startTime, endTime);
s.setCacheBlocks(false);
if (conf.get(TableInputFormat.SCAN_COLUMN_FAMILY) != null) {
s.addFamily(Bytes.toBytes(conf.get(TableInputFormat.SCAN_COLUMN_FAMILY)));
}
LOG.info("versions=" + versions + ", starttime=" + startTime +
", endtime=" + endTime);
TableMapReduceUtil.initTableMapperJob(tableName, s, Exporter.class, null,
null, job);
// No reducers. Just write straight to output files.
job.setNumReduceTasks(0);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Result.class);
FileOutputFormat.setOutputPath(job, outputDir);
return job;
}
/*
* @param errorMsg Error message. Can be null.
*/
private static void usage(final String errorMsg) {
if (errorMsg != null && errorMsg.length() > 0) {
System.err.println("ERROR: " + errorMsg);
}
System.err.println("Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> " +
"[<starttime> [<endtime>]]]\n");
System.err.println(" Note: -D properties will be applied to the conf used. ");
System.err.println(" For example: ");
System.err.println(" -D mapred.output.compress=true");
System.err.println(" -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec");
System.err.println(" -D mapred.output.compression.type=BLOCK");
System.err.println(" Additionally, the following SCAN properties can be specified");
System.err.println(" to control/limit what is exported..");
System.err.println(" -D " + TableInputFormat.SCAN_COLUMN_FAMILY + "=<familyName>");
}
/**
* Main entry point.
*
* @param args The command line parameters.
* @throws Exception When running the job fails.
*/
public static void main(String[] args) throws Exception {
// Hardcoded for local testing: this overrides whatever was passed on the command line.
args = new String[]{"test","Out"};
Configuration conf = HBaseConfiguration.create();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length < 2) {
usage("Wrong number of arguments: " + otherArgs.length);
System.exit(-1);
}
Job job = createSubmittableJob(conf, otherArgs);
System.exit(job.waitForCompletion(true)? 0 : 1);
}
}
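The way Export treats its optional positional arguments (versions, start time, end time) can be checked in isolation. The sketch below is a hypothetical helper, ExportArgs, that is not part of the Export class; it only mirrors the defaults used in createSubmittableJob above and uses no HBase classes.

```java
/** Mirrors how Export reads its positional arguments; no HBase classes are involved. */
final class ExportArgs {
    final String tableName;
    final String outputDir;
    final int versions;
    final long startTime;
    final long endTime;

    ExportArgs(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("Wrong number of arguments: " + args.length);
        }
        tableName = args[0];
        outputDir = args[1];
        // Same defaults as createSubmittableJob: one version and the full time range
        versions = args.length > 2 ? Integer.parseInt(args[2]) : 1;
        startTime = args.length > 3 ? Long.parseLong(args[3]) : 0L;
        endTime = args.length > 4 ? Long.parseLong(args[4]) : Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Table name and output directory as in the hardcoded test arguments, plus 3 versions
        ExportArgs parsed = new ExportArgs(new String[] {"test", "Out", "3"});
        System.out.println("versions=" + parsed.versions
                + ", starttime=" + parsed.startTime
                + ", endtime=" + parsed.endTime);
    }
}
```

Because the arguments are positional, an end time can only be given together with a version count and a start time.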
package mapred;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class Import {
final static String NAME = "import";
static class Importer extends TableMapper<ImmutableBytesWritable, Put> {
@Override
public void map(ImmutableBytesWritable row, Result value,Context context) throws IOException {
try {
context.write(row, resultToPut(row, value));
} catch (InterruptedException e) {
e.printStackTrace();
}
}
private static Put resultToPut(ImmutableBytesWritable key, Result result) throws IOException {
Put put = new Put(key.get());
for (KeyValue kv : result.raw()) {
put.add(kv);
}
return put;
}
}
public static Job createSubmittableJob(Configuration conf, String[] args) throws IOException {
String tableName = args[0];
Path inputDir = new Path(args[1]);
Job job = new Job(conf, NAME + "_" + tableName);
job.setJarByClass(Importer.class);
FileInputFormat.setInputPaths(job, inputDir);
job.setInputFormatClass(SequenceFileInputFormat.class);
job.setMapperClass(Importer.class);
TableMapReduceUtil.initTableReducerJob(tableName, null, job);
job.setNumReduceTasks(0);
return job;
}
private static void usage(final String errorMsg) {
if (errorMsg != null && errorMsg.length() > 0) {
System.err.println("ERROR: " + errorMsg);
}
System.err.println("Usage: Import <tablename> <inputdir>");
}
public static void main(String[] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length < 2) {
usage("Wrong number of arguments: " + otherArgs.length);
System.exit(-1);
}
Job job = createSubmittableJob(conf, otherArgs);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
package util;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;
public class HBaseManager {
private static HBaseAdmin admin = null;
public static Configuration conf = null;
static{
try {
conf = HBaseManager.getHBConnection();
admin = new HBaseAdmin(conf);
} catch (MasterNotRunningException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (ZooKeeperConnectionException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
/**
* This method would be used to connect to remote HBase System….
* @throws IOException
*/
public void getRemoteHBaseConnection() throws IOException{
ResultScanner scanner = null;
try {
Configuration config = new Configuration();
config.clear();
config.set("hbase.master", "master:60000");
config.set("hbase.rootdir", "hdfs://localhost:50001/hbase");
/* config.set("hbase.master.info.bindAddress", "0.0.0.0");
config.set("hbase.master.dns.interface", "2888");
config.set("hbase.master.info.port", "60010");
config.set("hbase.rpc.engine", "org.apache.hadoop.hbase.ipc.WritableRpcEngine");
config.set("hbase.zookeeper.peerport", "2888"); */
config.set("hbase.zookeeper.quorum", "master");
config.set("hbase.zookeeper.property.clientPort","2181");
// HBaseAdmin.checkHBaseAvailable(config);
// open a handle to the existing table (this does not create it)
HTable table = new HTable(config, "test");
Scan s = new Scan();
scanner = table.getScanner(s);
for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
// print out the row we found and the columns we were looking for
System.out.println("Found row: " + rr);
}
System.out.println("Table test obtained ");
// addData(table);
} catch (Exception e) {
System.out.println("HBase is not running!");
System.exit(1);
} finally{
scanner.close();
}
}
/**
* This method would be used to connect to Local HBase master ….
* @return
*/
public static Configuration getHBConnection(){
Configuration config = null;
try {
config = new Configuration();
config.clear();
config.set("hbase.zookeeper.quorum", "107.108.99.145");
config.set("hbase.zookeeper.property.clientPort","2181");
config.set("hbase.master", "107.108.99.145:60000");
//config.set("hbase.hregion.max.filesize", "1");
} catch (Exception e) {
System.out.println("HBase is not running!");
System.exit(1);
}
return config;
}
public void putData(){
}
/**
*
* @param table
* @param splitKeys
* @param colfams
* @throws IOException
*/
public void createTable(String table, byte[][] splitKeys, String... colfams)
throws IOException {
HTableDescriptor desc = new HTableDescriptor(table);
for (String cf : colfams) {
HColumnDescriptor coldef = new HColumnDescriptor(cf);
desc.addFamily(coldef);
}
if (splitKeys != null) {
admin.createTable(desc, splitKeys);
} else {
admin.createTable(desc);
}
}
/**
*
* @param table
* @param startRow
* @param endRow
* @param numCols
* @param pad
* @param setTimestamp
* @param random
* @param colfams
* @throws IOException
*/
public void fillTable(String table, int startRow, int endRow, int numCols,
int pad, boolean setTimestamp, boolean random,
String[] colfams,String[] colVals)
throws IOException {
Configuration conf = HBaseManager.getHBConnection();
HTable tbl = new HTable(conf, table);
for (int row = startRow; row <= endRow; row++) {
for (int col = 1; col <= numCols; col++) {
Put put = new Put(Bytes.toBytes(padNum(row, pad)));
for (int i=0; i< colfams.length; i++) {
String cf = colfams[i];
String val = colVals[i];
String colName = padNum(col, pad);
if (setTimestamp) {
put.add(Bytes.toBytes(cf), Bytes.toBytes(colName),
col, Bytes.toBytes(val));
} else {
put.add(Bytes.toBytes(cf), Bytes.toBytes(colName),
Bytes.toBytes(val));
}
}
tbl.put(put);
}
}
tbl.close();
}
/**
*
* @param tableName
* @return
*/
public void dump(String table, String[] rows, String[] fams, String[] quals)
throws IOException {
HTable tbl = new HTable(conf, table);
List<Get> gets = new ArrayList<Get>();
for (String row : rows) {
Get get = new Get(Bytes.toBytes(row));
get.setMaxVersions();
if (fams != null) {
for (String fam : fams) {
for (String qual : quals) {
get.addColumn(Bytes.toBytes(fam), Bytes.toBytes(qual));
}
}
}
gets.add(get);
}
Result[] results = tbl.get(gets);
for (Result result : results) {
for (KeyValue kv : result.raw()) {
HashMap map = (HashMap) kv.toStringMap();
// System.out.println(kv.toStringMap().toString());
System.out.println( map.get("family") +
": " + Bytes.toString(kv.getValue()));
}
}
}
/**
*
* @param table
* @param row
* @param fam
* @param qual
* @param val
* @throws IOException
*/
public void put(String table, String row, String fam, String qual,
String val) throws IOException {
HTable tbl = new HTable(conf, table);
Put put = new Put(Bytes.toBytes(row));
put.add(Bytes.toBytes(fam), Bytes.toBytes(qual), Bytes.toBytes(val));
tbl.put(put);
tbl.close();
}
/**
*
* @return
*/
public void getContentData(String tableName,String[] colmFmly,String[] qualifiers, int nums){
ResultScanner scanner = null;
int i=1;
Configuration config = null;
try {
config = HBaseManager.getHBConnection();
// open a handle to the existing table (this does not create it)
HTable table = new HTable(config, tableName);
Scan s = new Scan();
for(String column : colmFmly){
for(String qualifier : qualifiers){
s.addColumn(Bytes.toBytes(column),Bytes.toBytes(qualifier));
}
}
scanner = table.getScanner(s);
for (Result rr = scanner.next(); rr != null && i <= nums; rr = scanner.next()) {
i++;
for(String column : colmFmly){
for(String qualifier : qualifiers){
System.out.println("key : "+column +" Value: "+ Bytes.toString(rr.getValue(Bytes.toBytes(column),Bytes.toBytes(qualifier))));
}
}
}
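}catch (Exception e) {
// report the failure instead of silently ignoring it
e.printStackTrace();
} finally {
// release the scanner if it was opened
if (scanner != null) {
scanner.close();
}
}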
}
public void getContentData(String tableName, int limit) throws IOException{
Configuration config = HBaseManager.getHBConnection();
List<Filter> filters = new ArrayList<Filter>();
HTable table = new HTable(config, tableName);
/**
* Filter to check the User-ID column and value …
*/
SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("User_ID"),Bytes.toBytes("01"),CompareFilter.CompareOp.EQUAL, new SubstringComparator("Usr-1"));
filters.add(filter);
/**
* Filter to check the domainType column and value …
*/
SingleColumnValueFilter filtera = new SingleColumnValueFilter(Bytes.toBytes("DomainType"),Bytes.toBytes("01"),CompareFilter.CompareOp.EQUAL, new SubstringComparator("webapp"));
filters.add(filtera);
List sortedLst = new ArrayList();
FilterList filterList2 = new FilterList(
FilterList.Operator.MUST_PASS_ONE, filters);
Scan scan = new Scan();
scan.setFilter(filterList2);
ResultScanner scanner2 = table.getScanner(scan);
for (Result result : scanner2) {
// for (int i=0; i < limit; i++) {
//Result result = scanner2.next();
for (KeyValue kv : result.raw()) {
HashMap map = (HashMap) kv.toStringMap();
//System.out.println(map.get("family") +
// ": " + Bytes.toString(kv.getValue()));
sortedLst.add(map.get("family")+"|"+Bytes.toString(kv.getValue()));
}
}
Collections.sort(sortedLst);
for(int i=0; i< sortedLst.size(); i++){
System.out.println(sortedLst.get(i));
}
scanner2.close();
}
/**
*
* @param num
* @param pad
* @return
*/
public static String padNum(int num, int pad) {
String res = Integer.toString(num);
if (pad > 0) {
while (res.length() < pad) {
res = "0" + res;
}
}
return res;
}
/**
*
* @param args
*/
public static void main(String[] args) {
HBaseManager hmanager = new HBaseManager();
try {
//hmanager.createTable("EvaluatedDB", null, "Content_ID", "DomainType", "Predicted_Rating","User_ID");
//for (int i=1; i<= 20; i++){
// hmanager.fillTable("EvaluatedDB", 3, 23, 1, 2, false, false, new String[]{"Content_ID", "DomainType", "Predicted_Rating","User_ID"},new String[]{"C-"+i,"webapp","5.9"+i,"Usr-"+i});
//}
//hmanager.fillTable("EvaluatedDB", 3, 3, 1, 2, false, false, new String[]{"Content_ID", "DomainType", "Predicted_Rating","User_ID"},new String[]{"C-1","webapp","5.9","Usr-1"});
//hmanager.dump("EvaluatedDB", new String[]{""}, new String[]{"Content_ID", "DomainType", "Predicted_Rating","User_ID"}, new String[]{"01"});
//hmanager.getContentData("EvaluatedDB", new String[]{"Content_ID", "DomainType", "Predicted_Rating","User_ID"}, new String[]{"01"},2);
//hmanager.getContentData("EvaluatedDB", 10);
getHBConnection();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
package mapred;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import util.HBaseManager;
import java.io.IOException;
public class MapReduceFileToTable{
static class Map extends Mapper<LongWritable, Text, Text, Put> {
/**
* map driver code
*
* @param key
* @param value
* @param context
* @exception IOException
* @exception InterruptedException
*/
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String messageStr = value.toString();
Put put = new Put(Bytes.toBytes("1"));
if (messageStr.contains("\t")) {
String[] logRecvArr = messageStr.split("\t");
if (logRecvArr.length >= 10) {
put.add(Bytes.toBytes("User"), Bytes.toBytes("UserId"),
Bytes.toBytes(logRecvArr[0]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("Id"),
Bytes.toBytes(logRecvArr[1]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("PlayerId"),
Bytes.toBytes(logRecvArr[2]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("PlayDate"),
Bytes.toBytes(logRecvArr[4]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("PlayTime"),
Bytes.toBytes(logRecvArr[5]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("PlayStopDate"),
Bytes.toBytes(logRecvArr[6]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("PlayStopTime"),
Bytes.toBytes(logRecvArr[7]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("Langitude"),
Bytes.toBytes(logRecvArr[8]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("Latitude"),
Bytes.toBytes(logRecvArr[9]));
// Write only when the Put carries columns; an empty Put cannot be stored.
context.write(new Text("1"), put);
}
} else {
System.out.println("Log is in incorrect format. ");
}
}
}
/**
* Where jobs and their settings and sequence is set.
*
* @param args
* arguments with exception of Tools understandable ones.
*/
public int execute() throws Exception {
Configuration config = HBaseManager.conf;
Job job = new Job(config, "TrandferHdfsToUserLog");
job.setJarByClass(MapReduceFileToTable.class); // class that contains the mapper
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path("Out"));
job.setMapperClass(Map.class);
TableMapReduceUtil.initTableReducerJob(
"UserLogTable", // output table
null, // reducer class
job);
job.setNumReduceTasks(0); // map-only job: Puts go straight to the output table
System.out.println("Hello Hadoop 2nd Job!!"+job.waitForCompletion(true));
return 0;
}
public static void main(String[] args) throws Exception {
new MapReduceFileToTable().execute();
}
}
package mapred;
import java.io.IOException;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.hbase.client.Put;
public class MultiTableMapper {
static class InnerMapper extends Mapper <LongWritable, Text, ImmutableBytesWritable, Put> {
public void map(LongWritable offset, Text value, Context context) throws IOException {
// contains the line of tab separated data we are working on (needs to be parsed out).
//byte[] lineBytes = value.getBytes();
String valuestring[]=value.toString().split("\t");
String rowid = /*HBaseManager.generateID();*/ "12345";
// rowKey is the hbase rowKey generated from lineBytes
Put put = new Put(rowid.getBytes());
put.add(Bytes.toBytes("UserInfo"), Bytes.toBytes("StudentName"), Bytes.toBytes(valuestring[0]));
try {
context.write(new ImmutableBytesWritable(Bytes.toBytes("Table1")), put);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} // write to the actions table
// rowKey2 is the hbase rowKey
Put put1 = new Put(rowid.getBytes());
put1.add(Bytes.toBytes("MarksInfo"),Bytes.toBytes("Marks"),Bytes.toBytes(valuestring[1]));
// Create your KeyValue object
//put.add(kv);
try {
context.write(new ImmutableBytesWritable(Bytes.toBytes("Table2")), put1);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} // write to the actions table
}
}
public static void createSubmittableJob() throws IOException, ClassNotFoundException, InterruptedException {
Path inputDir = new Path("in");
Configuration conf = /*HBaseManager.getHBConnection();*/ new Configuration();
Job job = new Job(conf, "my_custom_job");
job.setJarByClass(InnerMapper.class);
FileInputFormat.setInputPaths(job, inputDir);
job.setMapperClass(InnerMapper.class);
job.setInputFormatClass(TextInputFormat.class);
// this is the key to writing to multiple tables in hbase
job.setOutputFormatClass(MultiTableOutputFormat.class);
//job.setNumReduceTasks(0);
//TableMapReduceUtil.addDependencyJars(job);
//TableMapReduceUtil.addDependencyJars(job.getConfiguration());
System.out.println(job.waitForCompletion(true));
}
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
// TODO Auto-generated method stub
MultiTableMapper.createSubmittableJob();
System.out.println();
}
}
package mapred;
import java.io.IOException;
import java.util.HashMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import util.HBaseManager;
public class ReadFromTableAndWriteToFile {
public static class MyMapper extends TableMapper<Text, IntWritable> {
private final IntWritable ONE = new IntWritable(1);
private Text text = new Text();
public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
System.out.println(row);
for (KeyValue kv : value.raw()) {
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
System.out.println("RowId is : " + Bytes.toString(value.getRow()));
System.out.println(map.get("family"));
if (map.get("qualifier") != null && !map.get("qualifier").equals("")) {
System.out.println(map.get("qualifier"));
}
System.out.println(Bytes.toString(kv.getValue()));
}
text.set(row.toString()); // we can only emit Writables…
context.write(text,ONE);
}
}
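// must be static: Hadoop instantiates the reducer by reflection and cannot construct a non-static inner class
public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {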
public void reduce(Text key, Iterable<IntWritable> values,
Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable value : values) {
sum += value.get();
}
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception{
Configuration config = HBaseManager.conf;
Job job = new Job(config, "UserProfileTable");
//Job job1 = new Job(config, "UserProfileTable1");
job.setJarByClass(ReadFromTableAndWriteToFile.class); // class that contains mapper and reducer
Scan scan = new Scan();
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don’t set to true for MR jobs
// set other scan attrs
TableMapReduceUtil.initTableMapperJob(
"UserProfileTable", // input HBase table name
scan, // Scan instance to control CF and attribute selection
MyMapper.class, // mapper
Text.class, // mapper output key
IntWritable.class, // mapper output value
job);
job.setOutputFormatClass(TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path("Out/help.txt")); // this is an output directory, not a single file
//job.setOutputFormatClass(NullOutputFormat.class);
job.setReducerClass(MyReducer.class);
job.setNumReduceTasks(1); // at least one; with 0 reduce tasks MyReducer would never run
boolean b = job.waitForCompletion(true);
if (!b) {
throw new IOException("error with job!");
}
}
//ReadFromTableAndWriteToFile.java
}
package mapred;
import java.io.IOException;
import java.util.HashMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import util.HBaseManager;
public class ReadingFromTableMapper {
/*public static class My1Mapper extends TableMapper<Text, IntWritable> {
private final IntWritable ONE = new IntWritable(1);
private Text text = new Text();
public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
System.out.println(row);
for (KeyValue kv : value.raw()) {
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
System.out.println(“RowId is : “+Bytes.toString(value.getRow()));
System.out.println(map.get(“family”));
if(map.get(“qualifier”)!= null && !map.get(“qualifier”).equals(“”)){
System.out.println(map.get(“qualifier”));
}
System.out.println(Bytes.toString(kv.getValue()));
}
text.set(row.toString()); // we can only emit Writables…
context.write(text, ONE);
}
}*/
public static class MyMapper extends TableMapper<LongWritable,Text> {
private final LongWritable ONE = new LongWritable(1);
private Text text = new Text();
public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
System.out.println(row);
for (KeyValue kv : value.raw()) {
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
System.out.println("RowId is : " + Bytes.toString(value.getRow()));
System.out.println(map.get("family"));
if (map.get("qualifier") != null && !map.get("qualifier").equals("")) {
System.out.println(map.get("qualifier"));
}
System.out.println(Bytes.toString(kv.getValue()));
}
text.set(row.toString()); // we can only emit Writables…
context.write(ONE,text);
}
}
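// not registered with the job below; declared static so Hadoop could instantiate it if it were
public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {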
protected void reduce(Text key, Iterable<IntWritable> values,
Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable value : values) {
sum += value.get();
}
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception{
Configuration config = HBaseManager.conf;
Job job = new Job(config, "UserProfileTable");
//Job job1 = new Job(config, "UserProfileTable1");
job.setJarByClass(ReadingFromTableMapper.class); // class that contains mapper
Scan scan = new Scan();
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don’t set to true for MR jobs
// set other scan attrs
TableMapReduceUtil.initTableMapperJob(
"UserProfileTable", // input HBase table name
scan, // Scan instance to control CF and attribute selection
MyMapper.class, // mapper
null, // mapper output key
null, // mapper output value
job);
job.setOutputFormatClass(NullOutputFormat.class);
//FileOutputFormat.setOutputPath(job, new Path(“/tmp/mr/mySummaryFile”));
boolean b = job.waitForCompletion(true);
if (!b) {
throw new IOException("error with job!");
}
}
//ReadingFromTableMapper.java
}
package mapred;
import java.io.IOException;
import java.util.HashMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import util.HBaseManager;
public class ReadWriteWith2Map2Reducer {
public static class My1Mapper extends TableMapper<Text, IntWritable> {
private final IntWritable ONE = new IntWritable(1);
private Text text = new Text();
public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
System.out.println(row);
for (KeyValue kv : value.raw()) {
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
System.out.println("RowId is : " + Bytes.toString(value.getRow()));
System.out.println(map.get("family"));
if (map.get("qualifier") != null && !map.get("qualifier").equals("")) {
System.out.println(map.get("qualifier"));
}
System.out.println(Bytes.toString(kv.getValue()));
}
text.set(row.toString()); // we can only emit Writables…
context.write(text, ONE);
// note: an endless while(true) loop here would keep the map task from ever finishing, so it is omitted
}
}
public static class My2Mapper extends TableMapper<Text, IntWritable> {
private final IntWritable ONE = new IntWritable(1);
private Text text = new Text();
public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
System.out.println(row);
for (KeyValue kv : value.raw()) {
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
System.out.println("RowId is : " + Bytes.toString(value.getRow()));
System.out.println(map.get("family"));
if (map.get("qualifier") != null && !map.get("qualifier").equals("")) {
System.out.println(map.get("qualifier"));
}
System.out.println(Bytes.toString(kv.getValue()));
}
text.set(row.toString()); // we can only emit Writables…
context.write(text, ONE);
}
}
/**
*
* @author hadoop-node1
*
*/
public static class My1TableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
for (IntWritable val : values) {
System.out.println(val);
}
Put put = new Put(Bytes.toBytes(key.toString()));
put.add(Bytes.toBytes("Recommenders"), Bytes.toBytes("Recommenders-1"), Bytes.toBytes(key.toString()));
context.write(null, put);
}
}
public static class My2TableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
for (IntWritable val : values) {
System.out.println(val);
}
Put put = new Put(Bytes.toBytes(key.toString()));
put.add(Bytes.toBytes("Recommenders"), Bytes.toBytes("Recommenders-1"), Bytes.toBytes(key.toString()));
context.write(null, put);
}
}
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
Configuration config = HBaseManager.conf;
Job job = new Job(config, "UserProfileTable");
//Job job1 = new Job(config, "UserProfileTable1");
job.setJarByClass(ReadWriteWith2Map2Reducer.class); // class that contains mappers and reducers
Scan scan = new Scan();
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don’t set to true for MR jobs
// set other scan attrs
TableMapReduceUtil.initTableMapperJob(
"UserProfileTable", // input table
scan, // Scan instance to control CF and attribute selection
My1Mapper.class, // mapper class
Text.class, // mapper output key
IntWritable.class, // mapper output value
job);
Job job1 = new Job(config, "UserDataTable");
//Job job1 = new Job(config, "UserProfileTable1");
job1.setJarByClass(ReadWriteWith2Map2Reducer.class); // class that contains mappers and reducers
Scan scan1 = new Scan();
scan1.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan1.setCacheBlocks(false);
TableMapReduceUtil.initTableMapperJob(
"UserDataTable", // input table
scan1, // Scan instance to control CF and attribute selection
My2Mapper.class, // mapper class
Text.class, // mapper output key
IntWritable.class, // mapper output value
job1);
TableMapReduceUtil.initTableReducerJob(
"UserProfileTableCopy", // output table
My1TableReducer.class, // reducer class
job);
TableMapReduceUtil.initTableReducerJob(
"UserProfileTableCopy", // output table
My2TableReducer.class, // reducer class
job1);
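job.setNumReduceTasks(1); // at least one, adjust as required
job.submit();
job1.submit();
// wait for both jobs before exiting; calling System.exit() right after the first wait would skip the second
boolean ok = job.waitForCompletion(true);
boolean ok1 = job1.waitForCompletion(true);
System.exit(ok && ok1 ? 0 : 1);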
}
}
package mapred;
import java.io.IOException;
import java.util.HashMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import util.HBaseManager;
public class ReadWriteWithMapReducer {
/**
*
* @author hadoop-node1
*
*/
public static class MyMapper extends TableMapper<Text, IntWritable> {
private final IntWritable ONE = new IntWritable(1);
private Text text = new Text();
public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
//System.out.println(row);
HashMap<String, String> musicMap = new HashMap<String, String>();
for (KeyValue kv : value.raw()) {
String qualifier = "";
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
String family = (String) map.get("family");
if (family.equalsIgnoreCase("Music")) {
if (map.get("qualifier") != null && !map.get("qualifier").equals("")) {
qualifier = (String) map.get("qualifier");
}
String qualifierVal = Bytes.toString(kv.getValue());
musicMap.put(qualifier, qualifierVal);
}
}
System.out.println(musicMap.toString());
/*
for (KeyValue kv : value.raw()) {
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
System.out.println(“RowId is : “+Bytes.toString(value.getRow()));
System.out.println(map.get(“family”));
if(map.get(“qualifier”)!= null && !map.get(“qualifier”).equals(“”)){
System.out.println(map.get(“qualifier”));
}
System.out.println(Bytes.toString(kv.getValue()));
}*/
text.set(row.toString()); // we can only emit Writables…
context.write(text, ONE);
}
}
/**
*
* @author hadoop-node1
*
*/
public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
for (IntWritable val : values) {
System.out.println(val);
}
Put put = new Put(Bytes.toBytes(key.toString()));
put.add(Bytes.toBytes("Recommenders"), Bytes.toBytes("Recommenders-1"), Bytes.toBytes(key.toString()));
context.write(null, put);
}
}
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
Configuration config = HBaseManager.conf;
Job job = new Job(config, "UserProfileTable");
//Job job1 = new Job(config,”UserProfileTable1″);
job.setJarByClass(ReadWriteWithMapReducer.class); // class that contains mapper and reducer
Scan scan = new Scan();
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don’t set to true for MR jobs
// set other scan attrs
TableMapReduceUtil.initTableMapperJob(
"UserProfileTable", // input table
scan, // Scan instance to control CF and attribute selection
MyMapper.class, // mapper class
Text.class, // mapper output key
IntWritable.class, // mapper output value
job);
/*Job job1 = new Job(config,”UserDataTable”);
//Job job1 = new Job(config,”UserProfileTable1″);
job1.setJarByClass(ReadWriteWithMapReducer.class); // class that contains mapper and reducer
Scan scan1 = new Scan();
scan1.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan1.setCacheBlocks(false);
TableMapReduceUtil.initTableMapperJob(
“UserDataTable”, // input table
scan1, // Scan instance to control CF and attribute selection
MyMapper.class, // mapper class
Text.class, // mapper output key
IntWritable.class, // mapper output value
job1);*/
TableMapReduceUtil.initTableReducerJob(
"UserProfileTableCopy", // output table
MyTableReducer.class, // reducer class
job);
/*TableMapReduceUtil.initTableReducerJob(
“UserProfileTableCopy”, // output table
MyTableReducer.class, // reducer class
job1);
job.setNumReduceTasks(1); // at least one, adjust as required
job.submit();
job1.submit();*/
System.exit(job.waitForCompletion(true) ? 0 : 1);
//System.exit(job1.waitForCompletion(true) ? 0 : 1);
}
}
package mapred;
import java.io.IOException;
import java.util.HashMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import util.HBaseManager;
public class TableCopyAndPaste_Mapper_Reducer {
/**
*
* @author hadoop-node1
*
*/
public static class MyMapper extends TableMapper<ImmutableBytesWritable, Writable> {
public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
Put p = new Put( row.get());
HashMap<String, String> musicMap = new HashMap<String, String>();
for (KeyValue kv : value.raw()) {
String qualifier = "";
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
String family = (String) map.get("family");
if (family.equalsIgnoreCase("Music")) {
if (map.get("qualifier") != null && !map.get("qualifier").equals("")) {
qualifier = (String) map.get("qualifier");
}
String qualifierVal = Bytes.toString(kv.getValue());
musicMap.put(qualifier, qualifierVal);
}
p.add(kv);
}
System.out.println(musicMap.toString());
context.write(row, p);
}
}
/**
*
* @author hadoop-node1
*
*/
public static class MyTableReducer extends TableReducer<Writable, Writable, Writable> {
public void reduce(Writable key, Iterable<Writable> values, Context context)
throws IOException, InterruptedException {
for (Writable putOrDelete : values) {
context.write(key, putOrDelete);
}
}
}
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
Configuration config = HBaseManager.conf;
Job job = new Job(config, "UserProfileTable");
job.setJarByClass(TableCopyAndPaste_Mapper_Reducer.class); // class that contains mapper and reducer
Scan scan = new Scan();
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don’t set to true for MR jobs
TableMapReduceUtil.initTableMapperJob(
"UserProfileTable", scan,
MyMapper.class, ImmutableBytesWritable.class, Put.class, job);
TableMapReduceUtil.initTableReducerJob("UserProfileTableCopy", MyTableReducer.class, job);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Sometimes it is more appropriate to generate summaries to an RDBMS. In these cases, a custom reducer can write the summaries directly to the RDBMS. The setup method can connect to the RDBMS (the connection information can be passed via custom parameters in the context) and the cleanup method can close the connection.
It is critical to understand that the number of reducers for the job affects the summarization implementation, and you'll have to design this into your reducer: specifically, whether it is designed to run as a singleton (one reducer) or as multiple reducers. Neither is right or wrong; it depends on your use case. Recognize that the more reducers are assigned to the job, the more simultaneous connections to the RDBMS will be created. This will scale, but only to a point.
public static class MyRdbmsReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private Connection c = null;
public void setup(Context context) {
// create DB connection...
}
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
// do summarization
// in this example the keys are Text, but this is just an example
}
public void cleanup(Context context) {
// close db connection
}
}
In the end, the summary results are written to your RDBMS table(s).
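To make the point about the number of reducers more concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It simulates how keys are spread over reducers using the same formula as Hadoop's default HashPartitioner. The class name ReducerPartitionSketch, its helper methods, and the sample row keys are hypothetical and exist only for this illustration. Note also that String.hashCode() stands in for Text.hashCode(), so the actual partition numbers on a cluster would differ. What it shows is that every key lands on exactly one reducer, so per-key summaries are complete within that reducer, while any cross-key total has to be combined afterwards (for example in the RDBMS itself).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: simulates reducer partitioning without Hadoop.
class ReducerPartitionSketch {

    // Same formula as Hadoop's default HashPartitioner
    // (String.hashCode() is an assumption standing in for Text.hashCode()).
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Each simulated reducer counts occurrences of the keys routed to it.
    static List<Map<String, Integer>> reduce(String[] keys, int numReducers) {
        List<Map<String, Integer>> perReducer = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            perReducer.add(new TreeMap<>());
        }
        for (String key : keys) {
            perReducer.get(partitionFor(key, numReducers)).merge(key, 1, Integer::sum);
        }
        return perReducer;
    }

    // What the summary table would contain once every reducer has written its rows.
    static Map<String, Integer> mergeAll(List<Map<String, Integer>> perReducer) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> partial : perReducer) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        String[] rows = {"alice", "bob", "alice", "carol", "bob", "alice"};
        List<Map<String, Integer>> three = reduce(rows, 3);
        for (int i = 0; i < three.size(); i++) {
            System.out.println("reducer " + i + " would write: " + three.get(i));
        }
        // The merged per-key counts are the same regardless of the reducer count.
        System.out.println("summary table: " + mergeAll(three));
    }
}
```

With a single reducer, one connection writes every row. With three reducers, up to three connections write disjoint sets of keys at the same time, which is why per-key inserts work unchanged but a grand total across keys would need an extra aggregation step.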
Ref: http://bigdataprocessing.wordpress.com/2012/07/27/hadoop-hbase-mapreduce-examples/
http://hbase.apache.org/book/mapreduce.example.html
http://hbase.apache.org/book/perf.reading.html
http://stackoverflow.com/questions/2431387/how-to-read-data-from-hbase
http://svn.apache.org/repos/asf/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapred