Env:
- Single node with CentOS 6.2 x86_64, 2 processors, 4 GB memory
- CDH4.3 with Cloudera Manager 4.5
- HBase 0.94.6-cdh4.3.0
- HBase shell exercise:
[root@n8 ~]# hbase shell
13/07/21 21:11:25 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.6-cdh4.3.0, rUnknown, Mon May 27 20:22:05 PDT 2013

hbase(main):001:0> list
TABLE
TestTable
mytable
twits
users
4 row(s) in 0.8200 seconds

hbase(main):002:0> create 't1', 'f1', 'f2', 'fn'
0 row(s) in 1.1490 seconds
=> Hbase::Table - t1

hbase(main):003:0> describe 't1'
DESCRIPTION                                                                       ENABLED
 {NAME => 't1', FAMILIES => [{NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE',        true
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3',
 COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'},
 {NAME => 'f2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
 BLOCKCACHE => 'true'},
 {NAME => 'fn', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
 BLOCKCACHE => 'true'}]}
1 row(s) in 0.0520 seconds

hbase(main):004:0> put 't1', 'r1', 'f1', 'v1'
0 row(s) in 0.0390 seconds

hbase(main):005:0> put 't1', 'r1', 'f1:c1', 'v2'
0 row(s) in 0.0050 seconds

hbase(main):006:0> put 't1', 'r2', 'f2', 'v3'
0 row(s) in 0.0040 seconds

hbase(main):007:0> put 't1', 'r2', 'f2:c2', 'v4'
0 row(s) in 0.0050 seconds

hbase(main):008:0> get 't1', 'r1'
COLUMN                CELL
 f1:                  timestamp=1374412382919, value=v1
 f1:c1                timestamp=1374412396462, value=v2
2 row(s) in 0.0260 seconds

hbase(main):009:0> get 't1', 'r1', {column=> 'f1:c1'}
NameError: undefined local variable or method `column' for #<Object:0x5b3ac14d>

hbase(main):010:0> get 't1', 'r1', {COLUMN => 'f1:c1'}
COLUMN                CELL
 f1:c1                timestamp=1374412396462, value=v2
1 row(s) in 0.0120 seconds

hbase(main):011:0> deleteall 't1', 'r1'
0 row(s) in 0.1040 seconds

hbase(main):012:0> scan 't1'
ROW                   COLUMN+CELL
 r2                   column=f2:, timestamp=1374412422750, value=v3
 r2                   column=f2:c2, timestamp=1374412437015, value=v4
1 row(s) in 0.0470 seconds

hbase(main):013:0> disable 't1'
0 row(s) in 2.0510 seconds

hbase(main):014:0> alter 't1', {NAME => 'f3'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1410 seconds

hbase(main):015:0> enable 't1'
0 row(s) in 2.0450 seconds

hbase(main):016:0> describe 't1'
DESCRIPTION                                                                       ENABLED
 {NAME => 't1', FAMILIES => [{NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE',        true
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3',
 COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'},
 {NAME => 'f2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
 BLOCKCACHE => 'true'},
 {NAME => 'f3', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3',
 TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'},
 {NAME => 'fn', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
 BLOCKCACHE => 'true'}]}
1 row(s) in 0.0510 seconds

hbase(main):017:0> disable 't1'
0 row(s) in 2.0490 seconds

hbase(main):018:0> alter 't1', {NAME => 'f1', METHOD => 'delete'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1600 seconds

hbase(main):019:0> enable 't1'
0 row(s) in 2.0380 seconds

hbase(main):020:0> describe 't1'
DESCRIPTION                                                                       ENABLED
 {NAME => 't1', FAMILIES => [{NAME => 'f2', DATA_BLOCK_ENCODING => 'NONE',        true
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3',
 COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'},
 {NAME => 'f3', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3',
 TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'},
 {NAME => 'fn', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
 BLOCKCACHE => 'true'}]}
1 row(s) in 0.0500 seconds

hbase(main):021:0> status
1 servers, 0 dead, 7.0000 average load

hbase(main):022:0> truncate 't1'
Truncating 't1' table (it may take a while):
 - Disabling table...
 - Dropping table...
 - Creating table...
0 row(s) in 4.1820 seconds

hbase(main):023:0> disable 't1'
0 row(s) in 2.0420 seconds

hbase(main):024:0> drop 't1'
0 row(s) in 1.0690 seconds
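The same schema changes can also be driven from Java through HBaseAdmin. Below is a minimal sketch of the disable/alter/enable cycle above; this is my own addition, not part of the workshop code, and it assumes the cluster's hbase-site.xml is on the classpath:

package hbaseworkshop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Hypothetical helper (not in the original examples): repeats the shell's
// disable -> alter -> enable sequence on table 't1' using the Java admin API.
public class AlterTableExample {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(config);
        try {
            admin.disableTable("t1");
            admin.addColumn("t1", new HColumnDescriptor("f3")); // alter 't1', {NAME => 'f3'}
            admin.deleteColumn("t1", "f1");                     // alter 't1', {NAME => 'f1', METHOD => 'delete'}
            admin.enableTable("t1");
        } finally {
            admin.close();
        }
    }
}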
- Running HBaseClient.java:
package hbaseworkshop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Following shows how to connect to HBase from java.
 *
 * @author srinath
 */
public class HBaseClient {

    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        HTable table = new HTable(config, "test");

        // put data
        Put put = new Put("row1".getBytes());
        put.add("cf".getBytes(), "b".getBytes(), "val2".getBytes());
        table.put(put);

        // read data
        Scan s = new Scan();
        s.addFamily(Bytes.toBytes("cf"));
        ResultScanner results = table.getScanner(s);
        try {
            for (Result result : results) {
                KeyValue[] keyValuePairs = result.raw();
                System.out.println(new String(result.getRow()));
                for (KeyValue keyValue : keyValuePairs) {
                    System.out.println(new String(keyValue.getFamily()) + " "
                            + new String(keyValue.getQualifier()) + "="
                            + new String(keyValue.getValue()));
                }
            }
        } finally {
            results.close();
        }
    }
}
Output:
[root@n8 examples]# java -cp cdh4-examples.jar:`hbase classpath` hbaseworkshop.HBaseClient
# Verbose output removed here...
row1
cf a=var1
cf b=val2
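Note that HBaseClient assumes a table named test with a column family cf already exists (the pre-existing cell cf:a=var1 in the output hints at that). If it does not, the table could be created from Java as well; the following is a hypothetical helper of mine, not part of the original examples:

package hbaseworkshop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Hypothetical helper (not in the original examples): creates the 'test'
// table with a single column family 'cf' if it does not exist yet.
public class CreateTestTable {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(config);
        try {
            if (!admin.tableExists("test")) {
                HTableDescriptor descriptor = new HTableDescriptor("test");
                descriptor.addFamily(new HColumnDescriptor("cf"));
                admin.createTable(descriptor);
            }
        } finally {
            admin.close();
        }
    }
}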
- Running HDIDataUploader.java:
Prepare the HBase tables used by this example:
hbase(main):002:0> create 'HDI','ByCountry'
0 row(s) in 1.1190 seconds
=> Hbase::Table - HDI

hbase(main):003:0> list 'HDI'
TABLE
HDI
1 row(s) in 0.0570 seconds

hbase(main):004:0> create 'HDIResult', 'data'
0 row(s) in 1.0370 seconds
=> Hbase::Table - HDIResult

hbase(main):005:0> list 'HDIResult'
TABLE
HDIResult
1 row(s) in 0.0370 seconds
package hbaseworkshop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

/**
 * This class reads the data file from resources/chapter5/hdi-data.csv
 * and uploads the data to HBase running on the local machine.
 *
 * @author srinath
 */
public class HDIDataUploader {

    private static final String TABLE_NAME = "HDI";

    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        HTable table = new HTable(config, TABLE_NAME);

        // The input file.
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                HDIDataUploader.class.getResourceAsStream("/workshop/hdi-data.csv")));
        try {
            String line;
            // skip the header line
            reader.readLine();
            while ((line = reader.readLine()) != null) {
                try {
                    // line = line.replaceAll("\"(.*),(.*)\"", "$1 $2");
                    String[] tokens = CSVLineParser.tokenizeCSV(line).toArray(new String[0]);
                    String country = tokens[1];
                    double lifeExpectacny = Double.parseDouble(tokens[3].replaceAll(",", ""));
                    double meanYearsOfSchooling = Double.parseDouble(tokens[4].replaceAll(",", ""));
                    double gnip = Double.parseDouble(tokens[6].replaceAll(",", ""));

                    Put put = new Put(Bytes.toBytes(country));
                    put.add("ByCountry".getBytes(), Bytes.toBytes("lifeExpectacny"),
                            Bytes.toBytes(lifeExpectacny));
                    put.add("ByCountry".getBytes(), Bytes.toBytes("meanYearsOfSchooling"),
                            Bytes.toBytes(meanYearsOfSchooling));
                    put.add("ByCountry".getBytes(), Bytes.toBytes("gnip"), Bytes.toBytes(gnip));
                    table.put(put);
                } catch (Exception e) {
                    e.printStackTrace();
                    System.out.println("Error processing " + line + " caused by " + e.getMessage());
                }
            }
        } finally {
            try {
                reader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        // Print back the results.
        Scan s = new Scan();
        s.addFamily(Bytes.toBytes("ByCountry"));
        ResultScanner results = table.getScanner(s);
        try {
            for (Result result : results) {
                KeyValue[] keyValuePairs = result.raw();
                System.out.println(new String(result.getRow()));
                for (KeyValue keyValue : keyValuePairs) {
                    System.out.println(new String(keyValue.getFamily()) + " "
                            + new String(keyValue.getQualifier()) + "="
                            + Bytes.toDouble(keyValue.getValue()));
                }
            }
        } finally {
            results.close();
        }
    }
}
Output:
[root@n8 examples]# java -cp cdh4-examples.jar:`hbase classpath` hbaseworkshop.HDIDataUploader
13/07/21 22:36:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
# Verbose HBase output omitted .....
Afghanistan
ByCountry gnip=1416.0
ByCountry lifeExpectacny=48.7
ByCountry meanYearsOfSchooling=3.3
Albania
ByCountry gnip=7803.0
ByCountry lifeExpectacny=76.9
ByCountry meanYearsOfSchooling=10.4
# Other rows omitted...
- Running AverageGINByCountryCalculator.java:
package hbaseworkshop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.mapreduce.Job;

/**
 * Calculate the average of Gross National Income (GNI) per capita by country.
 * Dataset can be found from http://hdr.undp.org/en/statistics/data/.
 */
public class AverageGINByCountryCalculator {

    static class Mapper extends TableMapper<ImmutableBytesWritable, DoubleWritable> {
        private int numRecords = 0;

        @Override
        public void map(ImmutableBytesWritable row, Result values, Context context)
                throws IOException {
            byte[] results = values.getValue("ByCountry".getBytes(), "gnip".getBytes());
            // Every row is emitted under the same key, "gnip", so the single
            // reducer sees all countries together and computes one average.
            ImmutableBytesWritable userKey = new ImmutableBytesWritable("gnip".getBytes());
            try {
                context.write(userKey, new DoubleWritable(Bytes.toDouble(results)));
            } catch (InterruptedException e) {
                throw new IOException(e);
            }
            numRecords++;
            if ((numRecords % 50) == 0) {
                context.setStatus("mapper processed " + numRecords + " records so far");
            }
        }
    }

    public static class Reducer
            extends TableReducer<ImmutableBytesWritable, DoubleWritable, ImmutableBytesWritable> {

        public void reduce(ImmutableBytesWritable key, Iterable<DoubleWritable> values,
                Context context) throws IOException, InterruptedException {
            double sum = 0;
            int count = 0;
            for (DoubleWritable val : values) {
                sum += val.get();
                count++;
            }
            Put put = new Put(key.get());
            put.add(Bytes.toBytes("data"), Bytes.toBytes("average"), Bytes.toBytes(sum / count));
            System.out.println("Processed " + count + " values and average = " + sum / count);
            context.write(key, put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "AverageGINByCountryCalculator");
        job.setJarByClass(AverageGINByCountryCalculator.class);
        Scan scan = new Scan();
        scan.addFamily("ByCountry".getBytes());
        scan.setFilter(new FirstKeyOnlyFilter());
        TableMapReduceUtil.initTableMapperJob("HDI", scan, Mapper.class,
                ImmutableBytesWritable.class, DoubleWritable.class, job);
        TableMapReduceUtil.initTableReducerJob("HDIResult", Reducer.class, job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Output:
[root@n8 examples]# java -cp cdh4-examples.jar:`hbase classpath` hbaseworkshop.AverageGINByCountryCalculator
13/07/21 22:43:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
# A long run of conf.Configuration deprecation warnings (hadoop.native.lib, fs.default.name, dfs.*, topology.*, fs.checkpoint.*, etc.) omitted here...
13/07/21 22:43:19 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-cdh4.3.0--1, built on 05/28/2013 02:01 GMT
# Remaining zookeeper.ZooKeeper "Client environment:..." lines (host name, Java version, library path, OS, user, etc.) omitted...
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=n8.example.com:2181 sessionTimeout=60000 watcher=hconnection
13/07/21 22:43:20 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is [email protected]
13/07/21 22:43:20 INFO zookeeper.ClientCnxn: Opening socket connection to server n8.example.com/192.168.1.208:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
13/07/21 22:43:20 INFO zookeeper.ClientCnxn: Socket connection established to n8.example.com/192.168.1.208:2181, initiating session
13/07/21 22:43:20 INFO zookeeper.ClientCnxn: Session establishment complete on server n8.example.com/192.168.1.208:2181, sessionid = 0x140015565be00e6, negotiated timeout = 60000
13/07/21 22:43:20 INFO mapreduce.TableOutputFormat: Created table instance for HDIResult
13/07/21 22:43:20 ERROR mapreduce.TableInputFormatBase: Cannot resolve the host name for /192.168.1.208 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '208.1.168.192.in-addr.arpa'
13/07/21 22:43:20 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
13/07/21 22:43:20 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
13/07/21 22:43:21 INFO mapred.JobClient: Running job: job_201307212105_0001
13/07/21 22:43:22 INFO mapred.JobClient:  map 0% reduce 0%
13/07/21 22:43:33 INFO mapred.JobClient:  map 100% reduce 0%
13/07/21 22:43:37 INFO mapred.JobClient:  map 100% reduce 100%
13/07/21 22:43:39 INFO mapred.JobClient: Job complete: job_201307212105_0001
13/07/21 22:43:39 INFO mapred.JobClient: Counters: 42
13/07/21 22:43:39 INFO mapred.JobClient:   File System Counters
13/07/21 22:43:39 INFO mapred.JobClient:     FILE: Number of bytes read=1167
13/07/21 22:43:39 INFO mapred.JobClient:     FILE: Number of bytes written=404963
13/07/21 22:43:39 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/21 22:43:39 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/21 22:43:39 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/21 22:43:39 INFO mapred.JobClient:     HDFS: Number of bytes read=68
13/07/21 22:43:39 INFO mapred.JobClient:     HDFS: Number of bytes written=0
13/07/21 22:43:39 INFO mapred.JobClient:     HDFS: Number of read operations=1
13/07/21 22:43:39 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/21 22:43:39 INFO mapred.JobClient:     HDFS: Number of write operations=0
13/07/21 22:43:39 INFO mapred.JobClient:   Job Counters
13/07/21 22:43:39 INFO mapred.JobClient:     Launched map tasks=1
13/07/21 22:43:39 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/21 22:43:39 INFO mapred.JobClient:     Data-local map tasks=1
13/07/21 22:43:39 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=10657
13/07/21 22:43:39 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=3520
13/07/21 22:43:39 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/21 22:43:39 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/21 22:43:39 INFO mapred.JobClient:   Map-Reduce Framework
13/07/21 22:43:39 INFO mapred.JobClient:     Map input records=187
13/07/21 22:43:39 INFO mapred.JobClient:     Map output records=187
13/07/21 22:43:39 INFO mapred.JobClient:     Map output bytes=2992
13/07/21 22:43:39 INFO mapred.JobClient:     Input split bytes=68
13/07/21 22:43:39 INFO mapred.JobClient:     Combine input records=0
13/07/21 22:43:39 INFO mapred.JobClient:     Combine output records=0
13/07/21 22:43:39 INFO mapred.JobClient:     Reduce input groups=1
13/07/21 22:43:39 INFO mapred.JobClient:     Reduce shuffle bytes=1163
13/07/21 22:43:39 INFO mapred.JobClient:     Reduce input records=187
13/07/21 22:43:39 INFO mapred.JobClient:     Reduce output records=1
13/07/21 22:43:39 INFO mapred.JobClient:     Spilled Records=374
13/07/21 22:43:39 INFO mapred.JobClient:     CPU time spent (ms)=2130
13/07/21 22:43:39 INFO mapred.JobClient:     Physical memory (bytes) snapshot=293376000
13/07/21 22:43:39 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1340674048
13/07/21 22:43:39 INFO mapred.JobClient:     Total committed heap usage (bytes)=169148416
13/07/21 22:43:39 INFO mapred.JobClient:   HBase Counters
13/07/21 22:43:39 INFO mapred.JobClient:     BYTES_IN_REMOTE_RESULTS=0
13/07/21 22:43:39 INFO mapred.JobClient:     BYTES_IN_RESULTS=10288
13/07/21 22:43:39 INFO mapred.JobClient:     MILLIS_BETWEEN_NEXTS=251
13/07/21 22:43:39 INFO mapred.JobClient:     NOT_SERVING_REGION_EXCEPTION=0
13/07/21 22:43:39 INFO mapred.JobClient:     NUM_SCANNER_RESTARTS=0
13/07/21 22:43:39 INFO mapred.JobClient:     REGIONS_SCANNED=1
13/07/21 22:43:39 INFO mapred.JobClient:     REMOTE_RPC_CALLS=0
13/07/21 22:43:39 INFO mapred.JobClient:     REMOTE_RPC_RETRIES=0
13/07/21 22:43:39 INFO mapred.JobClient:     RPC_CALLS=190
13/07/21 22:43:39 INFO mapred.JobClient:     RPC_RETRIES=0
Scan the result in HBase:
hbase(main):023:0* scan 'HDIResult'
ROW             COLUMN+CELL
 gnip           column=data:average, timestamp=1374418873267, value=@\xC8\xF7\x1Ba2\xA7\x04
1 row(s) in 0.0260 seconds
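The value shows up as raw bytes because the reducer wrote it with Bytes.toBytes(double). To read it back as a readable number, a small sketch along these lines should work (class name and structure are my own, not part of the original examples):

package hbaseworkshop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical helper (not in the original examples): reads the
// 'data:average' cell of row 'gnip' from HDIResult and decodes the
// 8-byte value back into a double.
public class HDIResultReader {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        HTable table = new HTable(config, "HDIResult");
        try {
            Get get = new Get(Bytes.toBytes("gnip"));
            get.addColumn(Bytes.toBytes("data"), Bytes.toBytes("average"));
            Result result = table.get(get);
            byte[] raw = result.getValue(Bytes.toBytes("data"), Bytes.toBytes("average"));
            System.out.println("Average GNI per capita = " + Bytes.toDouble(raw));
        } finally {
            table.close();
        }
    }
}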
Although the result in HBase is correct, there was an error line in the MapReduce job output above:
13/07/21 22:43:20 ERROR mapreduce.TableInputFormatBase: Cannot resolve the host name for /192.168.1.208 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '208.1.168.192.in-addr.arpa'
The reason is that HBase's TableInputFormat tries to do a reverse DNS lookup on the region server's IP address, as the following HBase source snippet shows:
// In TableInputFormatBase.java
private String reverseDNS(InetAddress ipAddress) throws NamingException {
    String hostName = this.reverseDNSCacheMap.get(ipAddress);
    if (hostName == null) {
        hostName = Strings.domainNamePointerToHostName(
                DNS.reverseDns(ipAddress, this.nameServer));
        this.reverseDNSCacheMap.put(ipAddress, hostName);
    }
    return hostName;
}
I tried to work around it by setting the following on the job Configuration:
conf.set("hbase.nameserver.address", "192.168.1.208");
But the error still pops up, and I have no final solution so far. After some googling, I found one developer reporting a workaround of explicitly specifying the lo interface for the master and region server; see the details via this link. I have not tried it yet, and apparently it is not the ultimate solution anyway.
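For what it is worth, the failing lookup can be reproduced outside of MapReduce with the same Hadoop DNS utility that TableInputFormatBase calls. A quick diagnostic sketch (my own, untested on this cluster):

package hbaseworkshop;

import java.net.InetAddress;

import org.apache.hadoop.net.DNS;

// Hypothetical diagnostic (not in the original examples): performs the same
// reverse DNS lookup that TableInputFormatBase does, so the
// NameNotFoundException can be reproduced without running a full job.
public class ReverseDnsCheck {
    public static void main(String[] args) throws Exception {
        InetAddress ip = InetAddress.getByName("192.168.1.208");
        // The second argument is the name server to query; null means the system
        // default, which matches an unset hbase.nameserver.address.
        System.out.println(DNS.reverseDns(ip, null));
    }
}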