Homework - HBase Shell, Java Client and MapReduce Job

Env:

  • Single node with CentOS 6.2 x86_64, 2 processors, 4 GB memory
  • CDH4.3 with Cloudera Manager 4.5
  • HBase 0.94.6-cdh4.3.0 
  • HBase shell exercise:
[root@n8 ~]# hbase shell
13/07/21 21:11:25 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.6-cdh4.3.0, rUnknown, Mon May 27 20:22:05 PDT 2013

hbase(main):001:0> list
TABLE                                                                                          
TestTable                                                                                      
mytable                                                                                        
twits                                                                                          
users                                                                                          
4 row(s) in 0.8200 seconds

hbase(main):002:0> create 't1', 'f1', 'f2', 'fn'
0 row(s) in 1.1490 seconds

=> Hbase::Table - t1
hbase(main):003:0> describe 't1'
DESCRIPTION                                                   ENABLED                          
 {NAME => 't1', FAMILIES => [{NAME => 'f1', DATA_BLOCK_ENCODI true                             
 NG => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0                                  
 ', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '                                  
 0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOC                                  
 KSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 't                                  
 rue', BLOCKCACHE => 'true'}, {NAME => 'f2', DATA_BLOCK_ENCOD                                  
 ING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '                                  
 0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =>                                   
 '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLO                                  
 CKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => '                                  
 true', BLOCKCACHE => 'true'}, {NAME => 'fn', DATA_BLOCK_ENCO                                  
 DING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE =>                                   
 '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =>                                  
  '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BL                                  
 OCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>                                   
 'true', BLOCKCACHE => 'true'}]}                                                               
1 row(s) in 0.0520 seconds

hbase(main):004:0> put 't1', 'r1', 'f1', 'v1'
0 row(s) in 0.0390 seconds

hbase(main):005:0> put 't1', 'r1', 'f1:c1', 'v2'
0 row(s) in 0.0050 seconds

hbase(main):006:0> put 't1', 'r2', 'f2', 'v3'
0 row(s) in 0.0040 seconds

hbase(main):007:0> put 't1', 'r2', 'f2:c2', 'v4'
0 row(s) in 0.0050 seconds

hbase(main):008:0> get 't1', 'r1'
COLUMN                   CELL                                                                  
 f1:                     timestamp=1374412382919, value=v1                                     
 f1:c1                   timestamp=1374412396462, value=v2                                     
2 row(s) in 0.0260 seconds

hbase(main):009:0> get 't1', 'r1', {column=> 'f1:c1'}
NameError: undefined local variable or method `column' for #<Object:0x5b3ac14d>

hbase(main):010:0> get 't1', 'r1', {COLUMN => 'f1:c1'}
COLUMN                   CELL                                                                  
 f1:c1                   timestamp=1374412396462, value=v2                                     
1 row(s) in 0.0120 seconds

hbase(main):011:0> deleteall 't1', 'r1'
0 row(s) in 0.1040 seconds

hbase(main):012:0> scan 't1'
ROW                      COLUMN+CELL                                                           
 r2                      column=f2:, timestamp=1374412422750, value=v3                         
 r2                      column=f2:c2, timestamp=1374412437015, value=v4                       
1 row(s) in 0.0470 seconds

hbase(main):013:0> disable 't1'
0 row(s) in 2.0510 seconds

hbase(main):014:0> alter 't1', {NAME => 'f3'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1410 seconds

hbase(main):015:0> enable 't1'
0 row(s) in 2.0450 seconds

hbase(main):016:0> describe 't1'
DESCRIPTION                                                   ENABLED                          
 {NAME => 't1', FAMILIES => [{NAME => 'f1', DATA_BLOCK_ENCODI true                             
 NG => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0                                  
 ', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '                                  
 0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOC                                  
 KSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 't                                  
 rue', BLOCKCACHE => 'true'}, {NAME => 'f2', DATA_BLOCK_ENCOD                                  
 ING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '                                  
 0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =>                                   
 '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLO                                  
 CKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => '                                  
 true', BLOCKCACHE => 'true'}, {NAME => 'f3', DATA_BLOCK_ENCO                                  
 DING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE =>                                   
 '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483                                  
 647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BL                                  
 OCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => '                                  
 false', BLOCKCACHE => 'true'}, {NAME => 'fn', DATA_BLOCK_ENC                                  
 ODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE =>                                  
  '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =                                  
 > '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', B                                  
 LOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>                                  
  'true', BLOCKCACHE => 'true'}]}                                                              
1 row(s) in 0.0510 seconds

hbase(main):017:0> disable 't1'
0 row(s) in 2.0490 seconds

hbase(main):018:0> alter 't1', {NAME => 'f1', METHOD => 'delete'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1600 seconds

hbase(main):019:0> enable 't1'
0 row(s) in 2.0380 seconds

hbase(main):020:0> describe 't1'
DESCRIPTION                                                   ENABLED                          
 {NAME => 't1', FAMILIES => [{NAME => 'f2', DATA_BLOCK_ENCODI true                             
 NG => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0                                  
 ', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '                                  
 0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOC                                  
 KSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 't                                  
 rue', BLOCKCACHE => 'true'}, {NAME => 'f3', DATA_BLOCK_ENCOD                                  
 ING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '                                  
 0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '21474836                                  
 47', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLO                                  
 CKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'f                                  
 alse', BLOCKCACHE => 'true'}, {NAME => 'fn', DATA_BLOCK_ENCO                                  
 DING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE =>                                   
 '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =>                                  
  '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BL                                  
 OCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>                                   
 'true', BLOCKCACHE => 'true'}]}                                                               
1 row(s) in 0.0500 seconds

hbase(main):021:0> status
1 servers, 0 dead, 7.0000 average load

hbase(main):022:0> truncate 't1'
Truncating 't1' table (it may take a while):
 - Disabling table...
 - Dropping table...
 - Creating table...
0 row(s) in 4.1820 seconds

hbase(main):023:0> disable 't1'
0 row(s) in 2.0420 seconds

hbase(main):024:0> drop 't1'
0 row(s) in 1.0690 seconds

 

  • Running HBaseClient.java:
package hbaseworkshop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * The following shows how to connect to HBase from Java.
 *
 * @author srinath
 */
public class HBaseClient {

    public static void main(String[] args) throws Exception {

        Configuration config = HBaseConfiguration.create();
        // Assumes a table 'test' with column family 'cf' already exists
        HTable table = new HTable(config, "test");

        try {
            // put data: row1, cf:b = val2
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("b"), Bytes.toBytes("val2"));
            table.put(put);

            // read data: scan every row of family 'cf'
            Scan s = new Scan();
            s.addFamily(Bytes.toBytes("cf"));
            ResultScanner results = table.getScanner(s);

            try {
                for (Result result : results) {
                    KeyValue[] keyValuePairs = result.raw();
                    System.out.println(Bytes.toString(result.getRow()));
                    for (KeyValue keyValue : keyValuePairs) {
                        System.out.println(Bytes.toString(keyValue.getFamily()) + " "
                                + Bytes.toString(keyValue.getQualifier())
                                + "=" + Bytes.toString(keyValue.getValue()));
                    }
                }
            } finally {
                results.close();
            }
        } finally {
            // release the table's resources
            table.close();
        }
    }

}

 Output:

[root@n8 examples]# java -cp cdh4-examples.jar:`hbase classpath` hbaseworkshop.HBaseClient
#Verbose output removed here...
row1
cf a=var1
cf b=val2

 

 

  • Running HDIDataUploader.java:

  Prepare the HBase tables used by this example and by the MapReduce job below: HDI holds the uploaded data, and HDIResult receives the job's output:

hbase(main):002:0> create 'HDI','ByCountry'
0 row(s) in 1.1190 seconds

=> Hbase::Table - HDI
hbase(main):003:0> list 'HDI'
TABLE                                                                                          
HDI                                                                                            
1 row(s) in 0.0570 seconds

hbase(main):004:0> create 'HDIResult', 'data'
0 row(s) in 1.0370 seconds

=> Hbase::Table - HDIResult
hbase(main):005:0> list 'HDIResult'
TABLE                                                                                          
HDIResult                                                                                      
1 row(s) in 0.0370 seconds

 

 

 

package hbaseworkshop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

/**
 * This class reads the data file /workshop/hdi-data.csv from the classpath
 * and uploads the data to HBase running on the local machine.
 *
 * @author srinath
 */
public class HDIDataUploader {

    private static final String TABLE_NAME = "HDI";

    public static void main(String[] args) throws Exception {

        Configuration config = HBaseConfiguration.create();
        HTable table = new HTable(config, TABLE_NAME);
        byte[] family = Bytes.toBytes("ByCountry");

        // The input file, loaded from the classpath.
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                HDIDataUploader.class.getResourceAsStream("/workshop/hdi-data.csv")
        ));

        try {
            String line;
            // skip the header line
            reader.readLine();
            while ((line = reader.readLine()) != null) {
                try {
                    // CSVLineParser lives in the hbaseworkshop package (its source is not shown here)
                    String[] tokens = CSVLineParser.tokenizeCSV(line).toArray(new String[0]);
                    String country = tokens[1];
                    // strip thousands separators before parsing
                    double lifeExpectancy = Double.parseDouble(tokens[3].replaceAll(",", ""));
                    double meanYearsOfSchooling = Double.parseDouble(tokens[4].replaceAll(",", ""));
                    double gnip = Double.parseDouble(tokens[6].replaceAll(",", ""));

                    // Row key = country; every value is stored as an 8-byte double.
                    // The qualifier is spelled "lifeExpectacny", matching the scan output below.
                    Put put = new Put(Bytes.toBytes(country));
                    put.add(family, Bytes.toBytes("lifeExpectacny"), Bytes.toBytes(lifeExpectancy));
                    put.add(family, Bytes.toBytes("meanYearsOfSchooling"), Bytes.toBytes(meanYearsOfSchooling));
                    put.add(family, Bytes.toBytes("gnip"), Bytes.toBytes(gnip));
                    table.put(put);
                } catch (Exception e) {
                    // report the bad line and continue with the next one
                    e.printStackTrace();
                    System.out.println("Error processing " + line + " caused by " + e.getMessage());
                }
            }
        } finally {
            // always close the input, not only when reading fails
            reader.close();
        }

        // Print back the uploaded rows
        Scan s = new Scan();
        s.addFamily(family);
        ResultScanner results = table.getScanner(s);

        try {
            for (Result result : results) {
                KeyValue[] keyValuePairs = result.raw();
                System.out.println(Bytes.toString(result.getRow()));
                for (KeyValue keyValue : keyValuePairs) {
                    System.out.println(Bytes.toString(keyValue.getFamily()) + " "
                            + Bytes.toString(keyValue.getQualifier())
                            + "=" + Bytes.toDouble(keyValue.getValue()));
                }
            }
        } finally {
            results.close();
            table.close();
        }
    }

}

 

 Output:

[root@n8 examples]# java -cp cdh4-examples.jar:`hbase classpath` hbaseworkshop.HDIDataUploader
13/07/21 22:36:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
# Verbose HBase output omitted .....
Afghanistan
ByCountry gnip=1416.0
ByCountry lifeExpectacny=48.7
ByCountry meanYearsOfSchooling=3.3
Albania
ByCountry gnip=7803.0
ByCountry lifeExpectacny=76.9
ByCountry meanYearsOfSchooling=10.4
# Other rows omitted...
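HDIDataUploader relies on CSVLineParser.tokenizeCSV, whose source is not shown in this post. Because the code strips commas from each number before parsing, fields such as GNI per capita apparently contain thousands separators, which in CSV means quoted values like "1,416". Below is a hypothetical stand-in (SimpleCsv is an assumed name, not the workshop helper) that splits only on commas outside double quotes. It is written for Java 7 or later and does not handle escaped quotes ("").

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for CSVLineParser: quote-aware comma splitting. */
final class SimpleCsv {
    /** Splits one CSV line on commas that are outside double quotes; the quotes are dropped. */
    static List<String> tokenizeCSV(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;            // toggle on every quote; "" escapes are not handled
            } else if (c == ',' && !inQuotes) {
                tokens.add(current.toString());  // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString());          // last field
        return tokens;
    }

    public static void main(String[] args) {
        // Made-up line using the indices HDIDataUploader reads (1 = country, 3, 4, 6 = numbers)
        String line = "1,Afghanistan,x,48.7,3.3,y,\"1,416\"";
        String[] tokens = tokenizeCSV(line).toArray(new String[0]);
        double gnip = Double.parseDouble(tokens[6].replaceAll(",", ""));
        System.out.println(tokens[1] + " " + gnip); // Afghanistan 1416.0
    }
}
```

The sample line is invented to match the column positions the uploader uses; it is not a row from hdi-data.csv.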

 

  • Running AverageGINByCountryCalculator.java:
package hbaseworkshop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.mapreduce.Job;

/**
 * Calculate the average of Gross National Income (GNI) per capita by country.
 * The dataset can be found at http://hdr.undp.org/en/statistics/data/.
 */

public class AverageGINByCountryCalculator {

    static class Mapper extends TableMapper<ImmutableBytesWritable, DoubleWritable> {

        private int numRecords = 0;

        @Override
        public void map(ImmutableBytesWritable row, Result values, Context context)
                throws IOException {

            byte[] gnip = values.getValue(Bytes.toBytes("ByCountry"), Bytes.toBytes("gnip"));
            if (gnip == null) {
                return; // skip rows without a gnip value
            }

            // Emit every value under the same constant key, so one reduce call averages all rows
            ImmutableBytesWritable key = new ImmutableBytesWritable(Bytes.toBytes("gnip"));
            try {
                context.write(key, new DoubleWritable(Bytes.toDouble(gnip)));
            } catch (InterruptedException e) {
                throw new IOException(e);
            }
            numRecords++;
            if ((numRecords % 50) == 0) {
                context.setStatus("mapper processed " + numRecords + " records so far");
            }
        }
    }

    public static class Reducer extends TableReducer<ImmutableBytesWritable,
            DoubleWritable, ImmutableBytesWritable> {

        @Override
        public void reduce(ImmutableBytesWritable key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double sum = 0;
            int count = 0;
            for (DoubleWritable val : values) {
                sum += val.get();
                count++;
            }

            Put put = new Put(key.get());
            put.add(Bytes.toBytes("data"), Bytes.toBytes("average"), Bytes.toBytes(sum / count));
            System.out.println("Processed " + count + " values and average = " + sum / count);
            context.write(key, put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "AverageGINByCountryCalculator");
        job.setJarByClass(AverageGINByCountryCalculator.class);
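        Scan scan = new Scan();
        scan.addFamily("ByCountry".getBytes());
        // FirstKeyOnlyFilter returns only the first cell of each row. Cells are sorted by
        // qualifier, so that cell is ByCountry:gnip only because "gnip" sorts first here.
        scan.setFilter(new FirstKeyOnlyFilter());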
        TableMapReduceUtil.initTableMapperJob("HDI", scan, Mapper.class, ImmutableBytesWritable.class,
                DoubleWritable.class, job);
        TableMapReduceUtil.initTableReducerJob("HDIResult", Reducer.class, job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}
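main() scans the whole ByCountry family with a FirstKeyOnlyFilter, which returns only the first cell of each row. HBase orders the cells of a family by comparing qualifier bytes as unsigned values. The first cell is therefore gnip only because "gnip" sorts before "lifeExpectacny" and "meanYearsOfSchooling", which is the same order HDIDataUploader printed. Below is a stdlib-only sketch of that comparison for Java 8 or later. QualifierOrder is a hypothetical helper, not HBase's Bytes class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

/** Sketch of unsigned, byte-by-byte lexicographic ordering of qualifiers. */
final class QualifierOrder {
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff); // compare as unsigned bytes
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;                   // a shorter prefix sorts first
    }

    /** The qualifier a FirstKeyOnlyFilter would see first in a row holding all of them. */
    static String firstQualifier(String... qualifiers) {
        Comparator<String> byBytes = (x, y) -> compareUnsigned(
                x.getBytes(StandardCharsets.UTF_8), y.getBytes(StandardCharsets.UTF_8));
        return Arrays.stream(qualifiers).min(byBytes)
                .orElseThrow(() -> new IllegalArgumentException("no qualifiers"));
    }

    public static void main(String[] args) {
        System.out.println(firstQualifier("meanYearsOfSchooling", "lifeExpectacny", "gnip")); // gnip
    }
}
```

If a row ever had a qualifier that sorts before gnip, the filter would hand the mapper that cell instead, and getValue for gnip would return null.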

 Output:

[root@n8 examples]# java -cp cdh4-examples.jar:`hbase classpath` hbaseworkshop.AverageGINByCountryCalculator
13/07/21 22:43:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/21 22:43:17 WARN conf.Configuration: dfs.df.interval is deprecated. Instead, use fs.df.interval
13/07/21 22:43:17 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
13/07/21 22:43:17 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
13/07/21 22:43:17 WARN conf.Configuration: topology.script.number.args is deprecated. Instead, use net.topology.script.number.args
13/07/21 22:43:17 WARN conf.Configuration: dfs.umaskmode is deprecated. Instead, use fs.permissions.umask-mode
13/07/21 22:43:17 WARN conf.Configuration: topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl
13/07/21 22:43:18 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
13/07/21 22:43:18 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
13/07/21 22:43:18 WARN conf.Configuration: dfs.max.objects is deprecated. Instead, use dfs.namenode.max.objects
13/07/21 22:43:18 WARN conf.Configuration: dfs.data.dir is deprecated. Instead, use dfs.datanode.data.dir
13/07/21 22:43:18 WARN conf.Configuration: dfs.name.dir is deprecated. Instead, use dfs.namenode.name.dir
13/07/21 22:43:18 WARN conf.Configuration: fs.checkpoint.dir is deprecated. Instead, use dfs.namenode.checkpoint.dir
13/07/21 22:43:18 WARN conf.Configuration: dfs.block.size is deprecated. Instead, use dfs.blocksize
13/07/21 22:43:18 WARN conf.Configuration: dfs.access.time.precision is deprecated. Instead, use dfs.namenode.accesstime.precision
13/07/21 22:43:18 WARN conf.Configuration: dfs.replication.min is deprecated. Instead, use dfs.namenode.replication.min
13/07/21 22:43:18 WARN conf.Configuration: dfs.name.edits.dir is deprecated. Instead, use dfs.namenode.edits.dir
13/07/21 22:43:18 WARN conf.Configuration: dfs.replication.considerLoad is deprecated. Instead, use dfs.namenode.replication.considerLoad
13/07/21 22:43:18 WARN conf.Configuration: dfs.balance.bandwidthPerSec is deprecated. Instead, use dfs.datanode.balance.bandwidthPerSec
13/07/21 22:43:18 WARN conf.Configuration: dfs.safemode.threshold.pct is deprecated. Instead, use dfs.namenode.safemode.threshold-pct
13/07/21 22:43:18 WARN conf.Configuration: dfs.http.address is deprecated. Instead, use dfs.namenode.http-address
13/07/21 22:43:18 WARN conf.Configuration: dfs.name.dir.restore is deprecated. Instead, use dfs.namenode.name.dir.restore
13/07/21 22:43:18 WARN conf.Configuration: dfs.https.client.keystore.resource is deprecated. Instead, use dfs.client.https.keystore.resource
13/07/21 22:43:18 WARN conf.Configuration: dfs.backup.address is deprecated. Instead, use dfs.namenode.backup.address
13/07/21 22:43:18 WARN conf.Configuration: dfs.backup.http.address is deprecated. Instead, use dfs.namenode.backup.http-address
13/07/21 22:43:18 WARN conf.Configuration: dfs.permissions is deprecated. Instead, use dfs.permissions.enabled
13/07/21 22:43:18 WARN conf.Configuration: dfs.safemode.extension is deprecated. Instead, use dfs.namenode.safemode.extension
13/07/21 22:43:18 WARN conf.Configuration: dfs.datanode.max.xcievers is deprecated. Instead, use dfs.datanode.max.transfer.threads
13/07/21 22:43:18 WARN conf.Configuration: dfs.https.need.client.auth is deprecated. Instead, use dfs.client.https.need-auth
13/07/21 22:43:18 WARN conf.Configuration: dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
13/07/21 22:43:18 WARN conf.Configuration: dfs.replication.interval is deprecated. Instead, use dfs.namenode.replication.interval
13/07/21 22:43:18 WARN conf.Configuration: fs.checkpoint.edits.dir is deprecated. Instead, use dfs.namenode.checkpoint.edits.dir
13/07/21 22:43:18 WARN conf.Configuration: dfs.write.packet.size is deprecated. Instead, use dfs.client-write-packet-size
13/07/21 22:43:18 WARN conf.Configuration: dfs.permissions.supergroup is deprecated. Instead, use dfs.permissions.superusergroup
13/07/21 22:43:18 WARN conf.Configuration: dfs.secondary.http.address is deprecated. Instead, use dfs.namenode.secondary.http-address
13/07/21 22:43:18 WARN conf.Configuration: fs.checkpoint.period is deprecated. Instead, use dfs.namenode.checkpoint.period
13/07/21 22:43:19 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/21 22:43:20 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
13/07/21 22:43:20 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-cdh4.3.0--1, built on 05/28/2013 02:01 GMT
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:host.name=n8.example.com
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_31
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_31/jre
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client 
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_31/jre/lib/amd64/server:/usr/java/jdk1.6.0_31/jre/lib/amd64:/usr/java/jdk1.6.0_31/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-358.14.1.el6.x86_64
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:user.name=root
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root/examples
13/07/21 22:43:20 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=n8.example.com:2181 sessionTimeout=60000 watcher=hconnection
13/07/21 22:43:20 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is [email protected]
13/07/21 22:43:20 INFO zookeeper.ClientCnxn: Opening socket connection to server n8.example.com/192.168.1.208:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
13/07/21 22:43:20 INFO zookeeper.ClientCnxn: Socket connection established to n8.example.com/192.168.1.208:2181, initiating session
13/07/21 22:43:20 INFO zookeeper.ClientCnxn: Session establishment complete on server n8.example.com/192.168.1.208:2181, sessionid = 0x140015565be00e6, negotiated timeout = 60000
13/07/21 22:43:20 INFO mapreduce.TableOutputFormat: Created table instance for HDIResult
13/07/21 22:43:20 ERROR mapreduce.TableInputFormatBase: Cannot resolve the host name for /192.168.1.208 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '208.1.168.192.in-addr.arpa'
13/07/21 22:43:20 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
13/07/21 22:43:20 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
13/07/21 22:43:21 INFO mapred.JobClient: Running job: job_201307212105_0001
13/07/21 22:43:22 INFO mapred.JobClient:  map 0% reduce 0%
13/07/21 22:43:33 INFO mapred.JobClient:  map 100% reduce 0%
13/07/21 22:43:37 INFO mapred.JobClient:  map 100% reduce 100%
13/07/21 22:43:39 INFO mapred.JobClient: Job complete: job_201307212105_0001
13/07/21 22:43:39 INFO mapred.JobClient: Counters: 42
13/07/21 22:43:39 INFO mapred.JobClient:   File System Counters
13/07/21 22:43:39 INFO mapred.JobClient:     FILE: Number of bytes read=1167
13/07/21 22:43:39 INFO mapred.JobClient:     FILE: Number of bytes written=404963
13/07/21 22:43:39 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/21 22:43:39 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/21 22:43:39 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/21 22:43:39 INFO mapred.JobClient:     HDFS: Number of bytes read=68
13/07/21 22:43:39 INFO mapred.JobClient:     HDFS: Number of bytes written=0
13/07/21 22:43:39 INFO mapred.JobClient:     HDFS: Number of read operations=1
13/07/21 22:43:39 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/21 22:43:39 INFO mapred.JobClient:     HDFS: Number of write operations=0
13/07/21 22:43:39 INFO mapred.JobClient:   Job Counters 
13/07/21 22:43:39 INFO mapred.JobClient:     Launched map tasks=1
13/07/21 22:43:39 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/21 22:43:39 INFO mapred.JobClient:     Data-local map tasks=1
13/07/21 22:43:39 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=10657
13/07/21 22:43:39 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=3520
13/07/21 22:43:39 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/21 22:43:39 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/21 22:43:39 INFO mapred.JobClient:   Map-Reduce Framework
13/07/21 22:43:39 INFO mapred.JobClient:     Map input records=187
13/07/21 22:43:39 INFO mapred.JobClient:     Map output records=187
13/07/21 22:43:39 INFO mapred.JobClient:     Map output bytes=2992
13/07/21 22:43:39 INFO mapred.JobClient:     Input split bytes=68
13/07/21 22:43:39 INFO mapred.JobClient:     Combine input records=0
13/07/21 22:43:39 INFO mapred.JobClient:     Combine output records=0
13/07/21 22:43:39 INFO mapred.JobClient:     Reduce input groups=1
13/07/21 22:43:39 INFO mapred.JobClient:     Reduce shuffle bytes=1163
13/07/21 22:43:39 INFO mapred.JobClient:     Reduce input records=187
13/07/21 22:43:39 INFO mapred.JobClient:     Reduce output records=1
13/07/21 22:43:39 INFO mapred.JobClient:     Spilled Records=374
13/07/21 22:43:39 INFO mapred.JobClient:     CPU time spent (ms)=2130
13/07/21 22:43:39 INFO mapred.JobClient:     Physical memory (bytes) snapshot=293376000
13/07/21 22:43:39 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1340674048
13/07/21 22:43:39 INFO mapred.JobClient:     Total committed heap usage (bytes)=169148416
13/07/21 22:43:39 INFO mapred.JobClient:   HBase Counters
13/07/21 22:43:39 INFO mapred.JobClient:     BYTES_IN_REMOTE_RESULTS=0
13/07/21 22:43:39 INFO mapred.JobClient:     BYTES_IN_RESULTS=10288
13/07/21 22:43:39 INFO mapred.JobClient:     MILLIS_BETWEEN_NEXTS=251
13/07/21 22:43:39 INFO mapred.JobClient:     NOT_SERVING_REGION_EXCEPTION=0
13/07/21 22:43:39 INFO mapred.JobClient:     NUM_SCANNER_RESTARTS=0
13/07/21 22:43:39 INFO mapred.JobClient:     REGIONS_SCANNED=1
13/07/21 22:43:39 INFO mapred.JobClient:     REMOTE_RPC_CALLS=0
13/07/21 22:43:39 INFO mapred.JobClient:     REMOTE_RPC_RETRIES=0
13/07/21 22:43:39 INFO mapred.JobClient:     RPC_CALLS=190
13/07/21 22:43:39 INFO mapred.JobClient:     RPC_RETRIES=0
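The counters show the shape of the job. There are 187 map input and output records but only one reduce input group and one reduce output record, because the mapper writes every value under the constant key "gnip". Below is a stdlib-only sketch of that map, group-by-key and reduce flow for Java 8 or later. The names are hypothetical and no Hadoop classes are used. The input is the two GNI values HDIDataUploader printed for Afghanistan and Albania.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Stdlib-only sketch of the AverageGINByCountryCalculator dataflow. */
final class AverageFlowSketch {
    /** Map + shuffle: one ("gnip", value) pair per row, grouped by key; missing values are skipped. */
    static Map<String, List<Double>> mapAndShuffle(Map<String, Double> gnipByCountry) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Double gnip : gnipByCountry.values()) {
            if (gnip == null) {
                continue;
            }
            // Every value goes under the same key, so the shuffle produces a single group.
            grouped.computeIfAbsent("gnip", k -> new ArrayList<>()).add(gnip);
        }
        return grouped;
    }

    /** Reduce: the same sum / count average the Reducer writes to data:average. */
    static double reduce(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        Map<String, Double> rows = new TreeMap<>();
        rows.put("Afghanistan", 1416.0);
        rows.put("Albania", 7803.0);
        for (Map.Entry<String, List<Double>> group : mapAndShuffle(rows).entrySet()) {
            System.out.println(group.getKey() + " -> " + reduce(group.getValue())); // gnip -> 4609.5
        }
    }
}
```

In the real job the single group holds all 187 values, and the reducer stores the result under row gnip, column data:average.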

 Scan the result in HBase:

hbase(main):023:0* scan 'HDIResult'
ROW                                              COLUMN+CELL                                                                                                                                  
 gnip                                            column=data:average, timestamp=1374418873267, value=@\xC8\xF7\x1Ba2\xA7\x04                                                                  
1 row(s) in 0.0260 seconds
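The shell prints data:average as escaped raw bytes because the reducer stored it with Bytes.toBytes(double). That call writes the 8 bytes of the IEEE 754 value most significant byte first. Below is a stdlib-only sketch that decodes the bytes shown above with java.nio.ByteBuffer, whose default byte order is also big-endian. The printable characters in the shell output (@, a, 2) stand for the bytes 0x40, 0x61 and 0x32.

```java
import java.nio.ByteBuffer;

/** Decodes the 8-byte cell value printed by `scan 'HDIResult'`. */
final class DecodeAverage {
    // @\xC8\xF7\x1Ba2\xA7\x04 from the scan, with '@', 'a', '2' written as 0x40, 0x61, 0x32
    static final byte[] CELL = {
        0x40, (byte) 0xC8, (byte) 0xF7, 0x1B, 0x61, 0x32, (byte) 0xA7, 0x04
    };

    /** Reads a big-endian IEEE 754 double, the layout Bytes.toBytes(double) produces. */
    static double decode(byte[] value) {
        return ByteBuffer.wrap(value).getDouble();
    }

    public static void main(String[] args) {
        System.out.println(decode(CELL)); // about 12782.21
    }
}
```

Inside an HBase client, Bytes.toDouble on the same cell value should give the same number, an average GNI per capita of roughly 12782.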

 

Although the result in HBase is correct, the MapReduce job output above contains an error line:

13/07/21 22:43:20 ERROR mapreduce.TableInputFormatBase: Cannot resolve the host name for /192.168.1.208 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '208.1.168.192.in-addr.arpa'

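The cause is that HBase's TableInputFormatBase does a reverse DNS lookup of each region server's IP address while computing the input splits, as the following snippet from the HBase source shows. The lookup goes through Hadoop's DNS.reverseDns, which queries DNS servers directly through JNDI and does not consult /etc/hosts. An address that is mapped only in /etc/hosts, as is common on a single-node test box, therefore cannot be resolved this way. The exception is caught and logged, which is why the job still completes.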

// In TableInputFormatBase.java
private String reverseDNS(InetAddress ipAddress) throws NamingException {
  String hostName = this.reverseDNSCacheMap.get(ipAddress);
  if (hostName == null) {
    hostName = Strings.domainNamePointerToHostName(DNS.reverseDns(ipAddress, this.nameServer));
    this.reverseDNSCacheMap.put(ipAddress, hostName);
  }
  return hostName;
}
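DNS.reverseDns builds the PTR query name by reversing the four octets of the IPv4 address and appending in-addr.arpa. That is exactly the "remaining name" in the error message. reverseDNS() above then caches each answer per address. Below is a stdlib-only sketch of both steps for Java 8 or later. The resolver function is a hypothetical stand-in for the real DNS query, which Hadoop performs through JNDI.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch of the PTR name construction and per-address cache (not Hadoop or HBase code). */
final class ReverseDnsSketch {
    /** 192.168.1.208 -> 208.1.168.192.in-addr.arpa, the name the DNS server is asked about. */
    static String ptrName(String ipv4) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("not a dotted IPv4 address: " + ipv4);
        }
        return octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0] + ".in-addr.arpa";
    }

    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> resolver; // hypothetical stand-in for the DNS query

    ReverseDnsSketch(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    /** Same shape as reverseDNS(): check the cache, resolve and remember on a miss. */
    String reverseDns(String ipv4) {
        String hostName = cache.get(ipv4);
        if (hostName == null) {
            hostName = resolver.apply(ptrName(ipv4));
            cache.put(ipv4, hostName);
        }
        return hostName;
    }

    public static void main(String[] args) {
        System.out.println(ptrName("192.168.1.208")); // 208.1.168.192.in-addr.arpa
    }
}
```

In the HBase snippet, a failed lookup throws before anything is put into the cache. The lookup for that address is therefore retried, and can fail again, on every call.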

 

I tried to work around it by setting the name server (the nameServer field in the snippet above) in the job Configuration:

conf.set("hbase.nameserver.address", "192.168.1.208");

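But the error still pops up, and I have no final solution yet. Two details may matter here. First, new Job(conf, ...) copies the Configuration, so the setting only reaches the job if it is made before the Job is created. Second, 192.168.1.208 is the node itself, so the lookup can only succeed if a DNS server with a PTR record for that address runs there. After some googling, I found that one developer reported a workaround: explicitly specifying the lo interface for the master and the region server (see the details via this link). I have not tried it yet, and apparently it is not the ultimate solution.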
