Using Kerberos-authenticated HBase from Spark

With Hadoop's MapReduce framework, writing to a Kerberos-secured HDFS needs no extra work: YARN performs the authentication internally, so reads and writes can be issued directly.

Using Kerberos-authenticated HBase from Spark is a problem that is at once troublesome and simple. It is troublesome because there are few Chinese-language articles on the subject and what exists is scattered across isolated knowledge points, while the official documentation is also incomplete, so in practice you are likely to stumble into pitfalls on your own. It is simple because the amount of code involved is small and not hard to understand. In this article, Kerberos authentication is done with a keytab.

1. Writing to Kerberos-authenticated HBase

After the Maven build, the configuration files are packed into the application jar, which the driver distributes into each executor's cache. The .properties configuration files can still be read from inside the jar, because they are loaded through the classloader (reflection). The keytab, however, is a standalone file that is not loaded as a properties resource: the program can only read it from a directory on the local filesystem, so a traditional Kerberos login that points at a path inside the jar fails with a file-not-found error.

There are two ways around this: 1. On the SparkContext, call sc.addFile("...") to ship the keytab to every executor, and at the HBase I/O code call SparkFiles.get() to obtain its local path. 2. Pass --files xx.keytab to spark-submit to ship the file to every executor, then use the same reflection trick that locates the jar to build the keytab path and read the keytab file. Both approaches are sketched below.
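A minimal sketch of both approaches (a sketch only, not the project's actual code: the principal user@EXAMPLE.COM and the path /path/to/xx.keytab are placeholders; for the second approach the job is assumed to be submitted with something like spark-submit --files /path/to/xx.keytab ...):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.{SparkContext, SparkFiles}

object KeytabLogin {

  // Approach 1: ship the keytab with sc.addFile (driver side) and resolve its
  // local path with SparkFiles.get inside the executor-side HBase I/O code.
  def loginViaAddFile(sc: SparkContext): Unit = {
    sc.addFile("/path/to/xx.keytab")              // driver: distribute the keytab
    val keytabPath = SparkFiles.get("xx.keytab")  // executor: local path of the shipped file
    doLogin(keytabPath)
  }

  // Approach 2: submit with "spark-submit --files /path/to/xx.keytab ..." and build
  // the keytab path from the classpath root, the same way KerberorsJavaUtil does below.
  def loginViaSubmitFiles(): Unit = {
    var dir = getClass.getResource("/").getPath   // e.g. the YARN container directory
    if (dir.startsWith("file")) dir = dir.substring(5)
    doLogin(dir + "./xx.keytab")
  }

  private def doLogin(keytabPath: String): Unit = {
    val hconf = HBaseConfiguration.create()
    hconf.set("hadoop.security.authentication", "kerberos")
    // System.setProperty("java.security.krb5.conf", "/etc/krb5.conf") may also be
    // needed, as in KerberorsJavaUtil below
    UserGroupInformation.setConfiguration(hconf)
    // "user@EXAMPLE.COM" is a placeholder principal
    UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", keytabPath)
  }
}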

Finally, hand the authenticated user to the connection, and use that connection to write to HBase, as in the condensed sketch below.
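Condensed, the write path looks roughly like this (a spark-shell style sketch; the full version with batching and throttling is the HBaseIO class in section 3, where this body runs inside foreachPartition on each executor; props is the loaded Properties object, and the row key, column family and column are placeholders):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{HConnectionManager, Put}
import org.apache.hadoop.hbase.util.Bytes
import cn.ctyun.UIDSS.utils.KerberorsJavaUtil

val hconf = HBaseConfiguration.create()
hconf.set("hbase.zookeeper.quorum", props.getProperty("hbaseZkIp"))
hconf.set("hbase.zookeeper.property.clientPort", props.getProperty("hbaseZkPort"))

// authenticate with the keytab, then hand the logged-in user to the connection
val user = KerberorsJavaUtil.getAuthenticatedUser(hconf, props, props.getProperty("keytabFile"))
val connection = HConnectionManager.createConnection(hconf, user)
val htable = connection.getTable(TableName.valueOf(props.getProperty("hbaseTableName")))

val put = new Put(Bytes.toBytes("rowKey1"))                           // placeholder row key
put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(1))  // placeholder family/column
htable.put(put)
htable.flushCommits()
htable.close()
connection.close()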

2. Reading from Kerberos-authenticated HBase

Obtain the authenticated user in the same way as in the write process, and create the connection used for reading HBase inside the TableInputFormat that is passed to

sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat], classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable], classOf[org.apache.hadoop.hbase.client.Result])

That is, subclass TableInputFormat, override the code that creates its connection so that it carries the authenticated user, and then plug the resulting MyTableInputFormat into this operator, as sketched below.
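A condensed read sketch in spark-shell style, assuming the same props as in section 3 and the MyTableInputFormat class shown there (in the cn.ctyun.UIDSS.hbase package):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import cn.ctyun.UIDSS.hbase.MyTableInputFormat

val hconf = HBaseConfiguration.create()
hconf.set("hbase.zookeeper.quorum", props.getProperty("hbaseZkIp"))
hconf.set("hbase.zookeeper.property.clientPort", props.getProperty("hbaseZkPort"))
hconf.set(MyTableInputFormat.INPUT_TABLE, props.getProperty("hbaseTableName"))

// MyTableInputFormat performs the keytab login in setConf() and hands the
// authenticated user to the connection it creates, so no further Kerberos
// handling is needed here.
val rdd = sc.newAPIHadoopRDD(hconf, classOf[MyTableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])

rdd.map { case (_, result) => Bytes.toString(result.getRow) }.take(10).foreach(println)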

3. The code involved in the authentication process

//This class performs the HBase I/O; it calls the authentication helpers in KerberorsJavaUtil.

import java.util.Properties

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.HConnectionManager
import org.apache.hadoop.hbase.client.HTableInterface
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.SparkContext
import org.apache.spark.SparkFiles
import org.apache.spark.rdd.RDD

import cn.ctyun.UIDSS.hgraph.HGraphUtil
import cn.ctyun.UIDSS.utils.Hash
import cn.ctyun.UIDSS.utils.KerberorsJavaUtil
import cn.ctyun.UIDSS.utils.Logging

object HBaseIO extends Logging {

  def getGraphTableRDD(sc: SparkContext, props: Properties): RDD[(ImmutableBytesWritable, Result)] = {
    val hconf = HBaseConfiguration.create()

    //set zookeeper quorum
    hconf.set("hbase.zookeeper.quorum", props.getProperty("hbaseZkIp"));
    //set zookeeper port
    hconf.set("hbase.zookeeper.property.clientPort", props.getProperty("hbaseZkPort"));    
    hconf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    hconf.set("hbase.zookeeper.property.maxClientCnxns", props.getProperty("hbase_zookeeper_property_maxClientCnxns"));
    hconf.set("hbase.client.retries.number", props.getProperty("hbase_client_retries_number"));    
    hconf.addResource("core-site.xml")
    hconf.addResource("hbase-site.xml")
    hconf.addResource("hdfs-site.xml")

    //set which table to scan
    //===override the TableInputFormat to MyInputFormat added kerberos authentication===
    hconf.set(MyTableInputFormat.INPUT_TABLE, props.getProperty("hbaseTableName"))

    //println(getNowDate() + " ****** Start reading from HBase   ******")
    val rdd = sc.newAPIHadoopRDD(hconf, classOf[MyTableInputFormat], classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable], classOf[org.apache.hadoop.hbase.client.Result]).cache()
    //println(getNowDate() + " ****** Finished reading from HBase   ******")

    //iterate over the results and print them (debug)
    //    rdd.foreach {
    //      case (_, result) =>
    //        val key = Bytes.toString(result.getRow.drop(2))
    //        //println("Row key:" + key)
    //        for (c <- result.rawCells()) {
    //          val dst = Bytes.toString(c.getQualifier)
    //          var value = 0
    //          try {
    //            value = Bytes.toInt(c.getValue)
    //          } catch {
    //            case _: Throwable =>
    //          }
    //          //println("        column is: " + dst + " ;  value is: " + value)
    //        }
    //    }
    rdd
  }

  def saveToGraphTable(sc: SparkContext, props: Properties, rddToSave: RDD[((String, String), String)]): Int = {
    info("------Writing data to Graph table start--------")
    var rddToSavePartition = rddToSave
    
    val partNumHBaseO = props.getProperty("rddPartNumHBaseO").toInt
//    if (partNumHBaseO > 0) {
//      rddToSavePartition = rddToSave.repartition(partNumHBaseO)
//      val cnt= rddToSavePartition.count().toString() 
//      info(" ******  Writing " + cnt + " rows to HBase ******")
//      println(" ******  Writing " + cnt + " rows to HBase ******")
//    } 
    
    //write out in parallel, one task per partition
    info("------foreachPartition write data start--------")
    rddToSavePartition.foreachPartition {
      //all rows within one partition
      case (rows) =>
        //println("        column is: " + this.getClass.getClassLoader().getResource(""))
        val hconf = HBaseConfiguration.create()
        info("---------each partition create HBaseConfiguration-----------")
        //set zookeeper quorum
        hconf.set("hbase.zookeeper.quorum", props.getProperty("hbaseZkIp"))
        //set zookeeper port
        hconf.set("hbase.zookeeper.property.clientPort", props.getProperty("hbaseZkPort"))           
        hconf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        hconf.set("hbase.zookeeper.property.maxClientCnxns", props.getProperty("hbase_zookeeper_property_maxClientCnxns"))
        hconf.set("hbase.client.retries.number", props.getProperty("hbase_client_retries_number"))
        hconf.set("hbase.client.pause", "1000")
        hconf.set("zookeeper.recovery.retry", "3")
        
        hconf.addResource("core-site.xml")
        hconf.addResource("hbase-site.xml")
        hconf.addResource("hdfs-site.xml")   
        //=========get HBase authenticated user==========
        val loginedUser = KerberorsJavaUtil.getAuthenticatedUser(hconf,props,props.getProperty("keytabFile"))
        val connection = HConnectionManager.createConnection(hconf,loginedUser)
        info("------HBase connection is created--------")
        val htable: HTableInterface = connection.getTable(TableName.valueOf(props.getProperty("hbaseTableName")))

        //batched writes
        val flushInBatch = props.getProperty("flushInBatch")
        val sWaitForHBase = props.getProperty("waitForHBase")
        val batchSize = props.getProperty("batchSize")
        
        var waitForHBase = 0
        if (flushInBatch != null && "1".compareToIgnoreCase(flushInBatch) == 0) {
          htable.setAutoFlushTo(false);
          htable.setWriteBufferSize(1024 * 1024 * batchSize.toInt);
          if (sWaitForHBase != null && sWaitForHBase.toInt > 0) {
            waitForHBase = sWaitForHBase.toInt
          }
        }

        //println(getNowDate() + " ****** Start writing to HBase   ******")

        var rowCount = 0 
        
//        for (row <- rows.toArray) (
        info("------HBase write data start--------")
        for (row <- rows) (
          {
            //row: ((rowKey, column), value)
            var src: String =  Hash.getHashString(row._1._1) + row._1._1 
            var dst: String = row._1._2
            var prop: Int = row._2.toInt
            //println("Row is: " + src + " ;column is: " + dst + " ; value is: " + prop)

            val put = new Put(Bytes.toBytes(src))
            put.add(HGraphUtil.COLUMN_FAMILY, Bytes.toBytes(dst), Bytes.toBytes(prop))
            put.setWriteToWAL(false)
            htable.put(put)
            
            rowCount = rowCount +1

            //throttle the write rate
            if ((rowCount % 1000)==0 && waitForHBase >0) { Thread.sleep(waitForHBase)}
          })
        //println(getNowDate() + " ****** Finished writing to HBase   ******")  
        try{
          info("=======prepare to flushCommits======")
          htable.flushCommits()
          info("=======flushCommits finished======")
        }catch {
          case e: Exception =>
          info("=======flushCommits failed=======")
        }
        htable.close();          
        //println(getNowDate() + " ****** Flushed  to HBase   ******")
        info("------HBase write data finished--------")
    }
    1
  }
}
//This is the Kerberos utility class; it performs the keytab-based login and wraps the logged-in user in an HBase User.

package cn.ctyun.UIDSS.utils;

import java.io.IOException;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.security.User;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.log4j.Logger;

public class KerberorsJavaUtil {
	private static final Logger LOG = Logger.getLogger(KerberorsJavaUtil.class);
	
	public static void getHBaseAuthentication(Configuration hconf,Properties props,String keytabFile){
		//build the path of the keytab file that was shipped via spark-submit "--files"
		String keyFilePath = KerberorsJavaUtil.class.getResource("/").getPath();
		LOG.info("=====file path====="+keyFilePath);
	    if(keyFilePath.startsWith("file")){
	    	keyFilePath = keyFilePath.substring(5);
	    }
	    //"loginUserFromKeytab" expects a path of the form "AAA/XXX/./keyFile"
	    keyFilePath = keyFilePath+"./"+keytabFile;
		LOG.info("------Start Get HBaseAuthentication-----");
		System.setProperty("java.security.krb5.conf",props.getProperty("krb5ConfDir"));
		hconf.set("hbase.security.authentication","kerberos");  
		hconf.set("hadoop.security.authentication","Kerberos");
		//HBase master principal (hbase.master.kerberos.principal from the cluster's site configuration)
		hconf.set("hbase.master.kerberos.principal",props.getProperty("masterPrin")); 
		//HBase region server principal (hbase.regionserver.kerberos.principal from the cluster's site configuration)
		hconf.set("hbase.regionserver.kerberos.principal",props.getProperty("regionPrin"));  
		UserGroupInformation.setConfiguration(hconf);  
	    try {
	    	//Kerberos login: pass the principal (user name) and the keytab file path
	    	LOG.info("------dev_yx.keytab path is---"+keyFilePath);
	    	UserGroupInformation.loginUserFromKeytab(props.getProperty("userName"),keyFilePath);
	    	LOG.info("------Get HBaseAuthentication Successed-----");
	    } catch (Exception e) {  
	        LOG.error("Get HBaseAuthentication Failed",e);  
	    }
	   
	}
	
	public static User getAuthenticatedUser(Configuration hconf,Properties props,String keytabFile){
		getHBaseAuthentication(hconf,props,keytabFile);		
		User loginedUser = null;
	    try {
	    	LOG.info("=====put the logined userinfomation to user====");
			loginedUser = User.create(UserGroupInformation.getLoginUser());
		} catch (IOException e) {
			LOG.error("===fialed put the logined userinfomation to user===",e);
		}	    
	    return loginedUser;
	}
	
}
//This class extends TableInputFormatBase and overrides its connection creation using KerberorsJavaUtil, so the connection carries a Kerberos-authenticated user and HBase reads succeed.

package cn.ctyun.UIDSS.hbase;


import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.classification.InterfaceAudience;
import org.apache.hadoop.hbase.classification.InterfaceStability;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;
import org.apache.hadoop.hbase.security.User;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.util.StringUtils;

import cn.ctyun.UIDSS.utils.KerberorsJavaUtil;

/**
 * Convert HBase tabular data into a format that is consumable by Map/Reduce.
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class MyTableInputFormat extends TableInputFormatBase
implements Configurable {

  @SuppressWarnings("hiding")
  private static final Log LOG = LogFactory.getLog(MyTableInputFormat.class);

  /** Job parameter that specifies the input table. */
  public static final String INPUT_TABLE = "hbase.mapreduce.inputtable";
  /**
   * If specified, use start keys of this table to split.
   * This is useful when you are preparing data for bulkload.
   */
  private static final String SPLIT_TABLE = "hbase.mapreduce.splittable";
  /** Base-64 encoded scanner. All other SCAN_ confs are ignored if this is specified.
   * See {@link TableMapReduceUtil#convertScanToString(Scan)} for more details.
   */
  public static final String SCAN = "hbase.mapreduce.scan";
  /** Scan start row */
  public static final String SCAN_ROW_START = "hbase.mapreduce.scan.row.start";
  /** Scan stop row */
  public static final String SCAN_ROW_STOP = "hbase.mapreduce.scan.row.stop";
  /** Column Family to Scan */
  public static final String SCAN_COLUMN_FAMILY = "hbase.mapreduce.scan.column.family";
  /** Space delimited list of columns and column families to scan. */
  public static final String SCAN_COLUMNS = "hbase.mapreduce.scan.columns";
  /** The timestamp used to filter columns with a specific timestamp. */
  public static final String SCAN_TIMESTAMP = "hbase.mapreduce.scan.timestamp";
  /** The starting timestamp used to filter columns with a specific range of versions. */
  public static final String SCAN_TIMERANGE_START = "hbase.mapreduce.scan.timerange.start";
  /** The ending timestamp used to filter columns with a specific range of versions. */
  public static final String SCAN_TIMERANGE_END = "hbase.mapreduce.scan.timerange.end";
  /** The maximum number of version to return. */
  public static final String SCAN_MAXVERSIONS = "hbase.mapreduce.scan.maxversions";
  /** Set to false to disable server-side caching of blocks for this scan. */
  public static final String SCAN_CACHEBLOCKS = "hbase.mapreduce.scan.cacheblocks";
  /** The number of rows for caching that will be passed to scanners. */
  public static final String SCAN_CACHEDROWS = "hbase.mapreduce.scan.cachedrows";
  /** Set the maximum number of values to return for each call to next(). */
  public static final String SCAN_BATCHSIZE = "hbase.mapreduce.scan.batchsize";
  /** Specify if we have to shuffle the map tasks. */
  public static final String SHUFFLE_MAPS = "hbase.mapreduce.inputtable.shufflemaps";

  /** The configuration. */
  private Configuration conf = null;
  
  /** The kerberos authenticated user*/
  private User user;  
  
  /**
   * Returns the current configuration.
   *
   * @return The current configuration.
   * @see Configurable#getConf()
   */
  @Override
  public Configuration getConf() {
    return conf;
  }
  
  /**
   * Sets the configuration. This is used to set the details for the table to
   * be scanned.
   *
   * @param configuration  The configuration to set.
   * @see Configurable#setConf(
   *   Configuration)
   */
  @Override
  @edu.umd.cs.findbugs.annotations.SuppressWarnings(value="REC_CATCH_EXCEPTION",
    justification="Intentional")
  public void setConf(Configuration configuration) {
    this.conf = configuration;
    //=========get kerberos authentication before create hbase connection========== 
    Properties props = new Properties();
    try {
		props.load(this.getClass().getClassLoader().getResourceAsStream("user-id-server.properties"));
	} catch (IOException e1) {
		LOG.error("load properties file failed",e1);
	}
    user = KerberorsJavaUtil.getAuthenticatedUser(conf, props, props.getProperty("keytabFile"));
    //=============================================================================   
    
    Scan scan = null;

    if (conf.get(SCAN) != null) {
      try {
        scan = TableMapReduceUtil.convertStringToScan(conf.get(SCAN));
      } catch (IOException e) {
        LOG.error("An error occurred.", e);
      }
    } else {
      try {
        scan = new Scan();

        if (conf.get(SCAN_ROW_START) != null) {
          scan.setStartRow(Bytes.toBytes(conf.get(SCAN_ROW_START)));
        }

        if (conf.get(SCAN_ROW_STOP) != null) {
          scan.setStopRow(Bytes.toBytes(conf.get(SCAN_ROW_STOP)));
        }

        if (conf.get(SCAN_COLUMNS) != null) {
          addColumns(scan, conf.get(SCAN_COLUMNS));
        }

        if (conf.get(SCAN_COLUMN_FAMILY) != null) {
          scan.addFamily(Bytes.toBytes(conf.get(SCAN_COLUMN_FAMILY)));
        }

        if (conf.get(SCAN_TIMESTAMP) != null) {
          scan.setTimeStamp(Long.parseLong(conf.get(SCAN_TIMESTAMP)));
        }

        if (conf.get(SCAN_TIMERANGE_START) != null && conf.get(SCAN_TIMERANGE_END) != null) {
          scan.setTimeRange(
              Long.parseLong(conf.get(SCAN_TIMERANGE_START)),
              Long.parseLong(conf.get(SCAN_TIMERANGE_END)));
        }

        if (conf.get(SCAN_MAXVERSIONS) != null) {
          scan.setMaxVersions(Integer.parseInt(conf.get(SCAN_MAXVERSIONS)));
        }

        if (conf.get(SCAN_CACHEDROWS) != null) {
          scan.setCaching(Integer.parseInt(conf.get(SCAN_CACHEDROWS)));
        }

        if (conf.get(SCAN_BATCHSIZE) != null) {
          scan.setBatch(Integer.parseInt(conf.get(SCAN_BATCHSIZE)));
        }

        // false by default, full table scans generate too much BC churn
        scan.setCacheBlocks((conf.getBoolean(SCAN_CACHEBLOCKS, false)));
      } catch (Exception e) {
          LOG.error(StringUtils.stringifyException(e));
      }
    }

    setScan(scan);
  }

  @Override
  protected void initialize(JobContext context) throws IOException {
    // Do we have to worry about mis-matches between the Configuration from setConf and the one
    // in this context?
    TableName tableName = TableName.valueOf(conf.get(INPUT_TABLE));
    try {
     //====================add authenticated user ===================
      initializeTable(ConnectionFactory.createConnection(new Configuration(conf),user), tableName);
    } catch (Exception e) {
      LOG.error(StringUtils.stringifyException(e));
    }
  }

  /**
   * Parses a combined family and qualifier and adds either both or just the
   * family in case there is no qualifier. This assumes the older colon
   * divided notation, e.g. "family:qualifier".
   *
   * @param scan The Scan to update.
   * @param familyAndQualifier family and qualifier
   * @throws IllegalArgumentException When familyAndQualifier is invalid.
   */
  private static void addColumn(Scan scan, byte[] familyAndQualifier) {
    byte [][] fq = KeyValue.parseColumn(familyAndQualifier);
    if (fq.length == 1) {
      scan.addFamily(fq[0]);
    } else if (fq.length == 2) {
      scan.addColumn(fq[0], fq[1]);
    } else {
      throw new IllegalArgumentException("Invalid familyAndQualifier provided.");
    }
  }

  /**
   * Adds an array of columns specified using old format, family:qualifier.
   * <p>
   * Overrides previous calls to {@link Scan#addColumn(byte[], byte[])} for any families in the
   * input.
   *
   * @param scan The Scan to update.
   * @param columns array of columns, formatted as family:qualifier
   * @see Scan#addColumn(byte[], byte[])
   */
  public static void addColumns(Scan scan, byte [][] columns) {
    for (byte[] column : columns) {
      addColumn(scan, column);
    }
  }

  /**
   * Calculates the splits that will serve as input for the map tasks. The
   * number of splits matches the number of regions in a table. Splits are shuffled if
   * required.
   * @param context The current job context.
   * @return The list of input splits.
   * @throws IOException When creating the list of splits fails.
   * @see org.apache.hadoop.mapreduce.InputFormat#getSplits(JobContext)
   */
  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<InputSplit> splits = super.getSplits(context);
    if ((conf.get(SHUFFLE_MAPS) != null) && "true".equals(conf.get(SHUFFLE_MAPS).toLowerCase())) {
      Collections.shuffle(splits);
    }
    return splits;
  }

  /**
   * Convenience method to parse a string representation of an array of column specifiers.
   *
   * @param scan The Scan to update.
   * @param columns The columns to parse.
   */
  private static void addColumns(Scan scan, String columns) {
    String[] cols = columns.split(" ");
    for (String col : cols) {
      addColumn(scan, Bytes.toBytes(col));
    }
  }

  @Override
  protected Pair<byte[][], byte[][]> getStartEndKeys() throws IOException {
    if (conf.get(SPLIT_TABLE) != null) {
      TableName splitTableName = TableName.valueOf(conf.get(SPLIT_TABLE));
      //====================add authenticated user ===================
      try (Connection conn = ConnectionFactory.createConnection(getConf(), user)) {
        try (RegionLocator rl = conn.getRegionLocator(splitTableName)) {
          return rl.getStartEndKeys();
        }
      }
    }
    return super.getStartEndKeys();
  }

  /**
   * Sets split table in map-reduce job.
   */
  public static void configureSplitTable(Job job, TableName tableName) {
    job.getConfiguration().set(SPLIT_TABLE, tableName.getNameAsString());
  }
}

/**
 * This class is a verbatim copy of the HBase source. It exists because MyTableInputFormat needs a package-private helper method from it (convertStringToScan), so the class is copied into the same package to make that method accessible.

 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package cn.ctyun.UIDSS.hbase;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLDecoder;
import java.util.*;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

import com.google.protobuf.InvalidProtocolBufferException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.MetaTableAccessor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.classification.InterfaceAudience;
import org.apache.hadoop.hbase.classification.InterfaceStability;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HRegionPartitioner;
import org.apache.hadoop.hbase.mapreduce.JarFinder;
import org.apache.hadoop.hbase.mapreduce.KeyValueSerialization;
import org.apache.hadoop.hbase.mapreduce.MultiTableInputFormat;
import org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormat;
import org.apache.hadoop.hbase.mapreduce.MutationSerialization;
import org.apache.hadoop.hbase.mapreduce.PutCombiner;
import org.apache.hadoop.hbase.mapreduce.ResultSerialization;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat;
import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
import org.apache.hadoop.hbase.protobuf.generated.ClientProtos;
import org.apache.hadoop.hbase.security.User;
import org.apache.hadoop.hbase.security.UserProvider;
import org.apache.hadoop.hbase.security.token.TokenUtil;
import org.apache.hadoop.hbase.util.Base64;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.zookeeper.ZKConfig;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.StringUtils;

/**
 * Utility for {@link TableMapper} and {@link TableReducer}
 */
@SuppressWarnings({ "rawtypes", "unchecked" })
@InterfaceAudience.Public
@InterfaceStability.Stable
public class TableMapReduceUtil {
  private static final Log LOG = LogFactory.getLog(TableMapReduceUtil.class);

  /**
   * Use this before submitting a TableMap job. It will appropriately set up
   * the job.
   *
   * @param table  The table name to read from.
   * @param scan  The scan instance with the columns, time range etc.
   * @param mapper  The mapper class to use.
   * @param outputKeyClass  The class of the output key.
   * @param outputValueClass  The class of the output value.
   * @param job  The current job to adjust.  Make sure the passed job is
   * carrying all necessary HBase configuration.
   * @throws IOException When setting up the details fails.
   */
  public static void initTableMapperJob(String table, Scan scan,
      Class mapper,
      Class outputKeyClass,
      Class outputValueClass, Job job)
  throws IOException {
    initTableMapperJob(table, scan, mapper, outputKeyClass, outputValueClass,
        job, true);
  }


  /**
   * Use this before submitting a TableMap job. It will appropriately set up
   * the job.
   *
   * @param table  The table name to read from.
   * @param scan  The scan instance with the columns, time range etc.
   * @param mapper  The mapper class to use.
   * @param outputKeyClass  The class of the output key.
   * @param outputValueClass  The class of the output value.
   * @param job  The current job to adjust.  Make sure the passed job is
   * carrying all necessary HBase configuration.
   * @throws IOException When setting up the details fails.
   */
  public static void initTableMapperJob(TableName table,
      Scan scan,
      Class mapper,
      Class outputKeyClass,
      Class outputValueClass,
      Job job) throws IOException {
    initTableMapperJob(table.getNameAsString(),
        scan,
        mapper,
        outputKeyClass,
        outputValueClass,
        job,
        true);
  }

  /**
   * Use this before submitting a TableMap job. It will appropriately set up
   * the job.
   *
   * @param table Binary representation of the table name to read from.
   * @param scan  The scan instance with the columns, time range etc.
   * @param mapper  The mapper class to use.
   * @param outputKeyClass  The class of the output key.
   * @param outputValueClass  The class of the output value.
   * @param job  The current job to adjust.  Make sure the passed job is
   * carrying all necessary HBase configuration.
   * @throws IOException When setting up the details fails.
   */
   public static void initTableMapperJob(byte[] table, Scan scan,
      Class mapper,
      Class outputKeyClass,
      Class outputValueClass, Job job)
  throws IOException {
      initTableMapperJob(Bytes.toString(table), scan, mapper, outputKeyClass, outputValueClass,
              job, true);
  }

   /**
    * Use this before submitting a TableMap job. It will appropriately set up
    * the job.
    *
    * @param table  The table name to read from.
    * @param scan  The scan instance with the columns, time range etc.
    * @param mapper  The mapper class to use.
    * @param outputKeyClass  The class of the output key.
    * @param outputValueClass  The class of the output value.
    * @param job  The current job to adjust.  Make sure the passed job is
    * carrying all necessary HBase configuration.
    * @param addDependencyJars upload HBase jars and jars for any of the configured
    *           job classes via the distributed cache (tmpjars).
    * @throws IOException When setting up the details fails.
    */
   public static void initTableMapperJob(String table, Scan scan,
       Class mapper,
       Class outputKeyClass,
       Class outputValueClass, Job job,
       boolean addDependencyJars, Class inputFormatClass)
   throws IOException {
     initTableMapperJob(table, scan, mapper, outputKeyClass, outputValueClass, job,
         addDependencyJars, true, inputFormatClass);
   }


  /**
   * Use this before submitting a TableMap job. It will appropriately set up
   * the job.
   *
   * @param table  The table name to read from.
   * @param scan  The scan instance with the columns, time range etc.
   * @param mapper  The mapper class to use.
   * @param outputKeyClass  The class of the output key.
   * @param outputValueClass  The class of the output value.
   * @param job  The current job to adjust.  Make sure the passed job is
   * carrying all necessary HBase configuration.
   * @param addDependencyJars upload HBase jars and jars for any of the configured
   *           job classes via the distributed cache (tmpjars).
   * @param initCredentials whether to initialize hbase auth credentials for the job
   * @param inputFormatClass the input format
   * @throws IOException When setting up the details fails.
   */
  public static void initTableMapperJob(String table, Scan scan,
      Class mapper,
      Class outputKeyClass,
      Class outputValueClass, Job job,
      boolean addDependencyJars, boolean initCredentials,
      Class inputFormatClass)
  throws IOException {
    job.setInputFormatClass(inputFormatClass);
    if (outputValueClass != null) job.setMapOutputValueClass(outputValueClass);
    if (outputKeyClass != null) job.setMapOutputKeyClass(outputKeyClass);
    job.setMapperClass(mapper);
    if (Put.class.equals(outputValueClass)) {
      job.setCombinerClass(PutCombiner.class);
    }
    Configuration conf = job.getConfiguration();
    HBaseConfiguration.merge(conf, HBaseConfiguration.create(conf));
    conf.set(TableInputFormat.INPUT_TABLE, table);
    conf.set(TableInputFormat.SCAN, convertScanToString(scan));
    conf.setStrings("io.serializations", conf.get("io.serializations"),
        MutationSerialization.class.getName(), ResultSerialization.class.getName(),
        KeyValueSerialization.class.getName());
    if (addDependencyJars) {
      addDependencyJars(job);
    }
    if (initCredentials) {
      initCredentials(job);
    }
  }

  /**
   * Use this before submitting a TableMap job. It will appropriately set up
   * the job.
   *
   * @param table Binary representation of the table name to read from.
   * @param scan  The scan instance with the columns, time range etc.
   * @param mapper  The mapper class to use.
   * @param outputKeyClass  The class of the output key.
   * @param outputValueClass  The class of the output value.
   * @param job  The current job to adjust.  Make sure the passed job is
   * carrying all necessary HBase configuration.
   * @param addDependencyJars upload HBase jars and jars for any of the configured
   *           job classes via the distributed cache (tmpjars).
   * @param inputFormatClass The class of the input format
   * @throws IOException When setting up the details fails.
   */
  public static void initTableMapperJob(byte[] table, Scan scan,
      Class mapper,
      Class outputKeyClass,
      Class outputValueClass, Job job,
      boolean addDependencyJars, Class inputFormatClass)
  throws IOException {
      initTableMapperJob(Bytes.toString(table), scan, mapper, outputKeyClass,
              outputValueClass, job, addDependencyJars, inputFormatClass);
  }

  /**
   * Use this before submitting a TableMap job. It will appropriately set up
   * the job.
   *
   * @param table Binary representation of the table name to read from.
   * @param scan  The scan instance with the columns, time range etc.
   * @param mapper  The mapper class to use.
   * @param outputKeyClass  The class of the output key.
   * @param outputValueClass  The class of the output value.
   * @param job  The current job to adjust.  Make sure the passed job is
   * carrying all necessary HBase configuration.
   * @param addDependencyJars upload HBase jars and jars for any of the configured
   *           job classes via the distributed cache (tmpjars).
   * @throws IOException When setting up the details fails.
   */
  public static void initTableMapperJob(byte[] table, Scan scan,
      Class mapper,
      Class outputKeyClass,
      Class outputValueClass, Job job,
      boolean addDependencyJars)
  throws IOException {
      initTableMapperJob(Bytes.toString(table), scan, mapper, outputKeyClass,
              outputValueClass, job, addDependencyJars, TableInputFormat.class);
  }

  /**
   * Use this before submitting a TableMap job. It will appropriately set up
   * the job.
   *
   * @param table The table name to read from.
   * @param scan  The scan instance with the columns, time range etc.
   * @param mapper  The mapper class to use.
   * @param outputKeyClass  The class of the output key.
   * @param outputValueClass  The class of the output value.
   * @param job  The current job to adjust.  Make sure the passed job is
   * carrying all necessary HBase configuration.
   * @param addDependencyJars upload HBase jars and jars for any of the configured
   *           job classes via the distributed cache (tmpjars).
   * @throws IOException When setting up the details fails.
   */
  public static void initTableMapperJob(String table, Scan scan,
      Class mapper,
      Class outputKeyClass,
      Class outputValueClass, Job job,
      boolean addDependencyJars)
  throws IOException {
      initTableMapperJob(table, scan, mapper, outputKeyClass,
              outputValueClass, job, addDependencyJars, TableInputFormat.class);
  }

  /**
   * Enable a basic on-heap cache for these jobs. Any BlockCache implementation based on
   * direct memory will likely cause the map tasks to OOM when opening the region. This
   * is done here instead of in TableSnapshotRegionRecordReader in case an advanced user
   * wants to override this behavior in their job.
   */
  public static void resetCacheConfig(Configuration conf) {
    conf.setFloat(
      HConstants.HFILE_BLOCK_CACHE_SIZE_KEY, HConstants.HFILE_BLOCK_CACHE_SIZE_DEFAULT);
    conf.setFloat(HConstants.BUCKET_CACHE_SIZE_KEY, 0f);
    conf.unset(HConstants.BUCKET_CACHE_IOENGINE_KEY);
  }

  /**
   * Sets up the job for reading from one or more table snapshots, with one or more scans
   * per snapshot.
   * It bypasses hbase servers and read directly from snapshot files.
   *
   * @param snapshotScans     map of snapshot name to scans on that snapshot.
   * @param mapper            The mapper class to use.
   * @param outputKeyClass    The class of the output key.
   * @param outputValueClass  The class of the output value.
   * @param job               The current job to adjust.  Make sure the passed job is
   *                          carrying all necessary HBase configuration.
   * @param addDependencyJars upload HBase jars and jars for any of the configured
   *                          job classes via the distributed cache (tmpjars).
   */
  public static void initMultiTableSnapshotMapperJob(Map<String, Collection<Scan>> snapshotScans,
      Class mapper, Class outputKeyClass, Class outputValueClass,
      Job job, boolean addDependencyJars, Path tmpRestoreDir) throws IOException {
    MultiTableSnapshotInputFormat.setInput(job.getConfiguration(), snapshotScans, tmpRestoreDir);

    job.setInputFormatClass(MultiTableSnapshotInputFormat.class);
    if (outputValueClass != null) {
      job.setMapOutputValueClass(outputValueClass);
    }
    if (outputKeyClass != null) {
      job.setMapOutputKeyClass(outputKeyClass);
    }
    job.setMapperClass(mapper);
    Configuration conf = job.getConfiguration();
    HBaseConfiguration.merge(conf, HBaseConfiguration.create(conf));

    if (addDependencyJars) {
      addDependencyJars(job);
    }

    resetCacheConfig(job.getConfiguration());
  }

  /**
   * Sets up the job for reading from a table snapshot. It bypasses hbase servers
   * and read directly from snapshot files.
   *
   * @param snapshotName The name of the snapshot (of a table) to read from.
   * @param scan  The scan instance with the columns, time range etc.
   * @param mapper  The mapper class to use.
   * @param outputKeyClass  The class of the output key.
   * @param outputValueClass  The class of the output value.
   * @param job  The current job to adjust.  Make sure the passed job is
   * carrying all necessary HBase configuration.
   * @param addDependencyJars upload HBase jars and jars for any of the configured
   *           job classes via the distributed cache (tmpjars).
   *
   * @param tmpRestoreDir a temporary directory to copy the snapshot files into. Current user should
   * have write permissions to this directory, and this should not be a subdirectory of rootdir.
   * After the job is finished, restore directory can be deleted.
   * @throws IOException When setting up the details fails.
   * @see TableSnapshotInputFormat
   */
  public static void initTableSnapshotMapperJob(String snapshotName, Scan scan,
      Class mapper,
      Class outputKeyClass,
      Class outputValueClass, Job job,
      boolean addDependencyJars, Path tmpRestoreDir)
  throws IOException {
    TableSnapshotInputFormat.setInput(job, snapshotName, tmpRestoreDir);
    initTableMapperJob(snapshotName, scan, mapper, outputKeyClass,
        outputValueClass, job, addDependencyJars, false, TableSnapshotInputFormat.class);
    resetCacheConfig(job.getConfiguration());
  }

  /**
   * Use this before submitting a Multi TableMap job. It will appropriately set
   * up the job.
   *
   * @param scans The list of {@link Scan} objects to read from.
   * @param mapper The mapper class to use.
   * @param outputKeyClass The class of the output key.
   * @param outputValueClass The class of the output value.
   * @param job The current job to adjust. Make sure the passed job is carrying
   *          all necessary HBase configuration.
   * @throws IOException When setting up the details fails.
   */
  public static void initTableMapperJob(List<Scan> scans,
      Class mapper,
      Class outputKeyClass,
      Class outputValueClass, Job job) throws IOException {
    initTableMapperJob(scans, mapper, outputKeyClass, outputValueClass, job,
        true);
  }

  /**
   * Use this before submitting a Multi TableMap job. It will appropriately set
   * up the job.
   *
   * @param scans The list of {@link Scan} objects to read from.
   * @param mapper The mapper class to use.
   * @param outputKeyClass The class of the output key.
   * @param outputValueClass The class of the output value.
   * @param job The current job to adjust. Make sure the passed job is carrying
   *          all necessary HBase configuration.
   * @param addDependencyJars upload HBase jars and jars for any of the
   *          configured job classes via the distributed cache (tmpjars).
   * @throws IOException When setting up the details fails.
   */
  public static void initTableMapperJob(List<Scan> scans,
      Class mapper,
      Class outputKeyClass,
      Class outputValueClass, Job job,
      boolean addDependencyJars) throws IOException {
    initTableMapperJob(scans, mapper, outputKeyClass, outputValueClass, job,
      addDependencyJars, true);
  }

  /**
   * Use this before submitting a Multi TableMap job. It will appropriately set
   * up the job.
   *
   * @param scans The list of {@link Scan} objects to read from.
   * @param mapper The mapper class to use.
   * @param outputKeyClass The class of the output key.
   * @param outputValueClass The class of the output value.
   * @param job The current job to adjust. Make sure the passed job is carrying
   *          all necessary HBase configuration.
   * @param addDependencyJars upload HBase jars and jars for any of the
   *          configured job classes via the distributed cache (tmpjars).
   * @param initCredentials whether to initialize hbase auth credentials for the job
   * @throws IOException When setting up the details fails.
   */
  public static void initTableMapperJob(List<Scan> scans,
      Class mapper,
      Class outputKeyClass,
      Class outputValueClass, Job job,
      boolean addDependencyJars,
      boolean initCredentials) throws IOException {
    job.setInputFormatClass(MultiTableInputFormat.class);
    if (outputValueClass != null) {
      job.setMapOutputValueClass(outputValueClass);
    }
    if (outputKeyClass != null) {
      job.setMapOutputKeyClass(outputKeyClass);
    }
    job.setMapperClass(mapper);
    Configuration conf = job.getConfiguration();
    HBaseConfiguration.merge(conf, HBaseConfiguration.create(conf));
    List<String> scanStrings = new ArrayList<String>();

    for (Scan scan : scans) {
      scanStrings.add(convertScanToString(scan));
    }
    job.getConfiguration().setStrings(MultiTableInputFormat.SCANS,
      scanStrings.toArray(new String[scanStrings.size()]));

    if (addDependencyJars) {
      addDependencyJars(job);
    }

    if (initCredentials) {
      initCredentials(job);
    }
  }

  public static void initCredentials(Job job) throws IOException {
    UserProvider userProvider = UserProvider.instantiate(job.getConfiguration());
    if (userProvider.isHadoopSecurityEnabled()) {
      // propagate delegation related props from launcher job to MR job
      if (System.getenv("HADOOP_TOKEN_FILE_LOCATION") != null) {
        job.getConfiguration().set("mapreduce.job.credentials.binary",
                                   System.getenv("HADOOP_TOKEN_FILE_LOCATION"));
      }
    }

    if (userProvider.isHBaseSecurityEnabled()) {
      try {
        // init credentials for remote cluster
        String quorumAddress = job.getConfiguration().get(TableOutputFormat.QUORUM_ADDRESS);
        User user = userProvider.getCurrent();
        if (quorumAddress != null) {
          Configuration peerConf = HBaseConfiguration.createClusterConf(job.getConfiguration(),
              quorumAddress, TableOutputFormat.OUTPUT_CONF_PREFIX);
          Connection peerConn = ConnectionFactory.createConnection(peerConf);
          try {
            TokenUtil.addTokenForJob(peerConn, user, job);
          } finally {
            peerConn.close();
          }
        }

        Connection conn = ConnectionFactory.createConnection(job.getConfiguration());
        try {
          TokenUtil.addTokenForJob(conn, user, job);
        } finally {
          conn.close();
        }
      } catch (InterruptedException ie) {
        LOG.info("Interrupted obtaining user authentication token");
        Thread.currentThread().interrupt();
      }
    }
  }

  /**
   * Obtain an authentication token, for the specified cluster, on behalf of the current user
   * and add it to the credentials for the given map reduce job.
   *
   * The quorumAddress is the key to the ZK ensemble, which contains:
   * hbase.zookeeper.quorum, hbase.zookeeper.client.port and
   * zookeeper.znode.parent
   *
   * @param job The job that requires the permission.
   * @param quorumAddress string that contains the 3 required configuratins
   * @throws IOException When the authentication token cannot be obtained.
   * @deprecated Since 1.2.0, use {@link #initCredentialsForCluster(Job, Configuration)} instead.
   */
  @Deprecated
  public static void initCredentialsForCluster(Job job, String quorumAddress)
      throws IOException {
    Configuration peerConf = HBaseConfiguration.createClusterConf(job.getConfiguration(),
        quorumAddress);
    initCredentialsForCluster(job, peerConf);
  }

  /**
   * Obtain an authentication token, for the specified cluster, on behalf of the current user
   * and add it to the credentials for the given map reduce job.
   *
   * @param job The job that requires the permission.
   * @param conf The configuration to use in connecting to the peer cluster
   * @throws IOException When the authentication token cannot be obtained.
   */
  public static void initCredentialsForCluster(Job job, Configuration conf)
      throws IOException {
    UserProvider userProvider = UserProvider.instantiate(job.getConfiguration());
    if (userProvider.isHBaseSecurityEnabled()) {
      try {
        Connection peerConn = ConnectionFactory.createConnection(conf);
        try {
          TokenUtil.addTokenForJob(peerConn, userProvider.getCurrent(), job);
        } finally {
          peerConn.close();
        }
      } catch (InterruptedException e) {
        LOG.info("Interrupted obtaining user authentication token");
        Thread.interrupted();
      }
    }
  }

  /**
   * Writes the given scan into a Base64 encoded string.
   *
   * @param scan  The scan to write out.
   * @return The scan saved in a Base64 encoded string.
   * @throws IOException When writing the scan fails.
   */
  static String convertScanToString(Scan scan) throws IOException {
    ClientProtos.Scan proto = ProtobufUtil.toScan(scan);
    return Base64.encodeBytes(proto.toByteArray());
  }

  /**
   * Converts the given Base64 string back into a Scan instance.
   *
   * @param base64  The scan details.
   * @return The newly created Scan instance.
   * @throws IOException When reading the scan instance fails.
   */
  static Scan convertStringToScan(String base64) throws IOException {
    byte [] decoded = Base64.decode(base64);
    ClientProtos.Scan scan;
    try {
      scan = ClientProtos.Scan.parseFrom(decoded);
    } catch (InvalidProtocolBufferException ipbe) {
      throw new IOException(ipbe);
    }

    return ProtobufUtil.toScan(scan);
  }

  /**
   * Use this before submitting a TableReduce job. It will
   * appropriately set up the JobConf.
   *
   * @param table  The output table.
   * @param reducer  The reducer class to use.
   * @param job  The current job to adjust.
   * @throws IOException When determining the region count fails.
   */
  public static void initTableReducerJob(String table,
    Class reducer, Job job)
  throws IOException {
    initTableReducerJob(table, reducer, job, null);
  }

  /**
   * Use this before submitting a TableReduce job. It will
   * appropriately set up the JobConf.
   *
   * @param table  The output table.
   * @param reducer  The reducer class to use.
   * @param job  The current job to adjust.
   * @param partitioner  Partitioner to use. Pass null to use
   * default partitioner.
   * @throws IOException When determining the region count fails.
   */
  public static void initTableReducerJob(String table,
    Class reducer, Job job,
    Class partitioner) throws IOException {
    initTableReducerJob(table, reducer, job, partitioner, null, null, null);
  }

  /**
   * Use this before submitting a TableReduce job. It will
   * appropriately set up the JobConf.
   *
   * @param table  The output table.
   * @param reducer  The reducer class to use.
   * @param job  The current job to adjust.  Make sure the passed job is
   * carrying all necessary HBase configuration.
   * @param partitioner  Partitioner to use. Pass null to use
   * default partitioner.
   * @param quorumAddress Distant cluster to write to; default is null for
   * output to the cluster that is designated in hbase-site.xml.
   * Set this String to the zookeeper ensemble of an alternate remote cluster
   * when you would have the reduce write a cluster that is other than the
   * default; e.g. copying tables between clusters, the source would be
   * designated by hbase-site.xml and this param would have the
   * ensemble address of the remote cluster.  The format to pass is particular.
   * Pass &lt;hbase.zookeeper.quorum&gt;:&lt;
   *             hbase.zookeeper.client.port&gt;:&lt;zookeeper.znode.parent&gt;
   * such as server,server2,server3:2181:/hbase.
   * @param serverClass redefined hbase.regionserver.class
   * @param serverImpl redefined hbase.regionserver.impl
   * @throws IOException When determining the region count fails.
   */
  public static void initTableReducerJob(String table,
    Class reducer, Job job,
    Class partitioner, String quorumAddress, String serverClass,
    String serverImpl) throws IOException {
    initTableReducerJob(table, reducer, job, partitioner, quorumAddress,
        serverClass, serverImpl, true);
  }

  /**
   * Use this before submitting a TableReduce job. It will
   * appropriately set up the JobConf.
   *
   * @param table  The output table.
   * @param reducer  The reducer class to use.
   * @param job  The current job to adjust.  Make sure the passed job is
   * carrying all necessary HBase configuration.
   * @param partitioner  Partitioner to use. Pass null to use
   * default partitioner.
   * @param quorumAddress Distant cluster to write to; default is null for
   * output to the cluster that is designated in hbase-site.xml.
   * Set this String to the zookeeper ensemble of an alternate remote cluster
   * when you would have the reduce write a cluster that is other than the
   * default; e.g. copying tables between clusters, the source would be
   * designated by hbase-site.xml and this param would have the
   * ensemble address of the remote cluster.  The format to pass is particular.
   * Pass &lt;hbase.zookeeper.quorum&gt;:&lt;
   *             hbase.zookeeper.client.port&gt;:&lt;zookeeper.znode.parent&gt;
   * such as server,server2,server3:2181:/hbase.
   * @param serverClass redefined hbase.regionserver.class
   * @param serverImpl redefined hbase.regionserver.impl
   * @param addDependencyJars upload HBase jars and jars for any of the configured
   *           job classes via the distributed cache (tmpjars).
   * @throws IOException When determining the region count fails.
   */
  public static void initTableReducerJob(String table,
    Class reducer, Job job,
    Class partitioner, String quorumAddress, String serverClass,
    String serverImpl, boolean addDependencyJars) throws IOException {

    Configuration conf = job.getConfiguration();
    HBaseConfiguration.merge(conf, HBaseConfiguration.create(conf));
    job.setOutputFormatClass(TableOutputFormat.class);
    if (reducer != null) job.setReducerClass(reducer);
    conf.set(TableOutputFormat.OUTPUT_TABLE, table);
    conf.setStrings("io.serializations", conf.get("io.serializations"),
        MutationSerialization.class.getName(), ResultSerialization.class.getName());
    // If passed a quorum/ensemble address, pass it on to TableOutputFormat.
    if (quorumAddress != null) {
      // Calling this will validate the format
      ZKConfig.validateClusterKey(quorumAddress);
      conf.set(TableOutputFormat.QUORUM_ADDRESS,quorumAddress);
    }
    if (serverClass != null && serverImpl != null) {
      conf.set(TableOutputFormat.REGION_SERVER_CLASS, serverClass);
      conf.set(TableOutputFormat.REGION_SERVER_IMPL, serverImpl);
    }
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(Writable.class);
    if (partitioner == HRegionPartitioner.class) {
      job.setPartitionerClass(HRegionPartitioner.class);
      int regions = MetaTableAccessor.getRegionCount(conf, TableName.valueOf(table));
      if (job.getNumReduceTasks() > regions) {
        job.setNumReduceTasks(regions);
      }
    } else if (partitioner != null) {
      job.setPartitionerClass(partitioner);
    }

    if (addDependencyJars) {
      addDependencyJars(job);
    }

    initCredentials(job);
  }

  /**
   * Ensures that the given number of reduce tasks for the given job
   * configuration does not exceed the number of regions for the given table.
   *
   * @param table  The table to get the region count for.
   * @param job  The current job to adjust.
   * @throws IOException When retrieving the table details fails.
   */
  public static void limitNumReduceTasks(String table, Job job)
  throws IOException {
    int regions =
      MetaTableAccessor.getRegionCount(job.getConfiguration(), TableName.valueOf(table));
    if (job.getNumReduceTasks() > regions)
      job.setNumReduceTasks(regions);
  }

  /**
   * Sets the number of reduce tasks for the given job configuration to the
   * number of regions the given table has.
   *
   * @param table  The table to get the region count for.
   * @param job  The current job to adjust.
   * @throws IOException When retrieving the table details fails.
   */
  public static void setNumReduceTasks(String table, Job job)
  throws IOException {
    job.setNumReduceTasks(MetaTableAccessor.getRegionCount(job.getConfiguration(),
       TableName.valueOf(table)));
  }

  /**
   * Sets the number of rows to return and cache with each scanner iteration.
   * Higher caching values will enable faster mapreduce jobs at the expense of
   * requiring more heap to contain the cached rows.
   *
   * @param job The current job to adjust.
   * @param batchSize The number of rows to return in batch with each scanner
   * iteration.
   */
  public static void setScannerCaching(Job job, int batchSize) {
    job.getConfiguration().setInt("hbase.client.scanner.caching", batchSize);
  }

  /**
   * Add HBase and its dependencies (only) to the job configuration.
   * <p>
   * This is intended as a low-level API, facilitating code reuse between this
   * class and its mapred counterpart. It is also of use to external tools that
   * need to build a MapReduce job that interacts with HBase but want
   * fine-grained control over the jars shipped to the cluster.
   * </p>
   * @param conf The Configuration object to extend with dependencies.
   * @see org.apache.hadoop.hbase.mapred.TableMapReduceUtil
   * @see PIG-3285
   */
  public static void addHBaseDependencyJars(Configuration conf) throws IOException {
    // PrefixTreeCodec is part of the hbase-prefix-tree module. If not included in MR jobs jar
    // dependencies, MR jobs that write encoded hfiles will fail.
    // We used reflection here so to prevent a circular module dependency.
    // TODO - if we extract the MR into a module, make it depend on hbase-prefix-tree.
    Class prefixTreeCodecClass = null;
    try {
      prefixTreeCodecClass =
          Class.forName("org.apache.hadoop.hbase.codec.prefixtree.PrefixTreeCodec");
    } catch (ClassNotFoundException e) {
      // this will show up in unit tests but should not show in real deployments
      LOG.warn("The hbase-prefix-tree module jar containing PrefixTreeCodec is not present." +
          " Continuing without it.");
    }

    addDependencyJars(conf,
      // explicitly pull a class from each module
      HConstants.class,                                    // hbase-common
      ClientProtos.class,                                  // hbase-protocol
      Put.class,                                           // hbase-client
      org.apache.hadoop.hbase.CompatibilityFactory.class,  // hbase-hadoop-compat
      TableMapper.class,                                   // hbase-server
      prefixTreeCodecClass,                                // hbase-prefix-tree (if null will be skipped)
      // pull necessary dependencies
      org.apache.zookeeper.ZooKeeper.class,
      io.netty.channel.Channel.class,
      com.google.protobuf.Message.class,
      com.google.common.collect.Lists.class,
      org.apache.htrace.Trace.class,
      com.yammer.metrics.core.MetricsRegistry.class);
  }

  /**
   * Returns a classpath string built from the content of the "tmpjars" value in {@code conf}.
   * Also exposed to shell scripts via `bin/hbase mapredcp`.
   */
  public static String buildDependencyClasspath(Configuration conf) {
    if (conf == null) {
      throw new IllegalArgumentException("Must provide a configuration object.");
    }
    Set<String> paths = new HashSet<String>(conf.getStringCollection("tmpjars"));
    if (paths.size() == 0) {
      throw new IllegalArgumentException("Configuration contains no tmpjars.");
    }
    StringBuilder sb = new StringBuilder();
    for (String s : paths) {
      // entries can take the form 'file:/path/to/file.jar'.
      int idx = s.indexOf(":");
      if (idx != -1) s = s.substring(idx + 1);
      if (sb.length() > 0) sb.append(File.pathSeparator);
      sb.append(s);
    }
    return sb.toString();
  }

  /**
   * Add the HBase dependency jars as well as jars for any of the configured
   * job classes to the job configuration, so that JobClient will ship them
   * to the cluster and add them to the DistributedCache.
   */
  public static void addDependencyJars(Job job) throws IOException {
    addHBaseDependencyJars(job.getConfiguration());
    try {
      addDependencyJars(job.getConfiguration(),
          // when making changes here, consider also mapred.TableMapReduceUtil
          // pull job classes
          job.getMapOutputKeyClass(),
          job.getMapOutputValueClass(),
          job.getInputFormatClass(),
          job.getOutputKeyClass(),
          job.getOutputValueClass(),
          job.getOutputFormatClass(),
          job.getPartitionerClass(),
          job.getCombinerClass());
    } catch (ClassNotFoundException e) {
      throw new IOException(e);
    }
  }

  /**
   * Add the jars containing the given classes to the job's configuration
   * such that JobClient will ship them to the cluster and add them to
   * the DistributedCache.
   */
  public static void addDependencyJars(Configuration conf,
      Class<?>... classes) throws IOException {

    FileSystem localFs = FileSystem.getLocal(conf);
    Set<String> jars = new HashSet<String>();
    // Add jars that are already in the tmpjars variable
    jars.addAll(conf.getStringCollection("tmpjars"));

    // add jars as we find them to a map of contents jar name so that we can avoid
    // creating new jars for classes that have already been packaged.
    Map<String, String> packagedClasses = new HashMap<String, String>();

    // Add jars containing the specified classes
    for (Class<?> clazz : classes) {
      if (clazz == null) continue;

      Path path = findOrCreateJar(clazz, localFs, packagedClasses);
      if (path == null) {
        LOG.warn("Could not find jar for class " + clazz +
                 " in order to ship it to the cluster.");
        continue;
      }
      if (!localFs.exists(path)) {
        LOG.warn("Could not validate jar file " + path + " for class " + clazz);
        continue;
      }
      jars.add(path.toString());
    }
    if (jars.isEmpty()) return;

    conf.set("tmpjars", StringUtils.arrayToString(jars.toArray(new String[jars.size()])));
  }

  /**
   * Finds the Jar for a class or creates it if it doesn't exist. If the class is in
   * a directory in the classpath, it creates a Jar on the fly with the
   * contents of the directory and returns the path to that Jar. If a Jar is
   * created, it is created in the system temporary directory. Otherwise,
   * returns an existing jar that contains a class of the same name. Maintains
   * a mapping from jar contents to the tmp jar created.
   * @param my_class the class to find.
   * @param fs the FileSystem with which to qualify the returned path.
   * @param packagedClasses a map of class name to path.
   * @return a jar file that contains the class.
   * @throws IOException
   */
  private static Path findOrCreateJar(Class<?> my_class, FileSystem fs,
      Map<String, String> packagedClasses)
  throws IOException {
    // attempt to locate an existing jar for the class.
    String jar = findContainingJar(my_class, packagedClasses);
    if (null == jar || jar.isEmpty()) {
      jar = getJar(my_class);
      updateMap(jar, packagedClasses);
    }

    if (null == jar || jar.isEmpty()) {
      return null;
    }

    LOG.debug(String.format("For class %s, using jar %s", my_class.getName(), jar));
    return new Path(jar).makeQualified(fs);
  }

  /**
   * Add entries to packagedClasses corresponding to class files
   * contained in jar.
   * @param jar The jar who's content to list.
   * @param packagedClasses map[class -> jar]
   */
  private static void updateMap(String jar, Map<String, String> packagedClasses)
      throws IOException {
    if (null == jar || jar.isEmpty()) {
      return;
    }
    ZipFile zip = null;
    try {
      zip = new ZipFile(jar);
      for (Enumeration<? extends ZipEntry> iter = zip.entries(); iter.hasMoreElements();) {
        ZipEntry entry = iter.nextElement();
        if (entry.getName().endsWith("class")) {
          packagedClasses.put(entry.getName(), jar);
        }
      }
    } finally {
      if (null != zip) zip.close();
    }
  }

  /**
   * Find a jar that contains a class of the same name, if any. It will return
   * a jar file, even if that is not the first thing on the class path that
   * has a class with the same name. Looks first on the classpath and then in
   * the packagedClasses map.
   * @param my_class the class to find.
   * @return a jar file that contains the class, or null.
   * @throws IOException
   */
  private static String findContainingJar(Class<?> my_class, Map<String, String> packagedClasses)
      throws IOException {
    ClassLoader loader = my_class.getClassLoader();
    String class_file = my_class.getName().replaceAll("\\.", "/") + ".class";

    if (loader != null) {
      // first search the classpath
      for (Enumeration<URL> itr = loader.getResources(class_file); itr.hasMoreElements();) {
        URL url = itr.nextElement();
        if ("jar".equals(url.getProtocol())) {
          String toReturn = url.getPath();
          if (toReturn.startsWith("file:")) {
            toReturn = toReturn.substring("file:".length());
          }
          // URLDecoder is a misnamed class, since it actually decodes
          // x-www-form-urlencoded MIME type rather than actual
          // URL encoding (which the file path has). Therefore it would
          // decode +s to ' 's which is incorrect (spaces are actually
          // either unencoded or encoded as "%20"). Replace +s first, so
          // that they are kept sacred during the decoding process.
          toReturn = toReturn.replaceAll("\\+", "%2B");
          toReturn = URLDecoder.decode(toReturn, "UTF-8");
          return toReturn.replaceAll("!.*$", "");
        }
      }
    }

    // now look in any jars we've packaged using JarFinder. Returns null when
    // no jar is found.
    return packagedClasses.get(class_file);
  }

  /**
   * Invoke 'getJar' on a custom JarFinder implementation. Useful for some job
   * configuration contexts (HBASE-8140) and also for testing on MRv2.
   * check if we have HADOOP-9426.
   * @param my_class the class to find.
   * @return a jar file that contains the class, or null.
   */
  private static String getJar(Class<?> my_class) {
    String ret = null;
    try {
      ret = JarFinder.getJar(my_class);
    } catch (Exception e) {
      // toss all other exceptions, related to reflection failure
      throw new RuntimeException("getJar invocation failed.", e);
    }
    return ret;
  }
}



