

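To avoid frequent synchronous lookups against external dimension tables in a stream-processing job, Flink provides asynchronous I/O for interacting with external systems concurrently, which reduces the throughput and latency penalty caused by network round trips. To cut down on calls to the external system further, it is advisable to keep recently used dimension data in a local cache with LRU (least recently used) eviction. A caching mechanism widely used in the industry is the CacheBuilder provided by the Guava library.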
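The LRU idea mentioned above can be sketched with nothing but the JDK. This is not Guava's CacheBuilder: it swaps in `java.util.LinkedHashMap` in access order to show the eviction behavior, and the class name `LruDimCache` and the sample keys are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache for dimension rows; a JDK-only stand-in, not Guava's CacheBuilder. */
class LruDimCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruDimCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put; evicts the least recently used entry once over capacity.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruDimCache<String, String> cache = new LruDimCache<>(2);
        cache.put("user-1", "Beijing");
        cache.put("user-2", "Shanghai");
        cache.get("user-1");              // touch user-1, so user-2 becomes the eldest
        cache.put("user-3", "Shenzhen");  // exceeds capacity and evicts user-2
        System.out.println(cache.keySet()); // prints [user-1, user-3]
    }
}
```

`LinkedHashMap` is not thread-safe, so a sketch like this would need external synchronization if callbacks touch it from several threads; a library cache such as Guava's handles that, along with time-based expiry.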











The caching mechanism used here is the CacheBuilder provided by the Guava library.




Be sure to read the Javadoc carefully; it explains the usage in detail.







  /**
   * Retrieves data from HBase.
   * @param request The {@code get} request.
   * @return A deferred list of key-values that matched the get request.
   */
  public Deferred<ArrayList<KeyValue>> get(final GetRequest request) {
    return sendRpcToRegion(request).addCallbacks(got, Callback.PASSTHROUGH);
  }

  /**
   * Method to issue multiple get requests to HBase in a batch. This can avoid
   * bottlenecks in region clients and improve response time.
   * @param requests A list of one or more GetRequests.
   * @return A deferred grouping of result or exceptions. Note that this API may
   * return a DeferredGroupException if one or more calls failed.
   * @since 1.8
   */
  public Deferred<List<GetResultOrException>> get(final List<GetRequest> requests) {
    return Deferred.groupInOrder(multiGet(requests))
        .addCallback(
            new Callback<List<GetResultOrException>, ArrayList<GetResultOrException>>() {
              public List<GetResultOrException> call(ArrayList<GetResultOrException> results) {
                return results;
              }
            }
        );
  }
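The batch `get` returns one entry per request, in request order, where each entry carries either cells or an exception. The following is a minimal sketch of that "result or exception, grouped in order" pattern using only the JDK's `CompletableFuture`. It is an analogy, not asynchbase code: `BatchLookupSketch`, `ResultOrError` and this `groupInOrder` are hypothetical names loosely modeled on `GetResultOrException` and `Deferred.groupInOrder`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class BatchLookupSketch {
    /** Holds either the value or the failure of one lookup (hypothetical type). */
    static final class ResultOrError<T> {
        final T value;
        final Throwable error;

        ResultOrError(T value, Throwable error) {
            this.value = value;
            this.error = error;
        }
    }

    /** Waits for every future and returns the outcomes in the same order as the inputs. */
    static <T> CompletableFuture<List<ResultOrError<T>>> groupInOrder(List<CompletableFuture<T>> futures) {
        List<CompletableFuture<ResultOrError<T>>> wrapped = new ArrayList<>();
        for (CompletableFuture<T> f : futures) {
            // handle() maps success and failure to a normal value, so one failure cannot fail the batch.
            wrapped.add(f.handle((v, e) -> new ResultOrError<T>(v, e)));
        }
        return CompletableFuture.allOf(wrapped.toArray(new CompletableFuture[0]))
            .thenApply(ignored -> {
                List<ResultOrError<T>> out = new ArrayList<>();
                for (CompletableFuture<ResultOrError<T>> w : wrapped) {
                    out.add(w.join()); // already complete at this point, so join() does not block
                }
                return out;
            });
    }

    public static void main(String[] args) {
        CompletableFuture<String> ok = CompletableFuture.completedFuture("v1");
        CompletableFuture<String> bad = new CompletableFuture<>();
        bad.completeExceptionally(new IllegalStateException("lookup failed"));

        for (ResultOrError<String> r : groupInOrder(Arrays.asList(ok, bad)).join()) {
            System.out.println(r.error == null ? "value=" + r.value : "error=" + r.error.getMessage());
        }
    }
}
```

Keeping failures as per-item values means the caller can still use the rows that did come back, which is the reason to check both `getCells()` and `getException()` on each element of the real result list.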


  /**
   * Constructor.
   * @param quorum_spec The specification of the quorum, e.g.
   * {@code "host1,host2,host3"}.
   *                    The first argument is the ZooKeeper quorum address.
   * @param base_path The base path under which is the znode for the
   * -ROOT- region.
   *                   The second argument is the base znode path in ZooKeeper
   *                   (a path such as the default /hbase, not a port).
   */
  public HBaseClient(final String quorum_spec, final String base_path) {
    this(quorum_spec, base_path, defaultChannelFactory(new Config()));
  }



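This describes how data is fetched from HBase: you specify the row key, the column family, and the column qualifier.
The constructors are the part to focus on here. They show that a request can be scoped to a row key, a column family, or a single column, depending on what the business logic needs.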

  /**
   * Constructor.
   * <strong>These byte arrays will NOT be copied.</strong>
   * @param table The non-empty name of the table to use.
   * @param key The row key to get in that table.
   */
  public GetRequest(final byte[] table, final byte[] key) {
    super(table, key);
    this.bufferable = false; //don't buffer get request
  }

  /**
   * Constructor.
   * @param table The non-empty name of the table to use.
   * @param key The row key to get in that table.
   * <strong>This byte array will NOT be copied.</strong>
   */
  public GetRequest(final String table, final byte[] key) {
    this(table.getBytes(), key);
  }

  /**
   * Constructor.
   * @param table The non-empty name of the table to use.
   * @param key The row key to get in that table.
   */
  public GetRequest(final String table, final String key) {
    this(table.getBytes(), key.getBytes());
  }

  /**
   * Constructor.
   * <strong>These byte arrays will NOT be copied.</strong>
   * @param table The non-empty name of the table to use.
   * @param key The row key to get in that table.
   * @param family The column family.
   * @since 1.5
   */
  public GetRequest(final byte[] table,
                    final byte[] key,
                    final byte[] family) {
    super(table, key);
    this.family(family);
    this.bufferable = false; //don't buffer get request
  }

  /**
   * Constructor.
   * @param table The non-empty name of the table to use.
   * @param key The row key to get in that table.
   * @param family The column family.
   * @since 1.5
   */
  public GetRequest(final String table,
                    final String key,
                    final String family) {
    this(table, key);
    this.family(family);
    this.bufferable = false; //don't buffer get request
  }

  /**
   * Constructor.
   * <strong>These byte arrays will NOT be copied.</strong>
   * @param table The non-empty name of the table to use.
   * @param key The row key to get in that table.
   * @param family The column family.
   * @param qualifier The column qualifier.
   * @since 1.5
   */
  public GetRequest(final byte[] table,
                    final byte[] key,
                    final byte[] family,
                    final byte[] qualifier) {
    super(table, key);
    this.family(family);
    this.qualifier(qualifier);
    this.bufferable = false; //don't buffer get request
  }

  /**
   * Constructor.
   * @param table The non-empty name of the table to use.
   * @param key The row key to get in that table.
   * @param family The column family.
   * @param qualifier The column qualifier.
   * @since 1.5
   */
  public GetRequest(final String table,
                    final String key,
                    final String family,
                    final String qualifier) {
    this(table, key);
    this.family(family);
    this.qualifier(qualifier);
    this.bufferable = false; //don't buffer get request
  }




public Deferred<List<GetResultOrException>> get(final List<GetRequest> requests)


  public ArrayList<KeyValue> getCells() {
    return this.cells;
  }

  public Exception getException() {
    return this.exception;
  }




  /** Returns the row key.  */
  public byte[] key() {
    return key;
  }

  /** Returns the column family.  */
  public byte[] family() {
    return family;
  }

  /** Returns the column qualifier.  */
  public byte[] qualifier() {
    return qualifier;
  }

  /**
   * Returns the timestamp stored in this {@code KeyValue}.
   * @see #TIMESTAMP_NOW
   */
  public long timestamp() {
    return timestamp;
  }

  //public byte type() {
  //  return type;
  //}

  /** Returns the value, the contents of the cell.  */
  public byte[] value() {
    return value;
  }

  @Override
  public int compareTo(final KeyValue other) {
    int d;
    if ((d = Bytes.memcmp(key, other.key)) != 0) {
      return d;
    } else if ((d = Bytes.memcmp(family, other.family)) != 0) {
      return d;
    } else if ((d = Bytes.memcmp(qualifier, other.qualifier)) != 0) {
      return d;
    //} else if ((d = Bytes.memcmp(value, other.value)) != 0) {
    //  return d;
    } else if ((d = Long.signum(timestamp - other.timestamp)) != 0) {
      return d;
    } else {
    //  d = type - other.type;
      d = Bytes.memcmp(value, other.value);
    }
    return d;
  }

  @Override
  public boolean equals(final Object other) {
    if (other == null || !(other instanceof KeyValue)) {
      return false;
    }
    return compareTo((KeyValue) other) == 0;
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(key)
      ^ Arrays.hashCode(family)
      ^ Arrays.hashCode(qualifier)
      ^ Arrays.hashCode(value)
      ^ (int) (timestamp ^ (timestamp >>> 32))
      //^ type
      ;
  }

  @Override
  public String toString() {
    final StringBuilder buf = new StringBuilder(84  // Boilerplate + timestamp
      // the row key is likely to contain non-ascii characters, so
      // let's multiply its length by 2 to avoid re-allocations.
      + key.length * 2 + family.length + qualifier.length + value.length);
    buf.append("KeyValue(key=");
    Bytes.pretty(buf, key);
    buf.append(", family=");
    Bytes.pretty(buf, family);
    buf.append(", qualifier=");
    Bytes.pretty(buf, qualifier);
    buf.append(", value=");
    Bytes.pretty(buf, value);
    buf.append(", timestamp=").append(timestamp);
    //  .append(", type=").append(type);
    buf.append(')');
    return buf.toString();
  }




  /**
   * Registers a callback.
   *
   * If the deferred result is already available and isn't an exception, the
   * callback is executed immediately from this thread.
   * If the deferred result is already available and is an exception, the
   * callback is discarded.
   * If the deferred result is not available, this callback is queued and will
   * be invoked from whichever thread gives this deferred its initial result
   * by calling {@link #callback}.
   * @param cb The callback to register.
   * @return {@code this} with an "updated" type.
   */
  public <R> Deferred<R> addCallback(final Callback<R, T> cb) {
    return addCallbacks(cb, Callback.PASSTHROUGH);
  }

  /**
   * Registers a callback and an "errback".
   *
   * If the deferred result is already available, the callback or the errback
   * (depending on the nature of the result) is executed immediately from this
   * thread.
   * @param cb The callback to register.
   * @param eb The errback to register.
   * @return {@code this} with an "updated" type.
   * @throws CallbackOverflowError if there are too many callbacks in this chain.
   * The maximum number of callbacks allowed in a chain is set by the
   * implementation.  The limit is high enough that you shouldn't have to worry
   * about this exception (which is why it's an {@link Error} actually).  If
   * you hit it, you probably did something wrong.
   */
  @SuppressWarnings("unchecked")
  public <R, R2, E> Deferred<R> addCallbacks(final Callback<R, T> cb,
                                             final Callback<R2, E> eb) {
    if (cb == null) {
      throw new NullPointerException("null callback");
    } else if (eb == null) {
      throw new NullPointerException("null errback");
    }
    // We need to synchronize on `this' first before the CAS, to prevent
    // runCallbacks from switching our state from RUNNING to DONE right
    // before we add another callback.
    synchronized (this) {
      // If we're DONE, switch to RUNNING atomically.
      if (state == DONE) {
        // This "check-then-act" sequence is safe as this is the only code
        // path that transitions from DONE to RUNNING and it's synchronized.
        state = RUNNING;
      } else {
        // We get here if weren't DONE (most common code path)
        //  -or-
        // if we were DONE and another thread raced with us to change the
        // state and we lost the race (uncommon).
        if (callbacks == null) {
          callbacks = new Callback[INIT_CALLBACK_CHAIN_SIZE];
        }
        // Do we need to grow the array?
        else if (last_callback == callbacks.length) {
          final int oldlen = callbacks.length;
          if (oldlen == MAX_CALLBACK_CHAIN_LENGTH * 2) {
            throw new CallbackOverflowError("Too many callbacks in " + this
              + " (size=" + (oldlen / 2) + ") when attempting to add cb="
              + cb + '@' + cb.hashCode() + ", eb=" + eb + '@' + eb.hashCode());
          }
          final int len = Math.min(oldlen * 2, MAX_CALLBACK_CHAIN_LENGTH * 2);
          final Callback[] newcbs = new Callback[len];
          System.arraycopy(callbacks, next_callback,  // Outstanding callbacks.
                           newcbs, 0,            // Move them to the beginning.
                           last_callback - next_callback);  // Number of items.
          last_callback -= next_callback;
          next_callback = 0;
          callbacks = newcbs;
        }
        callbacks[last_callback++] = cb;
        callbacks[last_callback++] = eb;
        return (Deferred<R>) ((Deferred) this);
      }
    }  // end of synchronized block

    if (!doCall(result instanceof Exception ? eb : cb)) {
      // While we were executing the callback, another thread could have
      // added more callbacks.  If doCall returned true, it means we're
      // PAUSED, so we won't reach this point, because the Deferred we're
      // waiting on will call us back later.  But if we're still in state
      // RUNNING, we'll get to here, and we must check to see if any new
      // callbacks were added while we were executing doCall, because if
      // there are, we must execute them immediately, because no one else
      // is going to execute them for us otherwise.
      boolean more;
      synchronized (this) {
        more = callbacks != null && next_callback != last_callback;
      }
      if (more) {
        runCallbacks();  // Will put us back either in DONE or in PAUSED.
      } else {
        state = DONE;
      }
    }
    return (Deferred<R>) ((Object) this);
  }
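The Javadoc above says a callback runs immediately in the registering thread when the result is already available, and otherwise runs later in whichever thread supplies the result. The JDK's `CompletableFuture` behaves the same way for its non-async methods, so the rule can be observed without asynchbase on the classpath. This is an analogy sketch, not `Deferred` itself; the class name `CallbackTimingSketch` and the thread name `completer` are made up.

```java
import java.util.concurrent.CompletableFuture;

class CallbackTimingSketch {
    /** Registers a callback on an already-completed future and reports which thread ran it. */
    static String threadForCompletedFuture() {
        CompletableFuture<String> done = CompletableFuture.completedFuture("row");
        String[] seen = new String[1];
        // The result is already available, so the callback runs right here in the calling thread.
        done.thenAccept(v -> seen[0] = Thread.currentThread().getName());
        return seen[0];
    }

    /** Registers a callback first, then supplies the result from a second thread. */
    static String threadForPendingFuture() {
        CompletableFuture<String> pending = new CompletableFuture<>();
        // Not complete yet: the callback is queued on the future.
        CompletableFuture<String> seen = pending.thenApply(v -> Thread.currentThread().getName());
        new Thread(() -> pending.complete("row"), "completer").start();
        return seen.join(); // blocks until the completer thread has supplied the result
    }

    public static void main(String[] args) {
        System.out.println("already complete -> ran on " + threadForCompletedFuture());
        System.out.println("still pending    -> ran on " + threadForPendingFuture());
    }
}
```

The practical consequence for a Flink async function is that callback code may run on the client's I/O thread rather than the task thread, so it should stay short and must not block.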




package com.scallion.transform;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.scallion.common.Common;
import com.stumbleupon.async.Callback;
import com.stumbleupon.async.Deferred;
import org.apache.commons.lang.StringUtils;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.hbase.async.GetRequest;
import org.hbase.async.GetResultOrException;
import org.hbase.async.HBaseClient;
import org.hbase.async.KeyValue;

import java.util.*;

/**
 * created by gaowj.
 * created on 2021-07-16.
 * function: asynchronous dimension-table join function
 */
public class AsyncHBaseDimJoinFunction extends RichAsyncFunction<Object, Object> {
    private HBaseClient client; // asynchronous HBase client
    private String rowKeyCol; // name of the row-key column
    private HashMap<String, HashSet<String>> joinTables; // tables to join and the columns needed from each
    private HashMap<String, String> colAndResCol; // key: dimension-table column name; value: column name in the traffic bean

    public AsyncHBaseDimJoinFunction(String rowKeyCol, HashMap<String, HashSet<String>> joinTables, HashMap<String, String> colAndResCol) {
        this.rowKeyCol = rowKeyCol;
        this.joinTables = joinTables;
        this.colAndResCol = colAndResCol;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        ParameterTool params = (ParameterTool) getRuntimeContext().getExecutionConfig().getGlobalJobParameters();
        client = new HBaseClient(params.getRequired("hbase.zookeeper.quorum"),
        // (the remaining constructor argument is missing from the source)
    }

    @Override
    public void asyncInvoke(Object bean, ResultFuture<Object> resultFuture) throws Exception {
        try {
            JSONObject beanJsonObj = JSON.parseObject(JSON.toJSONString(bean));
            String rowKey = beanJsonObj.getString(rowKeyCol); // value of the row-key column
            ArrayList<GetRequest> getRequests = new ArrayList<>();
            Iterator<String> tables = joinTables.keySet().iterator();
            while (tables.hasNext()) {
                String table = tables.next();
                HashSet<String> cols = joinTables.get(table); // columns to join from this table
                Iterator<String> colsIterator = cols.iterator();
                while (colsIterator.hasNext()) {
                    String col = colsIterator.next();
                    getRequests.add(new GetRequest(table, rowKey,
                    // (the remaining GetRequest arguments are missing from the source)
                }
            }
            Deferred<List<GetResultOrException>> listDeferred = client.get(getRequests);
            listDeferred.addCallbacks(new Callback<Object, List<GetResultOrException>>() {
                @Override
                public Object call(List<GetResultOrException> callBack) throws Exception {
                    if (callBack != null && !callBack.isEmpty()) {
                        Iterator<GetResultOrException> callBackIterator = callBack.iterator();
                        while (callBackIterator.hasNext()) {
                            GetResultOrException results = callBackIterator.next();
                            ArrayList<KeyValue> cells = results.getCells();
                            for (KeyValue kv : cells) {
                                String qualifier = new String(kv.qualifier()); // dimension-table column name
                                String v = new String(kv.value());
                                if (StringUtils.isNotBlank(v)) {
                                    String resCol = colAndResCol.get(qualifier); // column name in the traffic-log bean
                                    beanJsonObj.put(resCol, v);
                                }
                            }
                        }
                        // (the lines that follow are missing from the source; an async function
                        // normally hands its output to resultFuture.complete(...) at this point)
                    } else {
                        // (body missing from the source)
                    }
                    return null;
                }
            }, new Callback<Object, Object>() {
                @Override
                public Object call(Object o) throws Exception {
                    // (errback body missing from the source)
                    return null;
                }
            });
        } catch (Exception ex) {
            // (handler body missing from the source)
        }
    }
}
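The introduction recommends putting a local cache in front of the asynchronous lookup, but the function as shown does not include that step. Below is a minimal, JDK-only sketch of how the two could be combined: check the cache first, go to the external store only on a miss, and fill the cache when the asynchronous result arrives. Everything in it is an assumption for illustration: `CachedAsyncLookup`, the `remote` function standing in for the HBase client, and the unbounded `ConcurrentHashMap` standing in for a size-limited cache such as Guava's.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class CachedAsyncLookup {
    // Unbounded for brevity; a real job would use a bounded, expiring cache instead.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Hypothetical stand-in for the asynchronous client call.
    private final Function<String, CompletableFuture<String>> remote;

    CachedAsyncLookup(Function<String, CompletableFuture<String>> remote) {
        this.remote = remote;
    }

    CompletableFuture<String> lookup(String rowKey) {
        String cached = cache.get(rowKey);
        if (cached != null) {
            // Cache hit: answer immediately without touching the external system.
            return CompletableFuture.completedFuture(cached);
        }
        // Cache miss: query asynchronously and remember the answer once it arrives.
        return remote.apply(rowKey).thenApply(value -> {
            if (value != null) {
                cache.put(rowKey, value);
            }
            return value;
        });
    }

    public static void main(String[] args) {
        AtomicInteger remoteCalls = new AtomicInteger();
        CachedAsyncLookup lookup = new CachedAsyncLookup(key ->
            CompletableFuture.supplyAsync(() -> {
                remoteCalls.incrementAndGet();
                return "dim-of-" + key;
            }));
        System.out.println(lookup.lookup("r1").join());
        System.out.println(lookup.lookup("r1").join());
        System.out.println("remote calls: " + remoteCalls.get());
    }
}
```

Two concurrent misses for the same key would both reach the external store in this sketch. Whether that matters depends on how hot individual keys are; caching the in-flight future instead of the value is one way to collapse them.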


Asynchronous I/O for External Data Access
