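A Detailed Look at How HBase Completes a Get Operation (Client Side)

This source-code walkthrough is based on HBase 0.20.6.

Let's start with the code of the get() method in the HTable class: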

HTable.java

  /**
   * Extracts certain cells from a given row.
   * @param get The object that specifies what data to fetch and from which row.
   * @return The data coming from the specified row, if it exists.  If the row
   * specified doesn't exist, the {@link Result} instance returned won't
   * contain any {@link KeyValue}, as indicated by {@link Result#isEmpty()}.
   * @throws IOException if a remote or network exception occurs.
   * @since 0.20.0
   */
  public Result get(final Get get) throws IOException {
    return connection.getRegionServerWithRetries(
        new ServerCallable<Result>(connection, tableName, get.getRow()) {
          public Result call() throws IOException {
            return server.get(location.getRegionInfo().getRegionName(), get);
          }
        }
    );
  }

This code is a bit convoluted, but at least it tells us to look at the getRegionServerWithRetries method of connection. So what is connection?

It is a field defined in HTable:

private final HConnection connection;

When is it instantiated? In the HTable constructor:

this.connection = HConnectionManager.getConnection(conf);

conf is an HBaseConfiguration object passed as a parameter to the HTable constructor. OK, let's move on to HConnectionManager and see where this connection comes from:

HConnectionManager.java

  /**
   * Get the connection object for the instance specified by the configuration
   * If no current connection exists, create a new connection for that instance
   * @param conf
   * @return HConnection object for the instance specified by the configuration
   */
  public static HConnection getConnection(HBaseConfiguration conf) {
    TableServers connection;
    synchronized (HBASE_INSTANCES) {
      connection = HBASE_INSTANCES.get(conf);
      if (connection == null) {
        connection = new TableServers(conf);
        HBASE_INSTANCES.put(conf, connection);
      }
    }
    return connection;
  }

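Now we know that each conf maps to one connection, concretely a TableServers object (which implements the HConnection interface), and that all connections are kept in a pool. So what is a connection actually used for? To find out, we need to look at the definition of the HConnection interface.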

HConnection.java

/**
 * Cluster connection.
 * {@link HConnectionManager} manages instances of this class.
 */
public interface HConnection {
  /**
   * Retrieve ZooKeeperWrapper used by the connection.
   * @return ZooKeeperWrapper handle being used by the connection.
   * @throws IOException
   */
  public ZooKeeperWrapper getZooKeeperWrapper() throws IOException;

  /**
   * @return proxy connection to master server for this instance
   * @throws MasterNotRunningException
   */
  public HMasterInterface getMaster() throws MasterNotRunningException;

  /** @return - true if the master server is running */
  public boolean isMasterRunning();
  
  /**
   * Checks if <code>tableName</code> exists.
   * @param tableName Table to check.
   * @return True if table exists already.
   * @throws MasterNotRunningException
   */
  public boolean tableExists(final byte [] tableName)
  throws MasterNotRunningException;

  /**
   * A table that isTableEnabled == false and isTableDisabled == false
   * is possible. This happens when a table has a lot of regions
   * that must be processed.
   * @param tableName
   * @return true if the table is enabled, false otherwise
   * @throws IOException
   */
  public boolean isTableEnabled(byte[] tableName) throws IOException;
  
  /**
   * @param tableName
   * @return true if the table is disabled, false otherwise
   * @throws IOException
   */
  public boolean isTableDisabled(byte[] tableName) throws IOException;

  /**
   * @param tableName
   * @return true if all regions of the table are available, false otherwise
   * @throws IOException
   */
  public boolean isTableAvailable(byte[] tableName) throws IOException;

  /**
   * List all the userspace tables.  In other words, scan the META table.
   *
   * If we wanted this to be really fast, we could implement a special
   * catalog table that just contains table names and their descriptors.
   * Right now, it only exists as part of the META table's region info.
   *
   * @return - returns an array of HTableDescriptors 
   * @throws IOException
   */
  public HTableDescriptor[] listTables() throws IOException;
  
  /**
   * @param tableName
   * @return table metadata 
   * @throws IOException
   */
  public HTableDescriptor getHTableDescriptor(byte[] tableName)
  throws IOException;
  
  /**
   * Find the location of the region of <i>tableName</i> that <i>row</i>
   * lives in.
   * @param tableName name of the table <i>row</i> is in
   * @param row row key you're trying to find the region of
   * @return HRegionLocation that describes where to find the reigon in 
   * question
   * @throws IOException
   */
  public HRegionLocation locateRegion(final byte [] tableName,
      final byte [] row)
  throws IOException;
  
  /**
   * Find the location of the region of <i>tableName</i> that <i>row</i>
   * lives in, ignoring any value that might be in the cache.
   * @param tableName name of the table <i>row</i> is in
   * @param row row key you're trying to find the region of
   * @return HRegionLocation that describes where to find the reigon in 
   * question
   * @throws IOException
   */
  public HRegionLocation relocateRegion(final byte [] tableName,
      final byte [] row)
  throws IOException;  
  
  /** 
   * Establishes a connection to the region server at the specified address.
   * @param regionServer - the server to connect to
   * @return proxy for HRegionServer
   * @throws IOException
   */
  public HRegionInterface getHRegionConnection(HServerAddress regionServer)
  throws IOException;
  
  /** 
   * Establishes a connection to the region server at the specified address.
   * @param regionServer - the server to connect to
   * @param getMaster - do we check if master is alive
   * @return proxy for HRegionServer
   * @throws IOException
   */
  public HRegionInterface getHRegionConnection(
      HServerAddress regionServer, boolean getMaster)
  throws IOException;
  
  /**
   * Find region location hosting passed row
   * @param tableName
   * @param row Row to find.
   * @param reload If true do not use cache, otherwise bypass.
   * @return Location of row.
   * @throws IOException
   */
  HRegionLocation getRegionLocation(byte [] tableName, byte [] row,
    boolean reload)
  throws IOException;

  /**
   * Pass in a ServerCallable with your particular bit of logic defined and 
   * this method will manage the process of doing retries with timed waits 
   * and refinds of missing regions.
   *
   * @param <T> the type of the return value
   * @param callable
   * @return an object of type T
   * @throws IOException
   * @throws RuntimeException
   */
  public <T> T getRegionServerWithRetries(ServerCallable<T> callable) 
  throws IOException, RuntimeException;
  
  /**
   * Pass in a ServerCallable with your particular bit of logic defined and
   * this method will pass it to the defined region server.
   * @param <T> the type of the return value
   * @param callable
   * @return an object of type T
   * @throws IOException
   * @throws RuntimeException
   */
  public <T> T getRegionServerForWithoutRetries(ServerCallable<T> callable) 
  throws IOException, RuntimeException;
  
    
  /**
   * Process a batch of Puts. Does the retries.
   * @param list A batch of Puts to process.
   * @param tableName The name of the table
   * @return Count of committed Puts.  On fault, < list.size().
   * @throws IOException
   */
  public int processBatchOfRows(ArrayList<Put> list, byte[] tableName)
  throws IOException;

  /**
   * Process a batch of Deletes. Does the retries.
   * @param list A batch of Deletes to process.
   * @return Count of committed Deletes. On fault, < list.size().
   * @param tableName The name of the table
   * @throws IOException
   */
  public int processBatchOfDeletes(ArrayList<Delete> list, byte[] tableName)
  throws IOException;
}

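The code above is the complete interface definition. We now know that it wraps the handling of client requests: operations such as put and delete are wrapped in a callable and executed through the method public <T> T getRegionServerWithRetries(ServerCallable<T> callable). That is exactly what we just saw in HTable.get().

Next we need to look at TableServers.getRegionServerWithRetries(ServerCallable<T> callable). On to the code: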

public <T> T getRegionServerWithRetries(ServerCallable<T> callable) 
    throws IOException, RuntimeException {
      List<Throwable> exceptions = new ArrayList<Throwable>();
      for(int tries = 0; tries < numRetries; tries++) {
        try {
          callable.instantiateServer(tries != 0);
          return callable.call();
        } catch (Throwable t) {
          t = translateException(t);
          exceptions.add(t);
          if (tries == numRetries - 1) {
            throw new RetriesExhaustedException(callable.getServerName(),
                callable.getRegionName(), callable.getRow(), tries, exceptions);
          }
        }
        try {
          Thread.sleep(getPauseTime(tries));
        } catch (InterruptedException e) {
          // continue
        }
      }
      return null;    
    }

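The core is just those two statements: first the callable does the work of locating the RegionServer, then call() is executed to issue the request. Note that this call method is the one overridden in the anonymous inner class back at the very beginning, in HTable.get(). Here is part of the ServerCallable class: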

public abstract class ServerCallable<T> implements Callable<T> {
  protected final HConnection connection;
  protected final byte [] tableName;
  protected final byte [] row;
  protected HRegionLocation location;
  protected HRegionInterface server;

  /**
   * @param connection
   * @param tableName
   * @param row
   */
  public ServerCallable(HConnection connection, byte [] tableName, byte [] row) {
    this.connection = connection;
    this.tableName = tableName;
    this.row = row;
  }
  
  /**
   * 
   * @param reload set this to true if connection should re-find the region
   * @throws IOException
   */
  public void instantiateServer(boolean reload) throws IOException {
    this.location = connection.getRegionLocation(tableName, row, reload);
    this.server = connection.getHRegionConnection(location.getServerAddress());
  }

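So a ServerCallable object holds tableName, row, and so on, and receives a connection reference through its constructor. It then calls connection.getHRegionConnection to obtain a handle for talking to the RegionServer. (I honestly don't know what to call it. Calling it a connection would clash with the other one, which is why the naming in the HBase code frustrates me and invites misunderstanding.)

Here is how this HRegionInterface handle is obtained:

HConnectionManager.java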
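The retry loop and the "locate first, then call" template described above can be sketched in isolation. This is a minimal sketch, not the HBase implementation: RetrySketch, LocatingCallable, callWithRetries and the location strings are hypothetical names made up for illustration, and the fixed pause stands in for HBase's getPauseTime(tries).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

class RetrySketch {
    /** Hypothetical stand-in for ServerCallable: locate the server, then call it. */
    abstract static class LocatingCallable<T> implements Callable<T> {
        protected String server; // stands in for the HRegionInterface handle

        /** reload == true means "ignore the cached location and look it up again". */
        void instantiateServer(boolean reload) {
            server = reload ? "fresh-location" : "cached-location";
        }
    }

    /** Tries the callable up to numRetries times, pausing between attempts. */
    static <T> T callWithRetries(LocatingCallable<T> callable, int numRetries, long pauseMillis) {
        List<Throwable> failures = new ArrayList<>();
        for (int tries = 0; tries < numRetries; tries++) {
            try {
                callable.instantiateServer(tries != 0); // re-locate on every retry
                return callable.call();
            } catch (Exception e) {
                failures.add(e);
                if (tries == numRetries - 1) {
                    throw new IllegalStateException(
                        "gave up after " + failures.size() + " attempts", e);
                }
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            }
        }
        return null; // reached only when numRetries <= 0
    }

    public static void main(String[] args) {
        final int[] attempts = {0};
        // The anonymous subclass supplies only the request logic, as HTable.get() does.
        String result = callWithRetries(new LocatingCallable<String>() {
            @Override
            public String call() throws Exception {
                if (attempts[0]++ < 2) {
                    throw new java.io.IOException("region moved");
                }
                return "row served by " + server;
            }
        }, 5, 10L);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

The point of the split is that the caller writes only call(), while the loop owns location lookup, retries and pauses. Passing tries != 0 as the reload flag means the cached location is trusted once and refreshed on every retry.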

   public HRegionInterface getHRegionConnection(
        HServerAddress regionServer, boolean getMaster) 
    throws IOException {
      if (getMaster) {
        getMaster();
      }
      HRegionInterface server;
      synchronized (this.servers) {
        // See if we already have a connection
        server = this.servers.get(regionServer.toString());
        if (server == null) { // Get a connection
          try {
            server = (HRegionInterface)HBaseRPC.waitForProxy(
                serverInterfaceClass, HBaseRPCProtocolVersion.versionID,
                regionServer.getInetSocketAddress(), this.conf, 
                this.maxRPCAttempts, this.rpcTimeout);
          } catch (RemoteException e) {
            throw RemoteExceptionHandler.decodeRemoteException(e);
          }
          this.servers.put(regionServer.toString(), server);
        }
      }
      return server;
    }

Digging further, here is how this server object is created (in the HBaseRPC class):

  public static VersionedProtocol getProxy(Class<?> protocol,
      long clientVersion, InetSocketAddress addr, UserGroupInformation ticket,
      Configuration conf, SocketFactory factory)
  throws IOException {    
    VersionedProtocol proxy =
        (VersionedProtocol) Proxy.newProxyInstance(
            protocol.getClassLoader(), new Class[] { protocol },
            new Invoker(addr, ticket, conf, factory));
    long serverVersion = proxy.getProtocolVersion(protocol.getName(), 
                                                  clientVersion);
    if (serverVersion == clientVersion) {
      return proxy;
    }
    throw new VersionMismatch(protocol.getName(), clientVersion, 
                              serverVersion);
  }

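These two pieces of code show that Java's dynamic proxy mechanism is used: server is a dynamic proxy object that implements the interface specified by the variable serverInterfaceClass, which here is HRegionInterface. So which methods does that interface define?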

public interface HRegionInterface extends HBaseRPCProtocolVersion {
  /** 
   * Get metainfo about an HRegion
   * 
   * @param regionName name of the region
   * @return HRegionInfo object for region
   * @throws NotServingRegionException
   */
  public HRegionInfo getRegionInfo(final byte [] regionName)
  throws NotServingRegionException;
  

  /**
   * Return all the data for the row that matches <i>row</i> exactly, 
   * or the one that immediately preceeds it.
   * 
   * @param regionName region name
   * @param row row key
   * @param family Column family to look for row in.
   * @return map of values
   * @throws IOException
   */
  public Result getClosestRowBefore(final byte [] regionName,
    final byte [] row, final byte [] family)
  throws IOException;

  /**
   * 
   * @return the regions served by this regionserver
   */
  public HRegion [] getOnlineRegionsAsArray();
  
  /**
   * Perform Get operation.
   * @param regionName name of region to get from
   * @param get Get operation
   * @return Result
   * @throws IOException
   */
  public Result get(byte [] regionName, Get get) throws IOException;

  /**
   * Perform exists operation.
   * @param regionName name of region to get from
   * @param get Get operation describing cell to test
   * @return true if exists
   * @throws IOException
   */
  public boolean exists(byte [] regionName, Get get) throws IOException;

  /**
   * Put data into the specified region 
   * @param regionName
   * @param put the data to be put
   * @throws IOException
   */
  public void put(final byte [] regionName, final Put put)
  throws IOException;
  
  /**
   * Put an array of puts into the specified region
   * 
   * @param regionName
   * @param puts
   * @return The number of processed put's.  Returns -1 if all Puts
   * processed successfully.
   * @throws IOException
   */
  public int put(final byte[] regionName, final Put [] puts)
  throws IOException;

  /**
   * Deletes all the KeyValues that match those found in the Delete object, 
   * if their ts <= to the Delete. In case of a delete with a specific ts it
   * only deletes that specific KeyValue.
   * @param regionName
   * @param delete
   * @throws IOException
   */
  public void delete(final byte[] regionName, final Delete delete)
  throws IOException;

  /**
   * Put an array of deletes into the specified region
   * 
   * @param regionName
   * @param deletes
   * @return The number of processed deletes.  Returns -1 if all Deletes
   * processed successfully.
   * @throws IOException
   */
  public int delete(final byte[] regionName, final Delete [] deletes)
  throws IOException;

  /**
   * Atomically checks if a row/family/qualifier value match the expectedValue.
   * If it does, it adds the put.
   * 
   * @param regionName
   * @param row
   * @param family
   * @param qualifier
   * @param value the expected value
   * @param put
   * @throws IOException
   * @return true if the new put was execute, false otherwise
   */
  public boolean checkAndPut(final byte[] regionName, final byte [] row, 
      final byte [] family, final byte [] qualifier, final byte [] value,
      final Put put)
  throws IOException;
  
  /**
   * Atomically increments a column value. If the column value isn't long-like,
   * this could throw an exception.
   * 
   * @param regionName
   * @param row
   * @param family
   * @param qualifier
   * @param amount
   * @param writeToWAL whether to write the increment to the WAL
   * @return new incremented column value
   * @throws IOException
   */
  public long incrementColumnValue(byte [] regionName, byte [] row, 
      byte [] family, byte [] qualifier, long amount, boolean writeToWAL)
  throws IOException;
  
  
  //
  // remote scanner interface
  //

  /**
   * Opens a remote scanner with a RowFilter.
   * 
   * @param regionName name of region to scan
   * @param scan configured scan object
   * @return scannerId scanner identifier used in other calls
   * @throws IOException
   */
  public long openScanner(final byte [] regionName, final Scan scan)
  throws IOException;
  
  /**
   * Get the next set of values
   * @param scannerId clientId passed to openScanner
   * @return map of values; returns null if no results.
   * @throws IOException
   */
  public Result next(long scannerId) throws IOException;
  
  /**
   * Get the next set of values
   * @param scannerId clientId passed to openScanner
   * @param numberOfRows the number of rows to fetch
   * @return Array of Results (map of values); array is empty if done with this
   * region and null if we are NOT to go to the next region (happens when a
   * filter rules that the scan is done).
   * @throws IOException
   */
  public Result [] next(long scannerId, int numberOfRows) throws IOException;
  
  /**
   * Close a scanner
   * 
   * @param scannerId the scanner id returned by openScanner
   * @throws IOException
   */
  public void close(long scannerId) throws IOException;

  /**
   * Opens a remote row lock.
   *
   * @param regionName name of region
   * @param row row to lock
   * @return lockId lock identifier
   * @throws IOException
   */
  public long lockRow(final byte [] regionName, final byte [] row)
  throws IOException;

  /**
   * Releases a remote row lock.
   *
   * @param regionName
   * @param lockId the lock id returned by lockRow
   * @throws IOException
   */
  public void unlockRow(final byte [] regionName, final long lockId)
  throws IOException;
  
  
  /**
   * Method used when a master is taking the place of another failed one.
   * @return All regions assigned on this region server
   * @throws IOException
   */
  public HRegionInfo[] getRegionsAssignment() throws IOException;
  
  /**
   * Method used when a master is taking the place of another failed one.
   * @return The HSI
   * @throws IOException
   */
  public HServerInfo getHServerInfo() throws IOException;
}

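As you can see, HRegionInterface defines the concrete methods for querying a RegionServer.

Now, looking back: once the dynamic proxy object server has been instantiated, ServerCallable.call() eventually invokes server.get(). Following Java's proxy mechanism, that call is forwarded to the new Invoker(addr, ticket, conf, factory) object we passed in when constructing the proxy, and the Invoker carries out the actual method.

Put simply, this Invoker object uses HBase's RPC client to talk to the RegionServer, sending the request and receiving the result.

Let's see what this RPC client looks like: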
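To make the proxy forwarding concrete, here is a minimal sketch using java.lang.reflect.Proxy. RegionService, RecordingInvoker, connect and the address string are hypothetical stand-ins for HRegionInterface, Invoker and the real RPC plumbing; the handler only formats the call instead of sending it over the network.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;

class ProxySketch {
    /** Hypothetical, much smaller stand-in for HRegionInterface. */
    interface RegionService {
        String get(String regionName, String row);
    }

    /** Plays the role of Invoker: every call on the proxy lands in invoke(). */
    static class RecordingInvoker implements InvocationHandler {
        private final String address;

        RecordingInvoker(String address) {
            this.address = address;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            // A real invoker would serialize the method name and arguments
            // and send them to the server; here we only describe the call.
            return method.getName() + Arrays.toString(args) + "@" + address;
        }
    }

    /** Builds a proxy that implements RegionService without any hand-written class. */
    static RegionService connect(String address) {
        return (RegionService) Proxy.newProxyInstance(
            RegionService.class.getClassLoader(),
            new Class<?>[] { RegionService.class },
            new RecordingInvoker(address));
    }

    public static void main(String[] args) {
        RegionService server = connect("rs1:60020");
        System.out.println(server.get("region-a", "row-1"));
    }
}
```

Because the handler receives the Method object and the argument array, one invoke() implementation can serve every method of the interface, which is why the client side needs no per-method RPC stub.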

public Invoker(InetSocketAddress address, UserGroupInformation ticket, 
                   Configuration conf, SocketFactory factory) {
      this.address = address;
      this.ticket = ticket;
      this.client = CLIENTS.getClient(conf, factory); // client is the RPC client
    }

client is an HBaseClient object; HBaseClient is the class HBase uses as its RPC client. HBaseClient is pooled here as well, which I don't quite understand... The comment in the code reads:

      // Construct & cache client. The configuration is only used for timeout,
      // and Clients have connection pools. So we can either (a) lose some
      // connection pooling and leak sockets, or (b) use the same timeout for all
      // configurations. Since the IPC is usually intended globally, not
      // per-job, we choose (a).

Moving on, here is how such a client completes the final request:

 public Writable call(Writable param, InetSocketAddress addr, 
                       UserGroupInformation ticket)  
                       throws IOException {
    Call call = new Call(param);
    Connection connection = getConnection(addr, ticket, call);
    connection.sendParam(call);                 // send the parameter
    boolean interrupted = false;
    synchronized (call) {
      while (!call.done) {
        try {
          call.wait();                           // wait for the result
        } catch (InterruptedException ie) {
          // save the fact that we were interrupted
          interrupted = true;
        }
      }

      if (interrupted) {
        // set the interrupt flag now that we are done waiting
        Thread.currentThread().interrupt();
      }

      if (call.error != null) {
        if (call.error instanceof RemoteException) {
          call.error.fillInStackTrace();
          throw call.error;
        }
        // local exception
        throw wrapException(addr, call.error);
      }
      return call.value;
    }
  }

Yet another connection. This time the connection is a thread used for sending and receiving data. Judging from getConnection(addr, ticket, call), this is yet another pool, and sure enough:

 /** Get a connection from the pool, or create a new one and add it to the
   * pool.  Connections to a given host/port are reused. */
  private Connection getConnection(InetSocketAddress addr, 
                                   UserGroupInformation ticket,
                                   Call call)
                                   throws IOException {
    if (!running.get()) {
      // the client is stopped
      throw new IOException("The client is stopped");
    }
    Connection connection;
    /* we could avoid this allocation for each RPC by having a  
     * connectionsId object and with set() method. We need to manage the
     * refs for keys in HashMap properly. For now its ok.
     */
    ConnectionId remoteId = new ConnectionId(addr, ticket);
    do {
      synchronized (connections) {
        connection = connections.get(remoteId);
        if (connection == null) {
          connection = new Connection(remoteId);
          connections.put(remoteId, connection);
        }
      }
    } while (!connection.addCall(call));
    
    //we don't invoke the method below inside "synchronized (connections)"
    //block above. The reason for that is if the server happens to be slow,
    //it will take longer to establish a connection and that will slow the
    //entire system down.
    connection.setupIOstreams();
    return connection;
  }

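In other words, as long as the RegionServer address (addr) and the user group information are the same, calls share one connection. Once the connection has been obtained, it puts the current call into an internal queue (it maintains a mapping from call id to call). When the call completes, the call's state is updated: mainly the completion flag Call.done, with the result of the request stored in Call.value.

OK, with that in place, let's see how the connection sends the request data.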
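The id-to-call bookkeeping and the done flag can be sketched as follows. This is a simplified sketch under assumptions: PendingCallSketch, register, onResponse and pending are hypothetical names, the result is a plain String rather than a Writable, and error handling is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PendingCallSketch {
    /** One outstanding request; the caller blocks on it until done is set. */
    static class Call {
        final int id;
        private String value;
        private boolean done;

        Call(int id) {
            this.id = id;
        }

        /** Called by the receiving thread: store the result and wake the waiter. */
        synchronized void complete(String result) {
            value = result;
            done = true;
            notifyAll();
        }

        /** Called by the requesting thread: wait until the result has arrived. */
        synchronized String await() {
            boolean interrupted = false;
            while (!done) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    interrupted = true; // remember it, keep waiting for the result
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt(); // restore the flag once done
            }
            return value;
        }
    }

    /** Calls sharing this connection, keyed by call id. */
    private final Map<Integer, Call> calls = new ConcurrentHashMap<>();

    Call register(int id) {
        Call call = new Call(id);
        calls.put(id, call);
        return call;
    }

    /** Invoked when a response carrying the given id has been read. */
    void onResponse(int id, String result) {
        Call call = calls.remove(id);
        if (call != null) {
            call.complete(result);
        }
    }

    int pending() {
        return calls.size();
    }

    public static void main(String[] args) {
        PendingCallSketch connection = new PendingCallSketch();
        Call call = connection.register(7);
        new Thread(() -> connection.onResponse(7, "result-for-7")).start();
        System.out.println(call.await());
    }
}
```

Keying responses by call id is what lets many caller threads share one connection: each caller blocks only on its own Call object, and the single receiving thread wakes exactly the caller whose response arrived.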

  /** Initiates a call by sending the parameter to the remote server.
     * Note: this is not called from the Connection thread, but by other
     * threads.
     * @param call
     */
    public void sendParam(Call call) {
      if (shouldCloseConnection.get()) {
        return;
      }

      DataOutputBuffer d=null;
      try {
        synchronized (this.out) {
          if (LOG.isDebugEnabled())
            LOG.debug(getName() + " sending #" + call.id);
          
          //for serializing the
          //data to be written
          d = new DataOutputBuffer();
          d.writeInt(call.id);
          call.param.write(d);
          byte[] data = d.getData();
          int dataLength = d.getLength();
          out.writeInt(dataLength);      //first put the data length
          out.write(data, 0, dataLength);//write the data
          out.flush();
        }
      } catch(IOException e) {
        markClosed(e);
      } finally {
        //the buffer is just an in-memory buffer, but it is still polite to
        // close early
        IOUtils.closeStream(d);
      }
    }

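The code shows that sending a request is synchronized, which is what leads to the problem mentioned in the previous post.

That's it for the HBase client code for now.

The figure below helps in understanding the various pools described above.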
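The wire format that sendParam produces (a 4-byte length, then the call id, then the serialized parameter) can be sketched with plain java.io streams. FramingSketch and roundTrip are hypothetical names, and the parameter is a raw byte array instead of a Writable. Unlike the original, this sketch builds the frame before taking the lock, so only the actual write is serialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class FramingSketch {
    private final DataOutputStream out; // shared by all caller threads

    FramingSketch(DataOutputStream out) {
        this.out = out;
    }

    /** Writes one frame: [int length][int callId][param bytes]. */
    void sendParam(int callId, byte[] param) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream d = new DataOutputStream(buffer);
        d.writeInt(callId);
        d.write(param);
        byte[] data = buffer.toByteArray();
        synchronized (out) {
            // Only one thread may write at a time, or frames would interleave.
            out.writeInt(data.length);
            out.write(data);
            out.flush();
        }
    }

    /** Sends one frame into memory and decodes it again, to show the layout. */
    static String roundTrip(int callId, String param) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            new FramingSketch(new DataOutputStream(wire))
                .sendParam(callId, param.getBytes(StandardCharsets.UTF_8));
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
            int length = in.readInt();          // frame length, excluding itself
            int id = in.readInt();              // call id, first 4 bytes of the frame
            byte[] body = new byte[length - 4]; // the rest is the parameter
            in.readFully(body);
            return id + ":" + new String(body, StandardCharsets.UTF_8)
                + " (" + length + " bytes)";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(7, "hello"));
    }
}
```

The lock is needed because every caller thread writes to the same output stream; the cost is that a slow or large write from one caller delays all other callers on that connection.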

读书笔记-4 chengxuyuancsdn 读书笔记
1、JSTL 核心标签库标签 2、避免SQL注入 3、字符串逆转方法 4、字符串比较compareTo 5、字符串替换replace 6、分拆字符串 1、JSTL 核心标签库标签共有13个，学习资料：http://www.cnblogs.com/lihuiyy/archive/2012/02/24/2366806.html 功能上分为4类： (1)表达式控制标签：out
[物理与电子]半导体教材的一个小问题 comsci 问题
各种模拟电子和数字电子教材中都有这个词汇-空穴书中对这个词汇的解释是; 当电子脱离共价键的束缚成为自由电子之后,共价键中就留下一个空位,这个空位叫做空穴我现在回过头翻大学时候的教材,觉得这个
Flashback Database --闪回数据库 daizj oracle 闪回数据库
Flashback 技术是以Undo segment中的内容为基础的，因此受限于UNDO_RETENTON参数。要使用flashback 的特性，必须启用自动撤销管理表空间。在Oracle 10g中， Flash back家族分为以下成员： Flashback Database， Flashback Drop，Flashback Query(分Flashback Query,Flashbac
简单排序:插入排序 dieslrae 插入排序
public void insertSort(int[] array){ int temp; for(int i=1;i<array.length;i++){ temp = array[i]; for(int k=i-1;k>=0;k--)
C语言学习六指针小示例、一维数组名含义，定义一个函数输出数组的内容 dcj3sjt126com c
# include <stdio.h> int main(void) { int * p; //等价于 int *p 也等价于 int* p; int i = 5; char ch = 'A'; //p = 5; //error //p = &ch; //error //p = ch; //error p = &i; //
centos下php redis扩展的安装配置3种方法 dcj3sjt126com redis
方法一 1.下载php redis扩展包代码如下复制代码 #wget http://redis.googlecode.com/files/redis-2.4.4.tar.gz 2 tar -zxvf 解压压缩包，cd /扩展包（进入扩展包然后运行phpize 一下是我环境中phpize的目录，/usr/local/php/bin/phpize (一定要
线程池(Executors) shuizhaosi888 线程池
在java类库中，任务执行的主要抽象不是Thread，而是Executor，将任务的提交过程和执行过程解耦 public interface Executor { void execute(Runnable command); } public class RunMain implements Executor{ @Override pub
openstack 快速安装笔记 haoningabc openstack
前提是要配置好yum源版本icehouse，操作系统redhat6.5 最简化安装，不要cinder和swift 三个节点 172 control节点keystone glance horizon 173 compute节点nova 173 network节点neutron control /etc/sysctl.conf net.ipv4.ip_forward =
从c面向对象的实现理解c++的对象（二） jimmee C++面向对象虚函数
1. 类就可以看作一个struct，类的方法，可以理解为通过函数指针的方式实现的，类对象分配内存时，只分配成员变量的，函数指针并不需要分配额外的内存保存地址。 2. c++中类的构造函数，就是进行内存分配(malloc)，调用构造函数 3. c++中类的析构函数，就时回收内存(free) 4. c++是基于栈和全局数据分配内存的，如果是一个方法内创建的对象，就直接在栈上分配内存了。专门在
如何让那个一个div可以拖动 lingfeng520240 html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml
第10章高级事件（中） onestopweb 事件
index.html <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/
计算两个经纬度之间的距离 roadrunners 计算纬度 LBS 经度距离
要解决这个问题的时候，到网上查了很多方案，最后计算出来的都与百度计算出来的有出入。下面这个公式计算出来的距离和百度计算出来的距离是一致的。 /** * * @param longitudeA * 经度A点 * @param latitudeA * 纬度A点 * @param longitudeB *
最具争议的10个Java话题 tomcat_oracle java
1、Java8已经到来。什么！？ Java8 支持lambda。哇哦，RIP Scala！　　随着Java8 的发布，出现很多关于新发布的Java8是否有潜力干掉Scala的争论，最终的结论是远远没有那么简单。Java8可能已经在Scala的lambda的包围中突围，但Java并非是函数式编程王位的真正觊觎者。　　2、Java 9 即将到来　　 Oracle早在8月份就发布
zoj 3826 Hierarchical Notation(模拟) 阿尔萨斯 rar
题目链接：zoj 3826 Hierarchical Notation 题目大意：给定一些结构体，结构体有value值和key值，Q次询问，输出每个key值对应的value值。解题思路：思路很简单，写个类词法的递归函数，每次将key值映射成一个hash值，用map映射每个key的value起始终止位置，预处理完了查询就很简单了。这题是最后10分钟出的，因为没有考虑value为{}的情
