Lease Creation, Expiration, and Common Problems in HBase

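HBase uses leases to control how long each scanner may operate. A scanner whose lease is not renewed in time is expired and closed by the RegionServer.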


1. Initializing the lease thread:
HRegionServer's run method calls preRegistrationInitialization once, which then calls initializeThreads. That is where the Leases object is created:

    this.leases = new Leases((int) conf.getLong(
        HConstants.HBASE_REGIONSERVER_LEASE_PERIOD_KEY,
        HConstants.DEFAULT_HBASE_REGIONSERVER_LEASE_PERIOD),
        this.threadWakeFrequency);


The default lease period (expiration time) is 60 s:

  public static String HBASE_REGIONSERVER_LEASE_PERIOD_KEY =
    "hbase.regionserver.lease.period";
  public static long DEFAULT_HBASE_REGIONSERVER_LEASE_PERIOD = 60000;


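By default, the lease thread checks for expired leases every 10 s. The interval comes from the general thread wake frequency setting, which is passed to the Leases constructor as threadWakeFrequency: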
  /** Parameter name for how often threads should wake up */
  public static final String THREAD_WAKE_FREQUENCY = "hbase.server.thread.wakefrequency";

  /** Default value for thread wake frequency */
  public static final int DEFAULT_THREAD_WAKE_FREQUENCY = 10 * 1000;
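To make the two defaults concrete, here is a minimal sketch of how the lease period and the check interval could be resolved. It assumes a plain `Map<String, String>` as a stand-in for Hadoop's `Configuration`, and `LeaseSettings` is a hypothetical helper; only the key names and default values come from the constants above. The `(int)` cast mirrors the one in the `Leases` constructor call.

```java
import java.util.Map;

// Hypothetical helper; a Map stands in for Hadoop's Configuration (assumption).
class LeaseSettings {
    static final String LEASE_PERIOD_KEY = "hbase.regionserver.lease.period";
    static final long DEFAULT_LEASE_PERIOD = 60000L;            // 60 s
    static final String WAKE_FREQUENCY_KEY = "hbase.server.thread.wakefrequency";
    static final int DEFAULT_WAKE_FREQUENCY = 10 * 1000;         // 10 s

    // Read a long value, falling back to the default when the key is unset.
    static long getLong(Map<String, String> conf, String key, long defaultValue) {
        String v = conf.get(key);
        return v == null ? defaultValue : Long.parseLong(v.trim());
    }

    // The lease period is narrowed to int before being handed to Leases.
    static int leasePeriodMs(Map<String, String> conf) {
        return (int) getLong(conf, LEASE_PERIOD_KEY, DEFAULT_LEASE_PERIOD);
    }

    // How long the checker waits on the queue before looping again.
    static int wakeFrequencyMs(Map<String, String> conf) {
        return (int) getLong(conf, WAKE_FREQUENCY_KEY, DEFAULT_WAKE_FREQUENCY);
    }
}
```

With an empty map, `leasePeriodMs` returns 60000 and `wakeFrequencyMs` returns 10000; setting `hbase.regionserver.lease.period` to `"120000"` changes only the lease period.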


Finally, the lease thread is started in HRegionServer's startServiceThreads:
    this.leases.setName(n + ".leaseChecker");
    this.leases.start();




2. Creating leases
Leases are created in openScanner and in addRowLock.
In openScanner, each new scanner is registered through addScanner, which calls createLease:
  protected long addScanner(RegionScanner s) throws LeaseStillHeldException {
    long scannerId = -1L;
    while (true) {
      scannerId = rand.nextLong();
      if (scannerId == -1) continue;
      String scannerName = String.valueOf(scannerId);
      RegionScanner existing = scanners.putIfAbsent(scannerName, s);
      if (existing == null) {
        this.leases.createLease(scannerName, new ScannerListener(scannerName));
        break;
      }
    } 
    return scannerId;
  }
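The ID-allocation loop in addScanner can be reproduced outside HBase. In the sketch below, `ScannerRegistry` is a hypothetical stand-in, scanners are plain `Object`s, and lease creation is reduced to recording the lease name in a set. It shows the role of `putIfAbsent`: a randomly drawn ID is accepted only if no other scanner already holds it, and -1 (the code's initial "unassigned" value) is skipped.

```java
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the scanner bookkeeping in HRegionServer.
class ScannerRegistry {
    final ConcurrentMap<String, Object> scanners = new ConcurrentHashMap<>();
    final Set<String> leaseNames = ConcurrentHashMap.newKeySet(); // stands in for Leases
    private final Random rand;

    ScannerRegistry(Random rand) { this.rand = rand; }

    long addScanner(Object scanner) {
        while (true) {
            long scannerId = rand.nextLong();
            if (scannerId == -1) continue;              // -1 is treated as "unassigned"
            String scannerName = String.valueOf(scannerId);
            // Claim the ID atomically; on a collision, draw a new random ID.
            if (scanners.putIfAbsent(scannerName, scanner) == null) {
                leaseNames.add(scannerName);           // the lease uses the same name
                return scannerId;
            }
        }
    }
}
```

Because the claim and the collision check happen in one atomic `putIfAbsent`, two concurrent callers can never end up sharing a scanner ID.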

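createLease eventually calls addLease, which stamps the expiration time and adds the lease, named after the scanner ID, to leaseQueue (a DelayQueue):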
  public void addLease(final Lease lease) throws LeaseStillHeldException {
    if (this.stopRequested) {
      return;
    }
    lease.setExpirationTime(System.currentTimeMillis() + this.leasePeriod);
    synchronized (leaseQueue) {
      if (leases.containsKey(lease.getLeaseName())) {
        throw new LeaseStillHeldException(lease.getLeaseName());
      }
      leases.put(lease.getLeaseName(), lease);
      leaseQueue.add(lease);
    }
  }
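A DelayQueue hands out an element only after its getDelay has dropped to zero or below, so each lease must implement java.util.concurrent.Delayed in terms of its expiration time. The following is a minimal sketch of that idea combined with the addLease logic above. `SimpleLease` and `SimpleLeases` are hypothetical names rather than the HBase classes, and `IllegalStateException` stands in for LeaseStillHeldException.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical lease: ordered in the DelayQueue by its expiration time.
class SimpleLease implements Delayed {
    final String name;
    volatile long expirationTime; // absolute time in ms

    SimpleLease(String name) { this.name = name; }

    @Override
    public long getDelay(TimeUnit unit) {
        // Remaining time until expiry; <= 0 means the queue may release it.
        return unit.convert(expirationTime - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed o) {
        return Long.compare(getDelay(TimeUnit.MILLISECONDS), o.getDelay(TimeUnit.MILLISECONDS));
    }
}

// Hypothetical lease table mirroring addLease.
class SimpleLeases {
    final Map<String, SimpleLease> leases = new HashMap<>();
    final DelayQueue<SimpleLease> leaseQueue = new DelayQueue<>();
    final long leasePeriod;

    SimpleLeases(long leasePeriod) { this.leasePeriod = leasePeriod; }

    void addLease(SimpleLease lease) {
        lease.expirationTime = System.currentTimeMillis() + leasePeriod; // (re)start the clock
        synchronized (leaseQueue) {
            if (leases.containsKey(lease.name)) {
                throw new IllegalStateException("lease still held: " + lease.name);
            }
            leases.put(lease.name, lease);
            leaseQueue.add(lease);
        }
    }
}
```

Since `addLease` stamps `expirationTime` itself, re-adding a lease object after it has been removed simply restarts its clock.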


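3. Lease expiration
The lease thread polls leaseQueue with a 10 s timeout. leaseQueue is a java.util.concurrent.DelayQueue: a BlockingQueue backed by a PriorityQueue and ordered by each element's remaining delay, which for a lease is derived from its expiration time. An element can only be taken once its delay has run out, so poll returns an expired lease as soon as one is due, or null if none expires within the 10 s.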

  public void run() {
    while (!stopRequested || (stopRequested && leaseQueue.size() > 0) ) {
      Lease lease = null;
      try {
        lease = leaseQueue.poll(leaseCheckFrequency, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        continue;
      } catch (ConcurrentModificationException e) {
        continue;
      } catch (Throwable e) {
        LOG.fatal("Unexpected exception killed leases thread", e);
        break;
      }
      if (lease == null) {
        continue;
      }
      // A lease expired.  Run the expired code before removing from queue
      // since its presence in queue is used to see if lease exists still.
      if (lease.getListener() == null) {
        LOG.error("lease listener is null for lease " + lease.getLeaseName());
      } else {
        lease.getListener().leaseExpired();
      }
      synchronized (leaseQueue) {
        leases.remove(lease.getLeaseName());
      }
    }
    close();
  }
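The timing behaviour of this loop can be checked with a miniature version. The sketch below is a simplified, hypothetical checker (`MiniLeaseChecker` and `MiniLease` are not HBase classes; listeners are modelled as `Runnable`). It also repeats the setName/start pattern from startServiceThreads in its usage. A lease with a short period fires well before the check interval elapses, because a blocked `poll` wakes up when a new head is added and returns as soon as that lease is due.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical lease carrying its own expiry callback.
class MiniLease implements Delayed {
    final String name;
    final long expirationTime;
    final Runnable listener; // run when the lease expires

    MiniLease(String name, long periodMs, Runnable listener) {
        this.name = name;
        this.expirationTime = System.currentTimeMillis() + periodMs;
        this.listener = listener;
    }

    public long getDelay(TimeUnit unit) {
        return unit.convert(expirationTime - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }

    public int compareTo(Delayed o) {
        return Long.compare(getDelay(TimeUnit.MILLISECONDS), o.getDelay(TimeUnit.MILLISECONDS));
    }
}

// Hypothetical, simplified version of the Leases run loop.
class MiniLeaseChecker extends Thread {
    final Map<String, MiniLease> leases = new ConcurrentHashMap<>();
    final DelayQueue<MiniLease> leaseQueue = new DelayQueue<>();
    final long leaseCheckFrequency;
    volatile boolean stopRequested = false;

    MiniLeaseChecker(long leaseCheckFrequency) { this.leaseCheckFrequency = leaseCheckFrequency; }

    void add(MiniLease lease) {
        leases.put(lease.name, lease);
        leaseQueue.add(lease);
    }

    @Override
    public void run() {
        // After a stop request, keep draining until the queue is empty.
        while (!stopRequested || leaseQueue.size() > 0) {
            MiniLease lease;
            try {
                // An expired lease as soon as one is due, or null after the timeout.
                lease = leaseQueue.poll(leaseCheckFrequency, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                continue;
            }
            if (lease == null) continue;
            lease.listener.run();      // run the expiry code first...
            leases.remove(lease.name); // ...then forget the lease
        }
    }
}
```

Usage: create the checker, call `setName("<server>.leaseChecker")` and `start()`, then `add` leases; setting `stopRequested` lets the thread exit once the queue is empty.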


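poll only returns expired leases. For each one, the thread calls its listener's leaseExpired method. For a scanner lease, this is ScannerListener.leaseExpired: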
    public void leaseExpired() {
      RegionScanner s = scanners.remove(this.scannerName);
      if (s != null) {
        LOG.info("Scanner " + this.scannerName + " lease expired on region "
            + s.getRegionInfo().getRegionNameAsString());
        try {
          HRegion region = getRegion(s.getRegionInfo().getRegionName());
          if (region != null && region.getCoprocessorHost() != null) {
            region.getCoprocessorHost().preScannerClose(s);
          }

          s.close();
          if (region != null && region.getCoprocessorHost() != null) {
            region.getCoprocessorHost().postScannerClose(s);
          }
        } catch (IOException e) {
          LOG.error("Closing scanner for "
              + s.getRegionInfo().getRegionNameAsString(), e);
        }
      } else {
        LOG.info("Scanner " + this.scannerName + " lease expired");
      }
    }


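leaseExpired removes the scanner from the in-memory scanners map and closes it. If the region has a coprocessor host, the preScannerClose and postScannerClose hooks run around the close.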
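Stripped of logging details and the coprocessor hooks, the expiry path is "remove from the registry, then close". The sketch below models it with a hypothetical `ExpiringScanner` (an `AutoCloseable` stand-in for RegionScanner) and a hypothetical `ScannerExpiry` holder. Removing the entry before closing means a later lookup by this scanner ID no longer finds the scanner.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a RegionScanner.
class ExpiringScanner implements AutoCloseable {
    volatile boolean closed = false;

    @Override
    public void close() { closed = true; }
}

// Simplified model of ScannerListener.leaseExpired (coprocessor hooks omitted).
class ScannerExpiry {
    final Map<String, ExpiringScanner> scanners = new ConcurrentHashMap<>();

    // Returns true if a live scanner was found and closed.
    boolean leaseExpired(String scannerName) {
        ExpiringScanner s = scanners.remove(scannerName); // later lookups by this ID fail
        if (s == null) {
            System.out.println("Scanner " + scannerName + " lease expired");
            return false;
        }
        System.out.println("Scanner " + scannerName + " lease expired; closing");
        s.close();
        return true;
    }
}
```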


4. A common error
2013-11-06 16:16:38,684 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.regionserver.LeaseException: lease '-2408052186420749395' does not exist
        at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2783)
        at sun.reflect.GeneratedMethodAccessor55.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)


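This error occurs because the lease has already expired while the client, which may not have closed its scanner, calls next with the old scanner ID. Every next call removes the scanner's lease and then adds it back, as follows:
1. Remove the lease: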
      lease = this.leases.removeLease(scannerName);

In next, the lease is removed first, which also checks whether it can still be removed normally:
  Lease removeLease(final String leaseName) throws LeaseException {
    Lease lease =  null;
    synchronized (leaseQueue) {
      lease = leases.remove(leaseName);
      if (lease == null) {
        throw new LeaseException("lease '" + leaseName + "' does not exist");
      }
      leaseQueue.remove(lease);
    }
    return lease;
  }

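If the lease exists, it is removed normally; once the lease has expired, removeLease throws LeaseException.
2. Re-add the lease: in the normal case, once the lease has been removed and the next call has done its work, the lease is added back at the end, still named after the original scanner ID, so that subsequent next calls can continue: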
      if (this.scanners.containsKey(scannerName)) {
        if (lease != null) this.leases.addLease(lease);
      }
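The remove/re-add cycle can be modelled in a few lines. In the hypothetical, single-threaded sketch below, `ScanSession` and `LeaseMissingException` are invented stand-ins (not HBase classes), and a lease is reduced to its expiration timestamp. Placing the re-add in a finally block is an assumption of the sketch, chosen so that the lease is restored even if the scan work throws. Removing a lease that is no longer registered produces the same message as in the log above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for HBase's LeaseException.
class LeaseMissingException extends RuntimeException {
    LeaseMissingException(String msg) { super(msg); }
}

// Hypothetical, single-threaded model of the lease handling in next.
class ScanSession {
    final Map<String, Object> scanners = new HashMap<>();
    final Map<String, Long> leases = new HashMap<>(); // lease name -> expiration time (ms)
    final long leasePeriod;

    ScanSession(long leasePeriod) { this.leasePeriod = leasePeriod; }

    // Like removeLease: a missing lease is an error, not a null return.
    Long removeLease(String leaseName) {
        Long lease = leases.remove(leaseName);
        if (lease == null) {
            throw new LeaseMissingException("lease '" + leaseName + "' does not exist");
        }
        return lease;
    }

    void next(String scannerName, Runnable work) {
        if (!scanners.containsKey(scannerName)) {
            throw new IllegalStateException("unknown scanner " + scannerName);
        }
        // Step 1: take the lease out so it cannot expire while this call runs.
        removeLease(scannerName);
        try {
            work.run();
        } finally {
            // Step 2: re-add under the same name, which restarts the expiration clock.
            if (scanners.containsKey(scannerName)) {
                leases.put(scannerName, System.currentTimeMillis() + leasePeriod);
            }
        }
    }
}
```

After a successful `next`, the lease's expiration time lies `leasePeriod` milliseconds in the future. If the lease entry is missing while the scanner is still registered, the call fails with `lease '<id>' does not exist`.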


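To troubleshoot this error:
1. Check that hbase.rpc.timeout (default 60000 ms) is greater than or equal to hbase.regionserver.lease.period (default 60000 ms). Only a value at least as large is correct.
2. Check whether any scanners are left unclosed.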
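The first check can be automated. A minimal sketch follows, again assuming a `Map<String, String>` in place of the HBase configuration and using the defaults quoted in this article (60000 ms for both keys). `LeaseTimeoutCheck` is a hypothetical helper.

```java
import java.util.Map;

// Hypothetical helper for the rpc-timeout vs. lease-period check.
class LeaseTimeoutCheck {
    static final long DEFAULT_RPC_TIMEOUT = 60000L;   // hbase.rpc.timeout
    static final long DEFAULT_LEASE_PERIOD = 60000L;  // hbase.regionserver.lease.period

    static long get(Map<String, String> conf, String key, long def) {
        String v = conf.get(key);
        return v == null ? def : Long.parseLong(v.trim());
    }

    // True when hbase.rpc.timeout >= hbase.regionserver.lease.period.
    static boolean isConsistent(Map<String, String> conf) {
        long rpcTimeout = get(conf, "hbase.rpc.timeout", DEFAULT_RPC_TIMEOUT);
        long leasePeriod = get(conf, "hbase.regionserver.lease.period", DEFAULT_LEASE_PERIOD);
        return rpcTimeout >= leasePeriod;
    }
}
```

With both defaults the check passes. Raising only the lease period (for example to 120000) makes it fail until hbase.rpc.timeout is raised to match.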


