Master_Reid

HBase source code. MemStore

配置

hbase.regionserver.global.memstore.size: 0.4

一个region servetr中所有memstore的size的最大值, 默认值为堆内存的40%. 在此之前, 更新不会被阻塞, 也不会被flush.

hbase.regionserver.global.memstore.size.lower.limit: 0.95

就是flush的触发线, 默认值是hbase.regionserver.global.memstore.size的95%. 如果把值设置成1.0, 则可能会导致update已经block了, 但是却不刷新.

hbase.hregion.memstore.flush.size: 134217728 (单位字节)

当menstore中的字节数超过数字, 会导致memstore flush. 这个数字的检查会由另外一条线程执行

hbase.hregion.memstore.block.multiplier: 4

如果memstore中的size的大小达到这个参数乘以hbase.hregion.memstore.flush.size那么多个字节, 则阻塞更新.

hbase.hregion.memstore.mslab.enabled: true

允许使用MemStore-Local Allocation Buffer, 这是用来防止堆内存过于碎片化的现象, 特别是在多写入的情景下. 这可以减少频繁调用GC.

hbase.hstore.flusher.count: 2

flush的线程数, 线程数越少, memstore的flush就需要排队; 线程数越多, 会平行执行, 但是回增加HDFS的负载, 并且可能导致更多的compaction

简介

MemStore通俗地来说, 就是HBase暂时存放数据的地方, 物理存放的地方其实就是内存.

每个HStore中只有一个MemStore. 当MemStore的size大于某个阈值, 就会将发生flush, flush会产生StoreFile, 所以一个HStore中可以有很多个StoreFiles.

还顺带一提的是, 每个HStore存储的是某table的某个column family的某几个rows的信息.

都知道HBase是架在Hadoop的HDFS上, 而HDFS不提供快取快读等操作, 只支持顺序读写, 而且一旦写入HDFS后, 只允许添加或删除, 不支持直接修改, 所以效率会比较低下. 而且实际上HDFS是对row key进行了排序存放的

所以这个时候MemStore就是提供了这么一个快写和快读的途径了(一般刚写入的数据, 很有可能很快就需要被读取), 还可以在flush成HFile之前, 对row key进行排序(因为HFile是最后需要写入到HDFS上的, 所以HFile必须符合HDFS的存储结构模型)

什么时候会将数据写入到MemStore上呢?

涉及数据的读写都会用到.

在write path上, 数据在写入log后, 会先写到MemStore上, 当MemStore到达超过阈值, 会flush生成HFile

在read path上, 会先从MemStore那里搜寻是否存在目标数据, 如果不存在, 再到HFile处搜寻.

关于flush, flush影响系统效能, 因为flush的过程会引起写锁, 即独占锁, 所以如果经常flush的话, 效率自然很低;

如果不经常flush的话, 即MemStore的size设置成好大, 一个问题是一旦flush会耗费更长时间, 一个是因为还没写到HFile里面, 一旦region server挂掉, 要replay操作的话, 也会超长时间, 因为log肯定积累了很多.

flush时, memstore会先被take a snapshot, 然后被清理掉. 新的memstore继续支持读写操作, 然后snapshot会被一直备份着, 直到被通知刚才的flush已经成功, 然后这份snapshot也会被清理掉.

会有以下情况触发memstore的flush:

1. 当某个memstore的size达到hbase.hregion.memstore.flush.size中指定的大小, 所有属于那个region的memstores都会被flush.

2. 当整体memstore的使用数字达到hbase.regionserver.global.memstore.upperLimit, 则各个regions上的memstores都会被flush. flush的顺序会按照memstore的使用大小降序排列. flush持续到直到整体memstore的使用大小掉到hbase.regionserver.global.memstore.lowerLimit以下.

3. 当每个region server的WAL的数目达到hbase.regionserver.max.logs. 则各个regions上的memstores都会被flush, 从而减少WAL的数目. flush顺序按时间排序. 那些拥有的最老的memstores先flush. flush直到WAL的数目降到hbase.regionserver.max.logs以下.

这里只有简单的概念介绍, 更多的更详细的写得更好的文章, 推荐如下(一篇是英文, 一篇是中文(其实就是英文那篇的翻译而已)):

HBase MemStore English

HBase MemStore Chinese

那时候看0.98.7 MemStore还是独自一个class, 0.99.7里面, MemStore成了一个interface了. 而且DefaultMemStore是MemStore的实现, 暂时唯一.

目测以后会有很多优化的MemStore版本出现, 拭目以待.

简析

好, 我们直接分析DefaultMemStore的成员和方法:

首先我们需要意思到MemStore其实也是一种存储结构, 所以肯定有存储单位, 而涉及底层的肯定是Cell, 具体就是KeyValue了

所以会有KeyValue的list, 而且, 因为需要排序, 所以这个list必须维护顺序. 还有为了搜寻的快速, 所以实现用了一个即支持排序, 又支持快速搜寻的list,

叫CellSkipListSet, 还用了set, 这个原因是说, 设计者只在乎key的值, value的值一不一样的没差(其实key已经决定了value, 所以只比较key确实就可以)

还有snapshot, snapshot的原理(基本上就是在flush之前, 进行一个快照, 保存flush前状态, 一旦出什么事可以进行恢复):

wiki snapshot

cloudera snapshot

还有snapshotId, 这个是为每次生成的snapshot的一个编号, 也是为了方便清理snapshot而设计的.

其它成员还有, 当前memStore的size, 还有几个allocator(这个是为了保持这个MemStore的堆内存不要过于零散, 好处是flush可以成片flush, 而不是零散的内存)

关于time的几个数据, 因为有些可能数据的timestamp已经很老了, 所以在memstore可以不需要, 直接被overwrite, 当然还有检测数据的时间范围的, 其实就是key的timeStamp.

还有一个内部类, 叫MemStoreScanner, 就是提供Memstore数据读取的, 读数据只能靠它, 是只能!.

构造方法不解释.

还有生成snapshot, 清除snapshot

拿出当前memStore的size.

存储类型操作有:

添加add, 删除delete, 撤回rollback,

更新或插入upsert(这里的插入应该是append或increment, add属于是无中生有),

还支持指定column的value的修改(更upsert不同, upsert是针对cell这个层级, 而这个操作应该是针对比cell更小的byte层级)

获取scanner(读操作)

判断当前scan的目标key值是否存在于memstore, 叫shouldSeek()

但是没有flush的method, 只提供获取当前size的方法, flush的管理轮不到memstore, 这个当谈到再说.

Scanner里面的方法:

一般的peek()获取第一个cell, next()下一个cell, seek()寻找指定cell, reseek()这是seek()了之后的进一步seek(), 一般seek()只执行一次, 而reseek()能执行很多次.

更通俗地解释, seek先获得一个范围的子集, 而reseek()是针对这个子集的搜索

还有一个getSequenceId, 意思是操作的最新值. sequeceId是从0开始线性递增的, 而且static, 所以记录最大的sequeceId, 当flush到HFile后, 我们可以通过sequeceId的值, 判断所需值在哪个HFile里面.

还有scanner的close, 判断scanner里面是否含有指定cell.

还有几个特别搜寻方式, 是搜寻到前一行, 和搜寻到下一行. 其实就指定行搜寻; 还有一个从后往前搜寻.

(code有1k行哦, 有兴趣的话慢用, xd)

from Reid Chan

/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.hbase.regionserver;

import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.classification.InterfaceAudience;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.KeyValueUtil;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.ByteRange;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.ClassSize;
import org.apache.hadoop.hbase.util.CollectionBackedScanner;
import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
import org.apache.hadoop.hbase.util.Pair;
import org.apache.hadoop.hbase.util.ReflectionUtils;

/**
 * The MemStore holds in-memory modifications to the Store.  Modifications
 * are {@link Cell}s.  When asked to flush, current memstore is moved
 * to snapshot and is cleared.  We continue to serve edits out of new memstore
 * and backing snapshot until flusher reports in that the flush succeeded. At
 * this point we let the snapshot go.
 *  
 * The MemStore functions should not be called in parallel. Callers should hold
 *  write and read locks. This is done in {@link HStore}.
 *  
 *
 * TODO: Adjust size of the memstore when we remove items because they have
 * been deleted.
 * TODO: With new KVSLS, need to make sure we update HeapSize with difference
 * in KV size.
 */
@InterfaceAudience.Private
public class DefaultMemStore implements MemStore {
  private static final Log LOG = LogFactory.getLog(DefaultMemStore.class);
  static final String USEMSLAB_KEY = "hbase.hregion.memstore.mslab.enabled";
  private static final boolean USEMSLAB_DEFAULT = true;
  static final String MSLAB_CLASS_NAME = "hbase.regionserver.mslab.class";

  private Configuration conf;

  // MemStore.  Use a CellSkipListSet rather than SkipListSet because of the
  // better semantics.  The Map will overwrite if passed a key it already had
  // whereas the Set will not add new Cell if key is same though value might be
  // different.  Value is not important -- just make sure always same
  // reference passed.
  volatile CellSkipListSet cellSet;

  // Snapshot of memstore.  Made for flusher.
  volatile CellSkipListSet snapshot;

  final KeyValue.KVComparator comparator;

  // Used to track own heapSize
  final AtomicLong size;
  private volatile long snapshotSize;

  // Used to track when to flush
  volatile long timeOfOldestEdit = Long.MAX_VALUE;

  TimeRangeTracker timeRangeTracker;
  TimeRangeTracker snapshotTimeRangeTracker;

  volatile MemStoreLAB allocator;
  volatile MemStoreLAB snapshotAllocator;
  volatile long snapshotId;

  /**
   * Default constructor. Used for tests.
   */
  public DefaultMemStore() {
    this(HBaseConfiguration.create(), KeyValue.COMPARATOR);
  }

  /**
   * Constructor.
   * @param c Comparator
   */
  public DefaultMemStore(final Configuration conf,
                  final KeyValue.KVComparator c) {
    this.conf = conf;
    this.comparator = c;
    this.cellSet = new CellSkipListSet(c);
    this.snapshot = new CellSkipListSet(c);
    timeRangeTracker = new TimeRangeTracker();
    snapshotTimeRangeTracker = new TimeRangeTracker();
    this.size = new AtomicLong(DEEP_OVERHEAD);
    this.snapshotSize = 0;
    if (conf.getBoolean(USEMSLAB_KEY, USEMSLAB_DEFAULT)) {
      String className = conf.get(MSLAB_CLASS_NAME, HeapMemStoreLAB.class.getName());
      this.allocator = ReflectionUtils.instantiateWithCustomCtor(className,
          new Class[] { Configuration.class }, new Object[] { conf });
    } else {
      this.allocator = null;
    }
  }

  void dump() {
    for (Cell cell: this.cellSet) {
      LOG.info(cell);
    }
    for (Cell cell: this.snapshot) {
      LOG.info(cell);
    }
  }

  /**
   * Creates a snapshot of the current memstore.
   * Snapshot must be cleared by call to {@link #clearSnapshot(long)}
   */
  @Override
  public MemStoreSnapshot snapshot() {
    // If snapshot currently has entries, then flusher failed or didn't call
    // cleanup.  Log a warning.
    if (!this.snapshot.isEmpty()) {
      LOG.warn("Snapshot called again without clearing previous. " +
          "Doing nothing. Another ongoing flush or did we fail last attempt?");
    } else {
      this.snapshotId = EnvironmentEdgeManager.currentTime();
      this.snapshotSize = keySize();
      if (!this.cellSet.isEmpty()) {
        this.snapshot = this.cellSet;
        this.cellSet = new CellSkipListSet(this.comparator);
        this.snapshotTimeRangeTracker = this.timeRangeTracker;
        this.timeRangeTracker = new TimeRangeTracker();
        // Reset heap to not include any keys
        this.size.set(DEEP_OVERHEAD);
        this.snapshotAllocator = this.allocator;
        // Reset allocator so we get a fresh buffer for the new memstore
        if (allocator != null) {
          String className = conf.get(MSLAB_CLASS_NAME, HeapMemStoreLAB.class.getName());
          this.allocator = ReflectionUtils.instantiateWithCustomCtor(className,
              new Class[] { Configuration.class }, new Object[] { conf });
        }
        timeOfOldestEdit = Long.MAX_VALUE;
      }
    }
    return new MemStoreSnapshot(this.snapshotId, snapshot.size(), this.snapshotSize,
        this.snapshotTimeRangeTracker, new CollectionBackedScanner(snapshot, this.comparator));
  }

  /**
   * The passed snapshot was successfully persisted; it can be let go.
   * @param id Id of the snapshot to clean out.
   * @throws UnexpectedStateException
   * @see #snapshot()
   */
  @Override
  public void clearSnapshot(long id) throws UnexpectedStateException {
    MemStoreLAB tmpAllocator = null;
    if (this.snapshotId != id) {
      throw new UnexpectedStateException("Current snapshot id is " + this.snapshotId + ",passed "
          + id);
    }
    // OK. Passed in snapshot is same as current snapshot. If not-empty,
    // create a new snapshot and let the old one go.
    if (!this.snapshot.isEmpty()) {
      this.snapshot = new CellSkipListSet(this.comparator);
      this.snapshotTimeRangeTracker = new TimeRangeTracker();
    }
    this.snapshotSize = 0;
    this.snapshotId = -1;
    if (this.snapshotAllocator != null) {
      tmpAllocator = this.snapshotAllocator;
      this.snapshotAllocator = null;
    }
    if (tmpAllocator != null) {
      tmpAllocator.close();
    }
  }

  @Override
  public long getFlushableSize() {
    return this.snapshotSize > 0 ? this.snapshotSize : keySize();
  }

  /**
   * Write an update
   * @param cell
   * @return approximate size of the passed KV & newly added KV which maybe different than the
   *         passed-in KV
   */
  @Override
  public Pair add(Cell cell) {
    Cell toAdd = maybeCloneWithAllocator(cell);
    return new Pair(internalAdd(toAdd), toAdd);
  }

  @Override
  public long timeOfOldestEdit() {
    return timeOfOldestEdit;
  }

  private boolean addToCellSet(Cell e) {
    boolean b = this.cellSet.add(e);
    setOldestEditTimeToNow();
    return b;
  }

  private boolean removeFromCellSet(Cell e) {
    boolean b = this.cellSet.remove(e);
    setOldestEditTimeToNow();
    return b;
  }

  void setOldestEditTimeToNow() {
    if (timeOfOldestEdit == Long.MAX_VALUE) {
      timeOfOldestEdit = EnvironmentEdgeManager.currentTime();
    }
  }

  /**
   * Internal version of add() that doesn't clone Cells with the
   * allocator, and doesn't take the lock.
   *
   * Callers should ensure they already have the read lock taken
   */
  private long internalAdd(final Cell toAdd) {
    long s = heapSizeChange(toAdd, addToCellSet(toAdd));
    timeRangeTracker.includeTimestamp(toAdd);
    this.size.addAndGet(s);
    return s;
  }

  private Cell maybeCloneWithAllocator(Cell cell) {
    if (allocator == null) {
      return cell;
    }

    int len = KeyValueUtil.length(cell);
    ByteRange alloc = allocator.allocateBytes(len);
    if (alloc == null) {
      // The allocation was too large, allocator decided
      // not to do anything with it.
      return cell;
    }
    assert alloc.getBytes() != null;
    KeyValueUtil.appendToByteArray(cell, alloc.getBytes(), alloc.getOffset());
    KeyValue newKv = new KeyValue(alloc.getBytes(), alloc.getOffset(), len);
    newKv.setSequenceId(cell.getSequenceId());
    return newKv;
  }

  /**
   * Remove n key from the memstore. Only cells that have the same key and the
   * same memstoreTS are removed.  It is ok to not update timeRangeTracker
   * in this call. It is possible that we can optimize this method by using
   * tailMap/iterator, but since this method is called rarely (only for
   * error recovery), we can leave those optimization for the future.
   * @param cell
   */
  @Override
  public void rollback(Cell cell) {
    // If the key is in the snapshot, delete it. We should not update
    // this.size, because that tracks the size of only the memstore and
    // not the snapshot. The flush of this snapshot to disk has not
    // yet started because Store.flush() waits for all rwcc transactions to
    // commit before starting the flush to disk.
    Cell found = this.snapshot.get(cell);
    if (found != null && found.getSequenceId() == cell.getSequenceId()) {
      this.snapshot.remove(cell);
      long sz = heapSizeChange(cell, true);
      this.snapshotSize -= sz;
    }
    // If the key is in the memstore, delete it. Update this.size.
    found = this.cellSet.get(cell);
    if (found != null && found.getSequenceId() == cell.getSequenceId()) {
      removeFromCellSet(cell);
      long s = heapSizeChange(cell, true);
      this.size.addAndGet(-s);
    }
  }

  /**
   * Write a delete
   * @param deleteCell
   * @return approximate size of the passed key and value.
   */
  @Override
  public long delete(Cell deleteCell) {
    long s = 0;
    Cell toAdd = maybeCloneWithAllocator(deleteCell);
    s += heapSizeChange(toAdd, addToCellSet(toAdd));
    timeRangeTracker.includeTimestamp(toAdd);
    this.size.addAndGet(s);
    return s;
  }

  /**
   * @param cell Find the row that comes after this one.  If null, we return the
   * first.
   * @return Next row or null if none found.
   */
  Cell getNextRow(final Cell cell) {
    return getLowest(getNextRow(cell, this.cellSet), getNextRow(cell, this.snapshot));
  }

  /*
   * @param a
   * @param b
   * @return Return lowest of a or b or null if both a and b are null
   */
  private Cell getLowest(final Cell a, final Cell b) {
    if (a == null) {
      return b;
    }
    if (b == null) {
      return a;
    }
    return comparator.compareRows(a, b) <= 0? a: b;
  }

  /*
   * @param key Find row that follows this one.  If null, return first.
   * @param map Set to look in for a row beyond row.
   * @return Next row or null if none found.  If one found, will be a new
   * KeyValue -- can be destroyed by subsequent calls to this method.
   */
  private Cell getNextRow(final Cell key,
      final NavigableSet set) {
    Cell result = null;
    SortedSet tail = key == null? set: set.tailSet(key);
    // Iterate until we fall into the next row; i.e. move off current row
    for (Cell cell: tail) {
      if (comparator.compareRows(cell, key) <= 0)
        continue;
      // Note: Not suppressing deletes or expired cells.  Needs to be handled
      // by higher up functions.
      result = cell;
      break;
    }
    return result;
  }

  /**
   * @param state column/delete tracking state
   */
  @Override
  public void getRowKeyAtOrBefore(final GetClosestRowBeforeTracker state) {
    getRowKeyAtOrBefore(cellSet, state);
    getRowKeyAtOrBefore(snapshot, state);
  }

  /*
   * @param set
   * @param state Accumulates deletes and candidates.
   */
  private void getRowKeyAtOrBefore(final NavigableSet set,
      final GetClosestRowBeforeTracker state) {
    if (set.isEmpty()) {
      return;
    }
    if (!walkForwardInSingleRow(set, state.getTargetKey(), state)) {
      // Found nothing in row.  Try backing up.
      getRowKeyBefore(set, state);
    }
  }

  /*
   * Walk forward in a row from firstOnRow.  Presumption is that
   * we have been passed the first possible key on a row.  As we walk forward
   * we accumulate deletes until we hit a candidate on the row at which point
   * we return.
   * @param set
   * @param firstOnRow First possible key on this row.
   * @param state
   * @return True if we found a candidate walking this row.
   */
  private boolean walkForwardInSingleRow(final SortedSet set,
      final Cell firstOnRow, final GetClosestRowBeforeTracker state) {
    boolean foundCandidate = false;
    SortedSet tail = set.tailSet(firstOnRow);
    if (tail.isEmpty()) return foundCandidate;
    for (Iterator i = tail.iterator(); i.hasNext();) {
      Cell kv = i.next();
      // Did we go beyond the target row? If so break.
      if (state.isTooFar(kv, firstOnRow)) break;
      if (state.isExpired(kv)) {
        i.remove();
        continue;
      }
      // If we added something, this row is a contender. break.
      if (state.handle(kv)) {
        foundCandidate = true;
        break;
      }
    }
    return foundCandidate;
  }

  /*
   * Walk backwards through the passed set a row at a time until we run out of
   * set or until we get a candidate.
   * @param set
   * @param state
   */
  private void getRowKeyBefore(NavigableSet set,
      final GetClosestRowBeforeTracker state) {
    Cell firstOnRow = state.getTargetKey();
    for (Member p = memberOfPreviousRow(set, state, firstOnRow);
        p != null; p = memberOfPreviousRow(p.set, state, firstOnRow)) {
      // Make sure we don't fall out of our table.
      if (!state.isTargetTable(p.cell)) break;
      // Stop looking if we've exited the better candidate range.
      if (!state.isBetterCandidate(p.cell)) break;
      // Make into firstOnRow
      firstOnRow = new KeyValue(p.cell.getRowArray(), p.cell.getRowOffset(), p.cell.getRowLength(),
          HConstants.LATEST_TIMESTAMP);
      // If we find something, break;
      if (walkForwardInSingleRow(p.set, firstOnRow, state)) break;
    }
  }

  /**
   * Only used by tests. TODO: Remove
   *
   * Given the specs of a column, update it, first by inserting a new record,
   * then removing the old one.  Since there is only 1 KeyValue involved, the memstoreTS
   * will be set to 0, thus ensuring that they instantly appear to anyone. The underlying
   * store will ensure that the insert/delete each are atomic. A scanner/reader will either
   * get the new value, or the old value and all readers will eventually only see the new
   * value after the old was removed.
   *
   * @param row
   * @param family
   * @param qualifier
   * @param newValue
   * @param now
   * @return  Timestamp
   */
  public long updateColumnValue(byte[] row,
                                byte[] family,
                                byte[] qualifier,
                                long newValue,
                                long now) {
    Cell firstCell = KeyValueUtil.createFirstOnRow(row, family, qualifier);
    // Is there a Cell in 'snapshot' with the same TS? If so, upgrade the timestamp a bit.
    SortedSet snSs = snapshot.tailSet(firstCell);
    if (!snSs.isEmpty()) {
      Cell snc = snSs.first();
      // is there a matching Cell in the snapshot?
      if (CellUtil.matchingRow(snc, firstCell) && CellUtil.matchingQualifier(snc, firstCell)) {
        if (snc.getTimestamp() == now) {
          // poop,
          now += 1;
        }
      }
    }

    // logic here: the new ts MUST be at least 'now'. But it could be larger if necessary.
    // But the timestamp should also be max(now, mostRecentTsInMemstore)

    // so we cant add the new Cell w/o knowing what's there already, but we also
    // want to take this chance to delete some cells. So two loops (sad)

    SortedSet ss = cellSet.tailSet(firstCell);
    for (Cell cell : ss) {
      // if this isnt the row we are interested in, then bail:
      if (!CellUtil.matchingColumn(cell, family, qualifier)
          || !CellUtil.matchingRow(cell, firstCell)) {
        break; // rows dont match, bail.
      }

      // if the qualifier matches and it's a put, just RM it out of the cellSet.
      if (cell.getTypeByte() == KeyValue.Type.Put.getCode() &&
          cell.getTimestamp() > now && CellUtil.matchingQualifier(firstCell, cell)) {
        now = cell.getTimestamp();
      }
    }

    // create or update (upsert) a new Cell with
    // 'now' and a 0 memstoreTS == immediately visible
    List cells = new ArrayList(1);
    cells.add(new KeyValue(row, family, qualifier, now, Bytes.toBytes(newValue)));
    return upsert(cells, 1L);
  }

  /**
   * Update or insert the specified KeyValues.
   * 
   * For each KeyValue, insert into MemStore.  This will atomically upsert the
   * value for that row/family/qualifier.  If a KeyValue did already exist,
   * it will then be removed.
   * 

   * Currently the memstoreTS is kept at 0 so as each insert happens, it will
   * be immediately visible.  May want to change this so it is atomic across
   * all KeyValues.
   * 

   * This is called under row lock, so Get operations will still see updates
   * atomically.  Scans will only see each KeyValue update as atomic.
   *
   * @param cells
   * @param readpoint readpoint below which we can safely remove duplicate KVs 
   * @return change in memstore size
   */
  @Override
  public long upsert(Iterable cells, long readpoint) {
    long size = 0;
    for (Cell cell : cells) {
      size += upsert(cell, readpoint);
    }
    return size;
  }

  /**
   * Inserts the specified KeyValue into MemStore and deletes any existing
   * versions of the same row/family/qualifier as the specified KeyValue.
   * 

   * First, the specified KeyValue is inserted into the Memstore.
   * 

   * If there are any existing KeyValues in this MemStore with the same row,
   * family, and qualifier, they are removed.
   * 
   * Callers must hold the read lock.
   *
   * @param cell
   * @return change in size of MemStore
   */
  private long upsert(Cell cell, long readpoint) {
    // Add the Cell to the MemStore
    // Use the internalAdd method here since we (a) already have a lock
    // and (b) cannot safely use the MSLAB here without potentially
    // hitting OOME - see TestMemStore.testUpsertMSLAB for a
    // test that triggers the pathological case if we don't avoid MSLAB
    // here.
    long addedSize = internalAdd(cell);

    // Get the Cells for the row/family/qualifier regardless of timestamp.
    // For this case we want to clean up any other puts
    Cell firstCell = KeyValueUtil.createFirstOnRow(
        cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(),
        cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(),
        cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
    SortedSet ss = cellSet.tailSet(firstCell);
    Iterator it = ss.iterator();
    // versions visible to oldest scanner
    int versionsVisible = 0;
    while ( it.hasNext() ) {
      Cell cur = it.next();

      if (cell == cur) {
        // ignore the one just put in
        continue;
      }
      // check that this is the row and column we are interested in, otherwise bail
      if (CellUtil.matchingRow(cell, cur) && CellUtil.matchingQualifier(cell, cur)) {
        // only remove Puts that concurrent scanners cannot possibly see
        if (cur.getTypeByte() == KeyValue.Type.Put.getCode() &&
            cur.getSequenceId() <= readpoint) {
          if (versionsVisible > 1) {
            // if we get here we have seen at least one version visible to the oldest scanner,
            // which means we can prove that no scanner will see this version

            // false means there was a change, so give us the size.
            long delta = heapSizeChange(cur, true);
            addedSize -= delta;
            this.size.addAndGet(-delta);
            it.remove();
            setOldestEditTimeToNow();
          } else {
            versionsVisible++;
          }
        }
      } else {
        // past the row or column, done
        break;
      }
    }
    return addedSize;
  }

  /*
   * Immutable data structure to hold member found in set and the set it was
   * found in. Include set because it is carrying context.
   */
  private static class Member {
    final Cell cell;
    final NavigableSet set;
    Member(final NavigableSet s, final Cell kv) {
      this.cell = kv;
      this.set = s;
    }
  }

  /*
   * @param set Set to walk back in.  Pass a first in row or we'll return
   * same row (loop).
   * @param state Utility and context.
   * @param firstOnRow First item on the row after the one we want to find a
   * member in.
   * @return Null or member of row previous to firstOnRow
   */
  private Member memberOfPreviousRow(NavigableSet set,
      final GetClosestRowBeforeTracker state, final Cell firstOnRow) {
    NavigableSet head = set.headSet(firstOnRow, false);
    if (head.isEmpty()) return null;
    for (Iterator i = head.descendingIterator(); i.hasNext();) {
      Cell found = i.next();
      if (state.isExpired(found)) {
        i.remove();
        continue;
      }
      return new Member(head, found);
    }
    return null;
  }

  /**
   * @return scanner on memstore and snapshot in this order.
   */
  @Override
  public List getScanners(long readPt) {
    return Collections. singletonList(new MemStoreScanner(readPt));
  }

  /**
   * Check if this memstore may contain the required keys
   * @param scan
   * @return False if the key definitely does not exist in this Memstore
   */
  public boolean shouldSeek(Scan scan, long oldestUnexpiredTS) {
    return (timeRangeTracker.includesTimeRange(scan.getTimeRange()) ||
        snapshotTimeRangeTracker.includesTimeRange(scan.getTimeRange()))
        && (Math.max(timeRangeTracker.getMaximumTimestamp(),
                     snapshotTimeRangeTracker.getMaximumTimestamp()) >=
            oldestUnexpiredTS);
  }

  /*
   * MemStoreScanner implements the KeyValueScanner.
   * It lets the caller scan the contents of a memstore -- both current
   * map and snapshot.
   * This behaves as if it were a real scanner but does not maintain position.
   */
  protected class MemStoreScanner extends NonLazyKeyValueScanner {
    // Next row information for either cellSet or snapshot
    private Cell cellSetNextRow = null;
    private Cell snapshotNextRow = null;

    // last iterated Cells for cellSet and snapshot (to restore iterator state after reseek)
    private Cell cellSetItRow = null;
    private Cell snapshotItRow = null;
    
    // iterator based scanning.
    private Iterator cellSetIt;
    private Iterator snapshotIt;

    // The cellSet and snapshot at the time of creating this scanner
    private CellSkipListSet cellSetAtCreation;
    private CellSkipListSet snapshotAtCreation;

    // the pre-calculated Cell to be returned by peek() or next()
    private Cell theNext;

    // The allocator and snapshot allocator at the time of creating this scanner
    volatile MemStoreLAB allocatorAtCreation;
    volatile MemStoreLAB snapshotAllocatorAtCreation;
    
    // A flag represents whether could stop skipping Cells for MVCC
    // if have encountered the next row. Only used for reversed scan
    private boolean stopSkippingCellsIfNextRow = false;

    private long readPoint;

    /*
    Some notes...

     So memstorescanner is fixed at creation time. this includes pointers/iterators into
    existing kvset/snapshot.  during a snapshot creation, the kvset is null, and the
    snapshot is moved.  since kvset is null there is no point on reseeking on both,
      we can save us the trouble. During the snapshot->hfile transition, the memstore
      scanner is re-created by StoreScanner#updateReaders().  StoreScanner should
      potentially do something smarter by adjusting the existing memstore scanner.

      But there is a greater problem here, that being once a scanner has progressed
      during a snapshot scenario, we currently iterate past the kvset then 'finish' up.
      if a scan lasts a little while, there is a chance for new entries in kvset to
      become available but we will never see them.  This needs to be handled at the
      StoreScanner level with coordination with MemStoreScanner.

      Currently, this problem is only partly managed: during the small amount of time
      when the StoreScanner has not yet created a new MemStoreScanner, we will miss
      the adds to kvset in the MemStoreScanner.
    */

    MemStoreScanner(long readPoint) {
      super();

      this.readPoint = readPoint;
      cellSetAtCreation = cellSet;
      snapshotAtCreation = snapshot;
      if (allocator != null) {
        this.allocatorAtCreation = allocator;
        this.allocatorAtCreation.incScannerCount();
      }
      if (snapshotAllocator != null) {
        this.snapshotAllocatorAtCreation = snapshotAllocator;
        this.snapshotAllocatorAtCreation.incScannerCount();
      }
    }

    /**
     * Lock on 'this' must be held by caller.
     * @param it
     * @return Next Cell
     */
    private Cell getNext(Iterator it) {
      Cell startCell = theNext;
      Cell v = null;
      try {
        while (it.hasNext()) {
          v = it.next();
          if (v.getSequenceId() <= this.readPoint) {
            return v;
          }
          if (stopSkippingCellsIfNextRow && startCell != null
              && comparator.compareRows(v, startCell) > 0) {
            return null;
          }
        }

        return null;
      } finally {
        if (v != null) {
          // in all cases, remember the last Cell iterated to
          if (it == snapshotIt) {
            snapshotItRow = v;
          } else {
            cellSetItRow = v;
          }
        }
      }
    }

    /**
     *  Set the scanner at the seek key.
     *  Must be called only once: there is no thread safety between the scanner
     *   and the memStore.
     * @param key seek value
     * @return false if the key is null or if there is no data
     */
    @Override
    public synchronized boolean seek(Cell key) {
      if (key == null) {
        close();
        return false;
      }
      // kvset and snapshot will never be null.
      // if tailSet can't find anything, SortedSet is empty (not null).
      cellSetIt = cellSetAtCreation.tailSet(key).iterator();
      snapshotIt = snapshotAtCreation.tailSet(key).iterator();
      cellSetItRow = null;
      snapshotItRow = null;

      return seekInSubLists(key);
    }


    /**
     * (Re)initialize the iterators after a seek or a reseek.
     */
    private synchronized boolean seekInSubLists(Cell key){
      cellSetNextRow = getNext(cellSetIt);
      snapshotNextRow = getNext(snapshotIt);

      // Calculate the next value
      theNext = getLowest(cellSetNextRow, snapshotNextRow);

      // has data
      return (theNext != null);
    }


    /**
     * Move forward on the sub-lists set previously by seek.
     * @param key seek value (should be non-null)
     * @return true if there is at least one KV to read, false otherwise
     */
    @Override
    public synchronized boolean reseek(Cell key) {
      /*
      See HBASE-4195 & HBASE-3855 & HBASE-6591 for the background on this implementation.
      This code is executed concurrently with flush and puts, without locks.
      Two points must be known when working on this code:
      1) It's not possible to use the 'kvTail' and 'snapshot'
       variables, as they are modified during a flush.
      2) The ideal implementation for performance would use the sub skip list
       implicitly pointed by the iterators 'kvsetIt' and
       'snapshotIt'. Unfortunately the Java API does not offer a method to
       get it. So we remember the last keys we iterated to and restore
       the reseeked set to at least that point.
       */
      cellSetIt = cellSetAtCreation.tailSet(getHighest(key, cellSetItRow)).iterator();
      snapshotIt = snapshotAtCreation.tailSet(getHighest(key, snapshotItRow)).iterator();

      return seekInSubLists(key);
    }


    @Override
    public synchronized Cell peek() {
      //DebugPrint.println(" MS@" + hashCode() + " peek = " + getLowest());
      return theNext;
    }

    @Override
    public synchronized Cell next() {
      if (theNext == null) {
          return null;
      }

      final Cell ret = theNext;

      // Advance one of the iterators
      if (theNext == cellSetNextRow) {
        cellSetNextRow = getNext(cellSetIt);
      } else {
        snapshotNextRow = getNext(snapshotIt);
      }

      // Calculate the next value
      theNext = getLowest(cellSetNextRow, snapshotNextRow);

      //long readpoint = ReadWriteConsistencyControl.getThreadReadPoint();
      //DebugPrint.println(" MS@" + hashCode() + " next: " + theNext + " next_next: " +
      //    getLowest() + " threadpoint=" + readpoint);
      return ret;
    }

    /*
     * Returns the lower of the two key values, or null if they are both null.
     * This uses comparator.compare() to compare the KeyValue using the memstore
     * comparator.
     */
    private Cell getLowest(Cell first, Cell second) {
      if (first == null && second == null) {
        return null;
      }
      if (first != null && second != null) {
        int compare = comparator.compare(first, second);
        return (compare <= 0 ? first : second);
      }
      return (first != null ? first : second);
    }

    /*
     * Returns the higher of the two cells, or null if they are both null.
     * This uses comparator.compare() to compare the Cell using the memstore
     * comparator.
     */
    private Cell getHighest(Cell first, Cell second) {
      if (first == null && second == null) {
        return null;
      }
      if (first != null && second != null) {
        int compare = comparator.compare(first, second);
        return (compare > 0 ? first : second);
      }
      return (first != null ? first : second);
    }

    public synchronized void close() {
      this.cellSetNextRow = null;
      this.snapshotNextRow = null;

      this.cellSetIt = null;
      this.snapshotIt = null;
      
      if (allocatorAtCreation != null) {
        this.allocatorAtCreation.decScannerCount();
        this.allocatorAtCreation = null;
      }
      if (snapshotAllocatorAtCreation != null) {
        this.snapshotAllocatorAtCreation.decScannerCount();
        this.snapshotAllocatorAtCreation = null;
      }

      this.cellSetItRow = null;
      this.snapshotItRow = null;
    }

    /**
     * MemStoreScanner returns max value as sequence id because it will
     * always have the latest data among all files.
     */
    @Override
    public long getSequenceID() {
      return Long.MAX_VALUE;
    }

    @Override
    public boolean shouldUseScanner(Scan scan, SortedSet columns,
        long oldestUnexpiredTS) {
      return shouldSeek(scan, oldestUnexpiredTS);
    }

    /**
     * Seek scanner to the given key first. If it returns false(means
     * peek()==null) or scanner's peek row is bigger than row of given key, seek
     * the scanner to the previous row of given key
     */
    @Override
    public synchronized boolean backwardSeek(Cell key) {
      seek(key);
      if (peek() == null || comparator.compareRows(peek(), key) > 0) {
        return seekToPreviousRow(key);
      }
      return true;
    }

    /**
     * Separately get the KeyValue before the specified key from kvset and
     * snapshotset, and use the row of higher one as the previous row of
     * specified key, then seek to the first KeyValue of previous row
     */
    @Override
    public synchronized boolean seekToPreviousRow(Cell key) {
      Cell firstKeyOnRow = KeyValueUtil.createFirstOnRow(key.getRowArray(), key.getRowOffset(),
          key.getRowLength());
      SortedSet cellHead = cellSetAtCreation.headSet(firstKeyOnRow);
      Cell cellSetBeforeRow = cellHead.isEmpty() ? null : cellHead.last();
      SortedSet snapshotHead = snapshotAtCreation
          .headSet(firstKeyOnRow);
      Cell snapshotBeforeRow = snapshotHead.isEmpty() ? null : snapshotHead
          .last();
      Cell lastCellBeforeRow = getHighest(cellSetBeforeRow, snapshotBeforeRow);
      if (lastCellBeforeRow == null) {
        theNext = null;
        return false;
      }
      Cell firstKeyOnPreviousRow = KeyValueUtil.createFirstOnRow(lastCellBeforeRow.getRowArray(),
          lastCellBeforeRow.getRowOffset(), lastCellBeforeRow.getRowLength());
      this.stopSkippingCellsIfNextRow = true;
      seek(firstKeyOnPreviousRow);
      this.stopSkippingCellsIfNextRow = false;
      if (peek() == null
          || comparator.compareRows(peek(), firstKeyOnPreviousRow) > 0) {
        return seekToPreviousRow(lastCellBeforeRow);
      }
      return true;
    }

    @Override
    public synchronized boolean seekToLastRow() {
      Cell first = cellSetAtCreation.isEmpty() ? null : cellSetAtCreation
          .last();
      Cell second = snapshotAtCreation.isEmpty() ? null
          : snapshotAtCreation.last();
      Cell higherCell = getHighest(first, second);
      if (higherCell == null) {
        return false;
      }
      Cell firstCellOnLastRow = KeyValueUtil.createFirstOnRow(higherCell.getRowArray(),
          higherCell.getRowOffset(), higherCell.getRowLength());
      if (seek(firstCellOnLastRow)) {
        return true;
      } else {
        return seekToPreviousRow(higherCell);
      }

    }
  }

  public final static long FIXED_OVERHEAD = ClassSize.align(
      ClassSize.OBJECT + (9 * ClassSize.REFERENCE) + (3 * Bytes.SIZEOF_LONG));

  public final static long DEEP_OVERHEAD = ClassSize.align(FIXED_OVERHEAD +
      ClassSize.ATOMIC_LONG + (2 * ClassSize.TIMERANGE_TRACKER) +
      (2 * ClassSize.CELL_SKIPLIST_SET) + (2 * ClassSize.CONCURRENT_SKIPLISTMAP));

  /*
   * Calculate how the MemStore size has changed.  Includes overhead of the
   * backing Map.
   * @param cell
   * @param notpresent True if the cell was NOT present in the set.
   * @return Size
   */
  static long heapSizeChange(final Cell cell, final boolean notpresent) {
    return notpresent ? ClassSize.align(ClassSize.CONCURRENT_SKIPLISTMAP_ENTRY
        + CellUtil.estimatedHeapSizeOf(cell)) : 0;
  }

  private long keySize() {
    return heapSize() - DEEP_OVERHEAD;
  }

  /**
   * Get the entire heap usage for this MemStore not including keys in the
   * snapshot.
   */
  @Override
  public long heapSize() {
    return size.get();
  }

  @Override
  public long size() {
    return heapSize();
  }
 
  /**
   * Code to help figure if our approximation of object heap sizes is close
   * enough.  See hbase-900.  Fills memstores then waits so user can heap
   * dump and bring up resultant hprof in something like jprofiler which
   * allows you get 'deep size' on objects.
   * @param args main args
   */
  public static void main(String [] args) {
    RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
    LOG.info("vmName=" + runtime.getVmName() + ", vmVendor=" +
      runtime.getVmVendor() + ", vmVersion=" + runtime.getVmVersion());
    LOG.info("vmInputArguments=" + runtime.getInputArguments());
    DefaultMemStore memstore1 = new DefaultMemStore();
    // TODO: x32 vs x64
    long size = 0;
    final int count = 10000;
    byte [] fam = Bytes.toBytes("col");
    byte [] qf = Bytes.toBytes("umn");
    byte [] empty = new byte[0];
    for (int i = 0; i < count; i++) {
      // Give each its own ts
      Pair ret = memstore1.add(new KeyValue(Bytes.toBytes(i), fam, qf, i, empty));
      size += ret.getFirst();
    }
    LOG.info("memstore1 estimated size=" + size);
    for (int i = 0; i < count; i++) {
      Pair ret = memstore1.add(new KeyValue(Bytes.toBytes(i), fam, qf, i, empty));
      size += ret.getFirst();
    }
    LOG.info("memstore1 estimated size (2nd loading of same data)=" + size);
    // Make a variably sized memstore.
    DefaultMemStore memstore2 = new DefaultMemStore();
    for (int i = 0; i < count; i++) {
      Pair ret = memstore2.add(new KeyValue(Bytes.toBytes(i), fam, qf, i,
        new byte[i]));
      size += ret.getFirst();
    }
    LOG.info("memstore2 estimated size=" + size);
    final int seconds = 30;
    LOG.info("Waiting " + seconds + " seconds while heap dump is taken");
    for (int i = 0; i < seconds; i++) {
      // Thread.sleep(1000);
    }
    LOG.info("Exiting.");
  }

}

你可能感兴趣的:(HBase)

nosql数据库技术与应用知识点皆过客，揽星河 NoSQL nosql 数据库大数据数据分析数据结构非关系型数据库
Nosql知识回顾大数据处理流程数据采集(flume、爬虫、传感器)数据存储(本门课程NoSQL所处的阶段)Hdfs、MongoDB、HBase等数据清洗(入仓)Hive等数据处理、分析(Spark、Flink等)数据可视化数据挖掘、机器学习应用(Python、SparkMLlib等)大数据时代存储的挑战(三高)高并发(同一时间很多人访问)高扩展(要求随时根据需求扩展存储)高效率(要求读写速度快)
浅谈MapReduce Android路上的人 Hadoop 分布式计算 mapreduce 分布式框架 hadoop
从今天开始，本人将会开始对另一项技术的学习，就是当下炙手可热的Hadoop分布式就算技术。目前国内外的诸多公司因为业务发展的需要，都纷纷用了此平台。国内的比如BAT啦，国外的在这方面走的更加的前面，就不一一列举了。但是Hadoop作为Apache的一个开源项目，在下面有非常多的子项目，比如HDFS，HBase,Hive，Pig,等等，要先彻底学习整个Hadoop，仅仅凭借一个的力量，是远远不够的。
hbase介绍 CrazyL- 云计算+大数据 hbase
hbase是一个分布式的、多版本的、面向列的开源数据库hbase利用hadoophdfs作为其文件存储系统，提供高可靠性、高性能、列存储、可伸缩、实时读写、适用于非结构化数据存储的数据库系统hbase利用hadoopmapreduce来处理hbase、中的海量数据hbase利用zookeeper作为分布式系统服务特点：数据量大：一个表可以有上亿行，上百万列（列多时，插入变慢）面向列：面向列（族）的
Apache HBase基础（基本概述，物理架构，逻辑架构，数据管理，架构特点，HBase Shell） May--J--Oldhu HBase HBase shell hbase物理架构 hbase逻辑架构 hbase
NoSQL综述及ApacheHBase基础一.HBase1.HBase概述2.HBase发展历史3.HBase应用场景3.1增量数据-时间序列数据3.2信息交换-消息传递3.3内容服务-Web后端应用程序3.4HBase应用场景示例4.ApacheHBase生态圈5.HBase物理架构5.1HMaster5.2RegionServer5.3Region和Table6.HBase逻辑架构-Row7.
HBase（一）——HBase介绍 weixin_30595035 大数据数据库数据结构与算法
HBase介绍1、关系型数据库与非关系型数据库（1）关系型数据库关系型数据库最典型的数据机构是表，由二维表及其之间的联系所组成的一个数据组织优点：1、易于维护：都是使用表结构，格式一致2、使用方便：SQL语言通用，可用于复杂查询3、复杂操作：支持SQL，可用于一个表以及多个表之间非常复杂的查询缺点：1、读写性能比较差，尤其是海量数据的高效率读写2、固定的表结构，灵活度稍欠3、高并发读写需求，传统关
HBase介绍 mingyu1016 数据库
概述HBase是一个分布式的、面向列的开源数据库,源于google的一篇论文《bigtable：一个结构化数据的分布式存储系统》。HBase是GoogleBigtable的开源实现，它利用HadoopHDFS作为其文件存储系统，利用HadoopMapReduce来处理HBase中的海量数据，利用Zookeeper作为协同服务。HBase的表结构HBase以表的形式存储数据。表有行和列组成。列划分为
Hbase - 迁移数据[导出,导入] kikiki5
>有没有这样一样情况，把一个集群中的某个表导到另一个群集中，或者hbase的表结构发生了更改，但是数据还要，比如预分区没做，导致某台RegionServer很吃紧，Hbase的导出导出都可以很快的完成这些操作。![](https://upload-images.jianshu.io/upload_images/9028759-4fb9aa8ca3777969.png?imageMogr2/auto
通过DBeaver连接Phoenix操作hbase 不想做咸鱼的王富贵
通过DBeaver连接Phoenix操作hbase前言本文介绍常用一种通用数据库工具Dbeaver，DBeaver可通过JDBC连接到数据库，可以支持几乎所有的数据库产品，包括：MySQL、PostgreSQL、MariaDB、SQLite、Oracle、Db2、SQLServer、Sybase、MSAccess、Teradata、Firebird、Derby等等。商业版本更是可以支持各种NoSQ
Hbase - kerberos认证异常 kikiki2
之前怎么认证都认证不上，问题找了好了，发现它的异常跟实际操作根本就对不上，死马当活马医，当时也是瞎改才好的，给大家伙记录记录。KrbException:ServernotfoundinKerberosdatabase(7)-LOOKING_UP_SERVER>>>KdcAccessibility:removestorm1.starsriver.cnatsun.security.krb5.KrbTg
kvm 虚拟机命令行虚拟机操作、制作快照和恢复快照以及工作常用总结西京刀客云原生(Cloud Native)云计算虚拟化 Linux C/C++服务器 linux kvm
文章目录kvm虚拟机命令行虚拟机操作、制作快照和恢复快照一、kvm虚拟机命令行虚拟机操作(创建和删除)查看虚拟机virt-install创建一个虚拟机关闭虚拟机重启虚拟机销毁虚拟机二、kvm制作快照和恢复快照**创建快照**工作常见问题创建快照报错：：internalsnapshotsofaVMwithpflashbasedfirmwarearenotsupported检查虚拟机是否包含pflas
hadoop 0.22.0 部署笔记 weixin_33701564 大数据 java 运维
为什么80%的码农都做不了架构师？>>>因为需要使用hbase，所以开始对hbase进行学习。hbase是部署在hadoop平台上的NOSql数据库，因此在部署hbase之前需要先部署hadoop。环境：redhat5、hadoop-0.22.0.tar.gz、jdk-6u13-linux-i586.zipip192.168.1.128hostname：localhost.localdomain（
实时数仓之实时数仓架构(Hudi)(1)，2024年最新熬夜整理华为最新大数据开发笔试题 2401_84181221 程序员架构大数据
+Hudi：湖仓一体数据管理框架，用来管理模型数据，包括ODS/DWD/DWS/DIM/ADS等；+Doris：OLAP引擎，同步数仓结果模型，对外提供数据服务支持；+Hbase：用来存储维表信息，维表数据来源一部分有Flink加工实时写入，另一部分是从Spark任务生产，其主要作用用来支持FlinkETL处理过程中的LookupJoin功能。这里选用Hbase原因主要因为Table的HbaseC
HBase 源码阅读（一） Such Devotion hbase 数据库大数据
1.HMastermain方法在上文中MacosM1IDEA本地调试HBase2.2.2，我们使用HMaster的主函数使用"start"作为入参，启动了HMaster进程这里我们再深入了解下HMaster的运行机理publicstaticvoidmain(String[]args){LOG.info("STARTINGservice"+HMaster.class.getSimpleName())
HBase 源码阅读（四）HBase 关于LSM Tree的实现- MemStore Such Devotion hbase lsm-tree 数据库
4.MemStore接口Memstore的函数不能并行的被调用。调用者需要持有读写锁，这个的实现在HStore中我们放弃对MemStore中的诸多函数进行查看直接看MemStore的实现类AbstractMemStoreCompactingMemStoreDefaultMemStore4.1三个实现类的使用场景1.AbstractMemStore角色:基础抽象类作用:AbstractMemStor
大数据（Hbase简单示例） BL小二 hbase 大数据 hadoop
importorg.apache.hadoop.conf.Configuration;importorg.apache.hadoop.hbase.HBaseConfiguration;importorg.apache.hadoop.hbase.TableName;importorg.apache.hadoop.hbase.client.*;importorg.apache.hadoop.hbase
Hbase的简单使用示例傲雪凌霜，松柏长青后端大数据 hbase 数据库大数据
HBase是基于HadoopHDFS构建的分布式、列式存储的NoSQL数据库，适用于存储和检索超大规模的非结构化数据。它支持随机读写，并且能够处理PB级数据。HBase通常用于实时数据存取场景，与Hadoop生态紧密集成。使用HBase的Java示例前置条件HBase集群：确保HBase集群已经安装并启动。如果没有，你可以通过本地伪分布模式或Docker来运行HBase。Hadoop配置：HBas
快手HBase在千亿级用户特征数据分析中的应用与实践王知无
声明：本文的原文是来自Hbase技术社区的一个PPT分享，个人做了整理和提炼。大家注意哈，这种会议PPT类的东西能学习到的更多的是技术方案和他人在实践过程中的经验。希望对大家有帮助。背景快手每天产生数百亿用户特征数据，分析师需要在跨30-90天的数千亿特征数据中，任意选择多维度组合(如:城市=北京&性别=男)，秒级分析用户行为。针对这一需求,快手基于HBase自主研发了支持bitmap转化、存储、
ClickHouse与其他数据库的对比九州Pro ClickHouse 数据库 clickhouse 数据仓库大数据 sql
目录1与传统关系型数据库的对比1.1性能差异1.2数据模型差异1.3适用场景差异2与其他列式存储数据库的对比2.1ApacheCassandra2.2HBase3与分布式数据库的对比3.1GoogleBigQuery3.2AmazonRedshift3.3Snowflake4ClickHouse的缺点5ClickHouse的其他优点1与传统关系型数据库的对比1.1性能差异ClickHouse是一种
Hbase、hive以及ClickHouse的介绍和区别？ damokelisijian866 hbase hive clickhouse
一、Hbase介绍：HBase是一个分布式的、面向列的开源数据库，由ApacheSoftwareFoundation开发，是Hadoop生态系统中的一个重要组件。HBase的设计灵感来源于Google的Bigtable论文，它通过提供类似于Bigtable的能力，在Hadoop之上构建了一个高可靠性、高性能、面向列、可伸缩的分布式存储系统。HBase主要用于存储大量结构化数据，并支持随机读写访问，
Hive和Hbase的区别傲雪凌霜，松柏长青大数据后端 hive hbase hadoop
Hive和HBase都是Hadoop生态系统中的重要组件，它们都能处理大规模数据，但各自有不同的适用场景和设计理念。以下是两者的主要区别：1.数据模型Hive：Hive类似于传统的关系型数据库(RDBMS)，以表格形式存储数据。它使用SQL-like语言HiveQL来查询和处理数据，数据通常是结构化或半结构化的。HBase：HBase是一个NoSQL数据库，基于Google的BigTable模型。
HBase 傲雪凌霜，松柏长青大数据后端 hbase 数据库大数据
ApacheHBase是一个基于Hadoop分布式文件系统（HDFS）构建的分布式、面向列的NoSQL数据库，主要用于处理大规模、稀疏的表结构数据。HBase的设计灵感来自Google的Bigtable，能够在海量数据中提供快速的随机读写操作，适合需要低延迟和高吞吐量的应用场景。HBase核心概念表（Table）：HBase的数据存储在表中，与传统的关系型数据库不同，HBase的表是面向列族（Co
大数据面试题：说下为什么要使用Hive？Hive的优缺点？Hive的作用是什么？蓦然_ 大数据面试题 hive 大数据开发面试题大数据面试
1、为什么要使用Hive？Hive是Hadoop生态系统中比不可少的一个工具，它提供了一种SQL(结构化查询语言)方言，可以查询存储在Hadoop分布式文件系统（HDFS）中的数据或其他和Hadoop集成的文件系统，如MapR-FS、Amazon的S3和像HBase（Hadoop数据仓库）和Cassandra这样的数据库中的数据。大多数数据仓库应用程序都是使用关系数据库进行实现的，并使用SQL作为
Hadoop组件静听山水 Hadoop hadoop
这张图片展示了Hadoop生态系统的一些主要组件。Hadoop是一个开源的大数据处理框架，由Apache基金会维护。以下是每个组件的简短介绍：HBase：一个分布式、面向列的NoSQL数据库，基于GoogleBigTable的设计理念构建。HBase提供了实时读写访问大量结构化和半结构化数据的能力，非常适合大规模数据存储。Pig：一种高级数据流语言和执行引擎，用于编写MapReduce任务。Pig
Hbase BulkLoad用法 kikiki2
要导入大量数据，Hbase的BulkLoad是必不可少的，在导入历史数据的时候，我们一般会选择使用BulkLoad方式，我们还可以借助Spark的计算能力将数据快速地导入。使用方法导入依赖包compilegroup:'org.apache.spark',name:'spark-sql_2.11',version:'2.3.1.3.0.0.0-1634'compilegroup:'org.apach
EMR组件部署指南 ivwdcwso 运维 EMR 大数据开源运维
EMR(ElasticMapReduce)是一个大数据处理和分析平台,包含了多个开源组件。本文将详细介绍如何部署EMR的主要组件,包括:JDK1.8ElasticsearchKafkaFlinkZookeeperHBaseHadoopPhoenixScalaSparkHive准备工作所有操作都在/data目录下进行。首先安装JDK1.8:yuminstalljava-1.8.0-openjdk部署
Sublime text3+python3配置及插件安装 raysonfang
作者：方雷个人博客：http://blog.chargingbunk.cn/微信公众号：rayson_666(Rayson开发分享)个人专研技术方向：微服务方向：springboot,springCloud,Dubbo分布式/高并发：分布式锁，消息队列RabbitMQ大数据处理：Hadoop,spark,HBase等python方向：pythonweb开发一，前言在网上搜索了一些Python开发的
Spring Data：JPA与Querydsl 光图强 java
JPAJPA是java的一个规范，用于在java对象和数据库之间保存数据，充当面向对象领域模型和数据库之间的桥梁。它使用Hibernate、TopLink、IBatis等ORM框架实现持久性规范。SpringDataSpringData是Spring的一个子项目，用于简化数据库访问，支持NoSql数据和关系数据库。支持的NoSql数据库包括：Mongodb、redis、Hbase、Neo4j。Sp
HBase 源码阅读（二） Such Devotion hbase 数据库大数据
衔接在上一篇文章中，HMasterCommandLine类中在startMaster();方法中//这里除了启动HMaster之外，还启动一个HRegionServerLocalHBaseClustercluster=newLocalHBaseCluster(conf,mastersCount,regionServersCount,LocalHMaster.class,HRegionServer.
大数据技术之HBase 与 Hive 集成(7) 大数据深度洞察 Hbase 大数据 hbase hive
目录使用场景HBase与Hive集成使用1）案例一2）案例二使用场景如果大量的数据已经存放在HBase上面，并且需要对已经存在的数据进行数据分析处理，那么Phoenix并不适合做特别复杂的SQL处理。此时，可以使用Hive映射HBase的表格，之后通过编写HQL进行分析处理。HBase与Hive集成使用Hive安装https://blog.csdn.net/qq_45115959/article/
【HBase之轨迹】（1）使用 Docker 搭建 HBase 集群寒冰小澈IceClean 【大数据之轨迹】【Docker之轨迹】笔记 hbase docker hadoop
——目录——0.前置准备1.下载安装2.配置（重）3.启动与关闭4.搭建高可用HBase前言（贫穷使我见多识广）前边经历了Hadoop，Zookeeper，Kafka，他们的集群，全都是使用Docker搭建的一开始的我认为，把容器看成是一台台独立的服务器就好啦也确实是这样，但端口映射问题，让我一路以来磕碰了太多太多，直到现在的HBase，更是将Docker集群所附带的挑战性，放大到了极致（目前是如
rust的指针作为函数返回值是直接传递，还是先销毁后创建？ wudixiaotie 返回值
这是我自己想到的问题，结果去知呼提问，还没等别人回答，我自己就想到方法实验了。。 fn main() { let mut a = 34; println!("a's addr:{:p}", &a); let p = &mut a; println!("p's addr:{:p}", &a
java编程思想 -- 数据的初始化百合不是茶 java 数据的初始化
1.使用构造器确保数据初始化 /* *在ReckInitDemo类中创建Reck的对象 */ public class ReckInitDemo { public static void main(String[] args) { //创建Reck对象 new Reck(); } }
[航天与宇宙]为什么发射和回收航天器有档期 comsci
地球的大气层中有一个时空屏蔽层,这个层次会不定时的出现,如果该时空屏蔽层出现,那么将导致外层空间进入的任何物体被摧毁,而从地面发射到太空的飞船也将被摧毁... 所以,航天发射和飞船回收都需要等待这个时空屏蔽层消失之后,再进行 &
linux下批量替换文件内容商人shang linux 替换
1、网络上现成的资料　　格式: sed -i "s/查找字段/替换字段/g" `grep 查找字段 -rl 路径` 　　linux sed 批量替换多个文件中的字符串　　sed -i "s/oldstring/newstring/g" `grep oldstring -rl yourdir` 　　例如：替换/home下所有文件中的www.admi
网页在线天气预报 oloz 天气预报
网页在线调用天气预报 <%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="utf-8"%> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transit
SpringMVC和Struts2比较杨白白 springMVC
1. 入口 spring mvc的入口是servlet，而struts2是filter（这里要指出，filter和servlet是不同的。以前认为filter是servlet的一种特殊），这样就导致了二者的机制不同，这里就牵涉到servlet和filter的区别了。参见：http://blog.csdn.net/zs15932616453/article/details/8832343 2
refuse copy, lazy girl! 小桔子 copy
妹妹坐船头啊啊啊啊！都打算一点点琢磨呢。文字编辑也写了基本功能了。。今天查资料，结果查到了人家写得完完整整的。我清楚的认识到： 1.那是我自己觉得写不出的高度 2.如果直接拿来用，很快就能解决问题 3.然后就是抄咩~~ 4.肿么可以这样子，都不想写了今儿个，留着作参考吧！拒绝大抄特抄，慢慢一点点写！
apache与php整合 aichenglong php apache web
一 apache web服务器 1 apeche web服务器的安装 1)下载Apache web服务器 2)配置域名(如果需要使用要在DNS上注册) 3)测试安装访问http://localhost/验证是否安装成功 2 apache管理 1)service.msc进行图形化管理 2)命令管理，配
Maven常用内置变量 AILIKES maven
Built-in properties ${basedir} represents the directory containing pom.xml ${version} equivalent to ${project.version} (deprecated: ${pom.version}) Pom/Project properties Al
java的类和对象百合不是茶 JAVA面向对象类对象
java中的类： java是面向对象的语言，解决问题的核心就是将问题看成是一个类，使用类来解决 java使用 class 类名来创建类，在Java中类名要求和构造方法，Java的文件名是一样的创建一个A类： class A{ } java中的类：将某两个事物有联系的属性包装在一个类中，再通
JS控制页面输入框为只读 bijian1013 JavaScript
在WEB应用开发当中，增、删除、改、查功能必不可少，为了减少以后维护的工作量，我们一般都只做一份页面，通过传入的参数控制其是新增、修改或者查看。而修改时需将待修改的信息从后台取到并显示出来，实际上就是查看的过程，唯一的区别是修改时，页面上所有的信息能修改，而查看页面上的信息不能修改。因此完全可以将其合并，但通过前端JS将查看页面的所有信息控制为只读，在信息量非常大时，就比较麻烦。
AngularJS与服务器交互 bijian1013 JavaScript AngularJS $http
对于AJAX应用（使用XMLHttpRequests）来说，向服务器发起请求的传统方式是：获取一个XMLHttpRequest对象的引用、发起请求、读取响应、检查状态码，最后处理服务端的响应。整个过程示例如下： var xmlhttp = new XMLHttpRequest(); xmlhttp.onreadystatechange
[Maven学习笔记八]Maven常用插件应用 bit1129 maven
常用插件及其用法位于：http://maven.apache.org/plugins/ 1. Jetty server plugin 2. Dependency copy plugin 3. Surefire Test plugin 4. Uber jar plugin 1. Jetty Pl
【Hive六】Hive用户自定义函数(UDF) bit1129 自定义函数
1. 什么是Hive UDF Hive是基于Hadoop中的MapReduce，提供HQL查询的数据仓库。Hive是一个很开放的系统，很多内容都支持用户定制，包括：文件格式：Text File，Sequence File 内存中的数据格式： Java Integer/String, Hadoop IntWritable/Text 用户提供的 map/reduce 脚本：不管什么
杀掉nginx进程后丢失nginx.pid，如何重新启动nginx ronin47 nginx 重启 pid丢失
nginx进程被意外关闭，使用nginx -s reload重启时报如下错误：nginx: [error] open() “/var/run/nginx.pid” failed (2: No such file or directory)这是因为nginx进程被杀死后pid丢失了，下一次再开启nginx -s reload时无法启动解决办法：nginx -s reload 只是用来告诉运行中的ng
UI设计中我们为什么需要设计动效 brotherlamp UI ui教程 ui视频 ui资料 ui自学
随着国际大品牌苹果和谷歌的引领，最近越来越多的国内公司开始关注动效设计了，越来越多的团队已经意识到动效在产品用户体验中的重要性了，更多的UI设计师们也开始投身动效设计领域。但是说到底，我们到底为什么需要动效设计？或者说我们到底需要什么样的动效？做动效设计也有段时间了，于是尝试用一些案例，从产品本身出发来说说我所思考的动效设计。一、加强体验舒适度嗯，就是让用户更加爽更加爽的用你的产品。
Spring中JdbcDaoSupport的DataSource注入问题 bylijinnan java spring
参考以下两篇文章： http://www.mkyong.com/spring/spring-jdbctemplate-jdbcdaosupport-examples/ http://stackoverflow.com/questions/4762229/spring-ldap-invoking-setter-methods-in-beans-configuration Sprin
数据库连接池的工作原理 chicony 数据库连接池
随着信息技术的高速发展与广泛应用，数据库技术在信息技术领域中的位置越来越重要，尤其是网络应用和电子商务的迅速发展，都需要数据库技术支持动态Web站点的运行，而传统的开发模式是：首先在主程序（如Servlet、Beans）中建立数据库连接；然后进行SQL操作，对数据库中的对象进行查询、修改和删除等操作；最后断开数据库连接。使用这种开发模式，对
java 关键字 CrazyMizzz java
关键字是事先定义的，有特别意义的标识符，有时又叫保留字。对于保留字，用户只能按照系统规定的方式使用，不能自行定义。 Java中的关键字按功能主要可以分为以下几类：（1）访问修饰符 public,private,protected p
Hive中的排序语法 daizj 排序 hive order by DISTRIBUTE BY sort by
Hive中的排序语法 2014.06.22 ORDER BY hive中的ORDER BY语句和关系数据库中的sql语法相似。他会对查询结果做全局排序，这意味着所有的数据会传送到一个Reduce任务上，这样会导致在大数量的情况下，花费大量时间。与数据库中 ORDER BY 的区别在于在hive.mapred.mode = strict模式下，必须指定 limit 否则执行会报错。
单态设计模式 dcj3sjt126com 设计模式
单例模式（Singleton）用于为一个类生成一个唯一的对象。最常用的地方是数据库连接。使用单例模式生成一个对象后，该对象可以被其它众多对象所使用。 <?phpclass Example{ // 保存类实例在此属性中 private static&
svn locked dcj3sjt126com Lock
post-commit hook failed (exit code 1) with output: svn: E155004: Working copy 'D:\xx\xxx' locked svn: E200031: sqlite: attempt to write a readonly database svn: E200031: sqlite: attempt to write a
ARM寄存器学习 e200702084 数据结构 C++c C#F#
无论是学习哪一种处理器，首先需要明确的就是这种处理器的寄存器以及工作模式。 ARM有37个寄存器，其中31个通用寄存器，6个状态寄存器。 1、不分组寄存器（R0-R7）不分组也就是说说，在所有的处理器模式下指的都时同一物理寄存器。在异常中断造成处理器模式切换时，由于不同的处理器模式使用一个名字相同的物理寄存器，就是
常用编码资料 gengzg 编码
List<UserInfo> list=GetUserS.GetUserList(11); String json=JSON.toJSONString(list); HashMap<Object,Object> hs=new HashMap<Object, Object>(); for(int i=0;i<10;i++) {
进程 vs. 线程 hongtoushizi 线程 linux 进程
我们介绍了多进程和多线程，这是实现多任务最常用的两种方式。现在，我们来讨论一下这两种方式的优缺点。首先，要实现多任务，通常我们会设计Master-Worker模式，Master负责分配任务，Worker负责执行任务，因此，多任务环境下，通常是一个Master，多个Worker。如果用多进程实现Master-Worker，主进程就是Master，其他进程就是Worker。如果用多线程实现
Linux定时Job：crontab -e 与 /etc/crontab 的区别 Josh_Persistence linux crontab
一、linux中的crotab中的指定的时间只有5个部分：* * * * * 分别表示：分钟，小时，日，月，星期，具体说来：第一段代表分钟 0—59 第二段代表小时 0—23 第三段代表日期 1—31 第四段代表月份 1—12 第五段代表星期几，0代表星期日 0—6 如： */1 * * * * 每分钟执行一次。 *
KMP算法详解 hm4123660 数据结构 C++算法字符串 KMP
字符串模式匹配我们相信大家都有遇过，然而我们也习惯用简单匹配法（即Brute-Force算法)，其基本思路就是一个个逐一对比下去，这也是我们大家熟知的方法，然而这种算法的效率并不高，但利于理解。假设主串s="ababcabcacbab",模式串为t="
枚举类型的单例模式 zhb8015 单例模式
E.编写一个包含单个元素的枚举类型[极推荐]。代码如下： public enum MaYun {himself; //定义一个枚举的元素，就代表MaYun的一个实例private String anotherField;MaYun() {//MaYun诞生要做的事情//这个方法也可以去掉。将构造时候需要做的事情放在instance赋值的时候：/** himself = MaYun() {*
Kafka+Storm+HDFS ssydxa219 storm
cd /myhome/usr/stormbin/storm nimbus &bin/storm supervisor &bin/storm ui &Kafka+Storm+HDFS整合实践kafka_2.9.2-0.8.1.1.tgzapache-storm-0.9.2-incubating.tar.gzKafka安装配置我们使用3台机器搭建Kafk
Java获取本地服务器的IP 中华好儿孙 java Web 获取服务器ip地址
System.out.println("getRequestURL:"+request.getRequestURL()); System.out.println("getLocalAddr:"+request.getLocalAddr()); System.out.println("getLocalPort:&quo