HBase memstore flush (memflush) source code analysis

Source version: 0.98.1

HRegionServer starts the MemStoreFlusher thread:

private void initializeThreads() throws IOException {
    // Cache flushing thread.
    this.cacheFlusher = new MemStoreFlusher(conf, this);

    // Compaction thread
    this.compactSplitThread = new CompactSplitThread(this);

   .......

 

  private void startServiceThreads() throws IOException {
    String n = Thread.currentThread().getName();
......
    this.cacheFlusher.start(uncaughtExceptionHandler);

    Threads.setDaemonThreadRunning(this.compactionChecker.getThread(), n +
      ".compactionChecker", uncaughtExceptionHandler);

.....

 

 /*
   * Run init. Sets up hlog and starts up all server threads.
   *
   * @param c Extra configuration.
   */
  protected void handleReportForDutyResponse(final RegionServerStartupResponse c)
  throws IOException {
....

      startServiceThreads();
.....

 

  public void run() {
    try {
      // Do pre-registration initializations; zookeeper, lease threads, etc.
      preRegistrationInitialization();
    } catch (Throwable e) {
      abort("Fatal exception during initialization", e);
    }

    try {
      // Try and register with the Master; tell it we are here.  Break if
      // server is stopped or the clusterup flag is down or hdfs went wacky.
      while (keepLooping()) {
        RegionServerStartupResponse w = reportForDuty();
        if (w == null) {
          LOG.warn("reportForDuty failed; sleeping and then retrying.");
          this.sleeper.sleep();
        } else {
          handleReportForDutyResponse(w); // starts all HRegionServer service threads
          break;
        }
      }
....

 

 

The key class and method: MemStoreFlusher.flushRegion

 private boolean flushRegion(final HRegion region, final boolean emergencyFlush) {
    synchronized (this.regionsInQueue) {
      FlushRegionEntry fqe = this.regionsInQueue.remove(region);
      if (fqe != null && emergencyFlush) {
        // Need to remove from region from delay queue.  When NOT an
        // emergencyFlush, then item was removed via a flushQueue.poll.
        flushQueue.remove(fqe);
      }
    }
    lock.readLock().lock();
    try {
      boolean shouldCompact = region.flushcache();
      // We just want to check the size
      boolean shouldSplit = region.checkSplit() != null;
      if (shouldSplit) {
        this.server.compactSplitThread.requestSplit(region);
      } else if (shouldCompact) {
        server.compactSplitThread.requestSystemCompaction(
            region, Thread.currentThread().getName());
      }
......

 A FlushRegionEntry is taken from flushQueue and flushed.

 Acquire the read lock, then:

  1.  Call HRegion's flushcache(), which returns whether a compaction is needed
  2.  Ask HRegion whether the region should be split
  3.  if (split) request a split, else if (compact) request a compaction

The detailed steps follow:

--------------------------------------------------------------------------------------------------------------------

 

1. HRegion

 protected boolean internalFlushcache(
      final HLog wal, final long myseqid, MonitoredTask status)
  throws IOException {
    if (this.rsServices != null && this.rsServices.isAborted()) {
      // Don't flush when server aborting, it's unsafe
      throw new IOException("Aborting flush because server is abortted...");
    }
    final long startTime = EnvironmentEdgeManager.currentTimeMillis();
    // Clear flush flag.
    // If nothing to flush, return and avoid logging start/stop flush.
    if (this.memstoreSize.get() <= 0) {
      if(LOG.isDebugEnabled()) {
        LOG.debug("Empty memstore size for the current region "+this);
      }
      return false;
    }
    if (LOG.isDebugEnabled()) {
      LOG.debug("Started memstore flush for " + this +
        ", current region memstore size " +
        StringUtils.humanReadableInt(this.memstoreSize.get()) +
        ((wal != null)? "": "; wal is null, using passed sequenceid=" + myseqid));
    }

    // Stop updates while we snapshot the memstore of all stores. We only have
    // to do this for a moment.  Its quick.  The subsequent sequence id that
    // goes into the HLog after we've flushed all these snapshots also goes
    // into the info file that sits beside the flushed files.
    // We also set the memstore size to zero here before we allow updates
    // again so its value will represent the size of the updates received
    // during the flush
    MultiVersionConsistencyControl.WriteEntry w = null;

    // We have to take a write lock during snapshot, or else a write could
    // end up in both snapshot and memstore (makes it difficult to do atomic
    // rows then)
    status.setStatus("Obtaining lock to block concurrent updates");
    // block waiting for the lock for internal flush
    this.updatesLock.writeLock().lock();
    long totalFlushableSize = 0;
    status.setStatus("Preparing to flush by snapshotting stores");
    List<StoreFlushContext> storeFlushCtxs = new ArrayList<StoreFlushContext>(stores.size());
    long flushSeqId = -1L;
    try {
      // Record the mvcc for all transactions in progress.
      w = mvcc.beginMemstoreInsert();
      mvcc.advanceMemstore(w);
      // check if it is not closing.
      if (wal != null) {
        if (!wal.startCacheFlush(this.getRegionInfo().getEncodedNameAsBytes())) {
          status.setStatus("Flush will not be started for ["
              + this.getRegionInfo().getEncodedName() + "] - because the WAL is closing.");
          return false;
        }
        flushSeqId = this.sequenceId.incrementAndGet();
      } else {
        // use the provided sequence Id as WAL is not being used for this flush.
        flushSeqId = myseqid;
      }

      for (Store s : stores.values()) {
        totalFlushableSize += s.getFlushableSize();
        storeFlushCtxs.add(s.createFlushContext(flushSeqId));
      }

      // prepare flush (take a snapshot)
      for (StoreFlushContext flush : storeFlushCtxs) {
//step 1   @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
        flush.prepare(); 
      }
    } finally {
      this.updatesLock.writeLock().unlock();
    }
    String s = "Finished memstore snapshotting " + this +
      ", syncing WAL and waiting on mvcc, flushsize=" + totalFlushableSize;
    status.setStatus(s);
    if (LOG.isTraceEnabled()) LOG.trace(s);

    // sync unflushed WAL changes when deferred log sync is enabled
    // see HBASE-8208 for details
    if (wal != null && !shouldSyncLog()) {
//step 2  @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
      wal.sync();
    }

    // wait for all in-progress transactions to commit to HLog before
    // we can start the flush. This prevents
    // uncommitted transactions from being written into HFiles.
    // We have to block before we start the flush, otherwise keys that
    // were removed via a rollbackMemstore could be written to Hfiles.
    mvcc.waitForRead(w);

    s = "Flushing stores of " + this;
    status.setStatus(s);
    if (LOG.isTraceEnabled()) LOG.trace(s);

    // Any failure from here on out will be catastrophic requiring server
    // restart so hlog content can be replayed and put back into the memstore.
    // Otherwise, the snapshot content while backed up in the hlog, it will not
    // be part of the current running servers state.
    boolean compactionRequested = false;
    try {
      // A.  Flush memstore to all the HStores.
      // Keep running vector of all store files that includes both old and the
      // just-made new flush store file. The new flushed file is still in the
      // tmp directory.

      for (StoreFlushContext flush : storeFlushCtxs) {
//step 3   @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
        flush.flushCache(status);
      }

      // Switch snapshot (in memstore) -> new hfile (thus causing
      // all the store scanners to reset/reseek).
      for (StoreFlushContext flush : storeFlushCtxs) {
//step 4   @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
        boolean needsCompaction = flush.commit(status);
        if (needsCompaction) {
          compactionRequested = true;
        }
      }
      storeFlushCtxs.clear();

      // Set down the memstore size by amount of flush.
      this.addAndGetGlobalMemstoreSize(-totalFlushableSize);
    } catch (Throwable t) {
      // An exception here means that the snapshot was not persisted.
      // The hlog needs to be replayed so its content is restored to memstore.
      // Currently, only a server restart will do this.
      // We used to only catch IOEs but its possible that we'd get other
      // exceptions -- e.g. HBASE-659 was about an NPE -- so now we catch
      // all and sundry.
      if (wal != null) {
        wal.abortCacheFlush(this.getRegionInfo().getEncodedNameAsBytes());
      }
      DroppedSnapshotException dse = new DroppedSnapshotException("region: " +
          Bytes.toStringBinary(getRegionName()));
      dse.initCause(t);
      status.abort("Flush failed: " + StringUtils.stringifyException(t));
      throw dse;
    }

    // If we get to here, the HStores have been written.
    if (wal != null) {
      wal.completeCacheFlush(this.getRegionInfo().getEncodedNameAsBytes());
    }

    // Record latest flush time
    this.lastFlushTime = EnvironmentEdgeManager.currentTimeMillis();

    // Update the last flushed sequence id for region
    completeSequenceId = flushSeqId;

    // C. Finally notify anyone waiting on memstore to clear:
    // e.g. checkResources().
    synchronized (this) {
      notifyAll(); // FindBugs NN_NAKED_NOTIFY
    }

    long time = EnvironmentEdgeManager.currentTimeMillis() - startTime;
    long memstoresize = this.memstoreSize.get();
    String msg = "Finished memstore flush of ~" +
      StringUtils.humanReadableInt(totalFlushableSize) + "/" + totalFlushableSize +
      ", currentsize=" +
      StringUtils.humanReadableInt(memstoresize) + "/" + memstoresize +
      " for region " + this + " in " + time + "ms, sequenceid=" + flushSeqId +
      ", compaction requested=" + compactionRequested +
      ((wal == null)? "; wal=null": "");
    LOG.info(msg);
    status.setStatus(msg);
    this.recentFlushes.add(new Pair<Long,Long>(time/1000, totalFlushableSize));

    return compactionRequested;
  }

 HRegion's internalFlushcache method is called; it does the following:

 1. HRegion 1661 / HStore 1941: prepare() (step 1 above, executed under the updates write lock) has the MemStore copy its kvset into a snapshot, which becomes the in-memory data set for this flush.

 (Each flush covers every store in the region, so the smallest unit of flushing is the region, not the store; this is one of the reasons having many column families is discouraged.)

 2. HRegion 1674: sync the WAL (step 2) and wait for it to complete.

 3. HRegion 1700: HStore.flushCache (step 3) writes the snapshot into a tmp file (one tmp file per HStore, even though tmpfiles is a List).

 4. HRegion 1706: HStore wraps the newly written tmp files as StoreFiles (step 4);

 HStore.updateStorefiles then takes the write lock, adds them to the StoreFileManager's list so they start serving reads, and clears the snapshot.

    HStore 951: needsCompaction() delegates to RatioBasedCompactionPolicy.needsCompaction to decide whether the store needs a compaction

    (the check: the number of store files reaches hbase.hstore.compaction.min, for which hbase.hstore.compactionThreshold is the older name, default 3; see the sketch below)
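
A minimal sketch of that file-count check, assuming 0.98's RatioBasedCompactionPolicy / CompactionConfiguration behaviour (the class name and constructor here are made up for illustration, not the real API):

import java.util.Collection;
import org.apache.hadoop.conf.Configuration;

/** Sketch of the store-file-count check behind needsCompaction() in 0.98. */
class CompactionCheckSketch {
  private final int minFilesToCompact;

  CompactionCheckSketch(Configuration conf) {
    // hbase.hstore.compaction.min, with the legacy hbase.hstore.compactionThreshold
    // as the fallback (default 3); never below 2.
    this.minFilesToCompact = Math.max(2,
        conf.getInt("hbase.hstore.compaction.min",
            conf.getInt("hbase.hstore.compactionThreshold", 3)));
  }

  /** A flush requests a compaction once the store files that are not already
   *  being compacted reach the configured minimum. */
  boolean needsCompaction(Collection<?> storeFiles, Collection<?> filesCompacting) {
    return storeFiles.size() - filesCompacting.size() >= minFilesToCompact;
  }
}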

 

--------------------------------------------------------------------------------------------------------------------

 

2. HRegion checks whether the region should split; the split policy class is IncreasingToUpperBoundRegionSplitPolicy

  @Override
  protected boolean shouldSplit() {
    if (region.shouldForceSplit()) return true;
    boolean foundABigStore = false;
    // Get count of regions that have the same common table as this.region
    int tableRegionsCount = getCountOfCommonTableRegions();
    // Get size to check
    long sizeToCheck = getSizeToCheck(tableRegionsCount);

    for (Store store : region.getStores().values()) {
      // If any of the stores is unable to split (eg they contain reference files)
      // then don't split
      if ((!store.canSplit())) {
        return false;
      }

      // Mark if any store is big enough
      long size = store.getSize();
      if (size > sizeToCheck) {
        LOG.debug("ShouldSplit because " + store.getColumnFamilyName() +
          " size=" + size + ", sizeToCheck=" + sizeToCheck +
          ", regionsWithCommonTable=" + tableRegionsCount);
        foundABigStore = true;
      }
    }

    return foundABigStore;
  }

 IncreasingToUpperBoundRegionSplitPolicy 65, shouldSplit(), decides whether this region needs to be split.

(Again the check is done per region, another reason multiple column families are a bad idea.)

(On initialization, initialSize = hbase.increasing.policy.initial.size (a preset initial size) if set, otherwise hbase.hregion.memstore.flush.size (the memstore flush size).)

getCountOfCommonTableRegions returns the number of regions of this region's table hosted on the same region server; call it regioncount.

When regioncount is between 0 and 100, the size threshold is the minimum of hbase.hregion.max.filesize (default 10G) and initialSize * regioncount^3; otherwise it is simply hbase.hregion.max.filesize (default 10G).

For example, with a 128M initial size (see the sketch below):

1 region:  128 * 1^3 = 128M

2 regions: 128 * 2^3 = 1024M

3 regions: 128 * 3^3 = 3456M

4 regions: 128 * 4^3 = 8192M

5 regions: 128 * 5^3 = 16000M (~15.6G) => capped at 10G, so once there are 5 regions the configured max file size takes over
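
A self-contained sketch of that size calculation (the real logic lives in IncreasingToUpperBoundRegionSplitPolicy.getSizeToCheck; the 128M value below is only the example figure used above):

/** Sketch of the split-size threshold used by IncreasingToUpperBoundRegionSplitPolicy. */
class SplitSizeSketch {
  long maxFileSize = 10L * 1024 * 1024 * 1024; // hbase.hregion.max.filesize, default 10G
  long initialSize = 128L * 1024 * 1024;       // memstore flush size, 128M in the example

  long getSizeToCheck(int tableRegionsCount) {
    // Outside (0, 100] the policy simply uses the configured max file size.
    if (tableRegionsCount <= 0 || tableRegionsCount > 100) {
      return maxFileSize;
    }
    return Math.min(maxFileSize,
        initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
  }

  public static void main(String[] args) {
    SplitSizeSketch s = new SplitSizeSketch();
    for (int count = 1; count <= 5; count++) {
      System.out.println(count + " region(s) -> " + s.getSizeToCheck(count) / (1024 * 1024) + "M");
    }
  }
}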

 

 

--------------------------------------------------------------------------------------------------------------------

 3. if (split) request splitting, else if (compact) request compacting

 

 http://blackproof.iteye.com/blog/2037159

 

I took notes on this before and had nearly forgotten them,

so here is another write-up of the region split path.

 

The code that creates the two daughter regions: SplitTransaction.stepsBeforePONR
 public PairOfSameType<HRegion> stepsBeforePONR(final Server server,
      final RegionServerServices services, boolean testing) throws IOException {
    // Set ephemeral SPLITTING znode up in zk.  Mocked servers sometimes don't
    // have zookeeper so don't do zk stuff if server or zookeeper is null
    if (server != null && server.getZooKeeper() != null) {
      try {
        //step 1@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
        createNodeSplitting(server.getZooKeeper(),
          parent.getRegionInfo(), server.getServerName(), hri_a, hri_b);
      } catch (KeeperException e) {
        throw new IOException("Failed creating PENDING_SPLIT znode on " +
          this.parent.getRegionNameAsString(), e);
      }
    }
    this.journal.add(JournalEntry.SET_SPLITTING_IN_ZK);
    if (server != null && server.getZooKeeper() != null) {
      // After creating the split node, wait for master to transition it
      // from PENDING_SPLIT to SPLITTING so that we can move on. We want master
      // knows about it and won't transition any region which is splitting.
      //step 2@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
      znodeVersion = getZKNode(server, services);
    }

    //step 3@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    this.parent.getRegionFileSystem().createSplitsDir();
    this.journal.add(JournalEntry.CREATE_SPLIT_DIR);

    Map<byte[], List<StoreFile>> hstoreFilesToSplit = null;
    Exception exceptionToThrow = null;
    try{
      //step 4@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
      hstoreFilesToSplit = this.parent.close(false);
    } catch (Exception e) {
      exceptionToThrow = e;
    }
    if (exceptionToThrow == null && hstoreFilesToSplit == null) {
      // The region was closed by a concurrent thread.  We can't continue
      // with the split, instead we must just abandon the split.  If we
      // reopen or split this could cause problems because the region has
      // probably already been moved to a different server, or is in the
      // process of moving to a different server.
      exceptionToThrow = closedByOtherException;
    }
    if (exceptionToThrow != closedByOtherException) {
      this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
    }
    if (exceptionToThrow != null) {
      if (exceptionToThrow instanceof IOException) throw (IOException)exceptionToThrow;
      throw new IOException(exceptionToThrow);
    }
    if (!testing) {
      //step 5@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
      services.removeFromOnlineRegions(this.parent, null);
    }
    this.journal.add(JournalEntry.OFFLINED_PARENT);

    // TODO: If splitStoreFiles were multithreaded would we complete steps in
    // less elapsed time?  St.Ack 20100920
    //
    // splitStoreFiles creates daughter region dirs under the parent splits dir
    // Nothing to unroll here if failure -- clean up of CREATE_SPLIT_DIR will
    // clean this up.
    //step 6@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    splitStoreFiles(hstoreFilesToSplit);

    // Log to the journal that we are creating region A, the first daughter
    // region.  We could fail halfway through.  If we do, we could have left
    // stuff in fs that needs cleanup -- a storefile or two.  Thats why we
    // add entry to journal BEFORE rather than AFTER the change.
    //step 7@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    this.journal.add(JournalEntry.STARTED_REGION_A_CREATION);
    HRegion a = this.parent.createDaughterRegionFromSplits(this.hri_a);

    // Ditto
    this.journal.add(JournalEntry.STARTED_REGION_B_CREATION);
    HRegion b = this.parent.createDaughterRegionFromSplits(this.hri_b);
    return new PairOfSameType<HRegion>(a, b);
  }

  1. RegionSplitPolicy.getSplitPoint() picks the region's split point: the midkey of the largest store is used as the split point (see the sketch below).

  2. SplitRequest.run()

            instantiates a SplitTransaction;

            st.prepare(): pre-split checks: whether the region is closed and whether any of its HFiles are still referenced;

            st.execute(): performs the split.
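
A minimal sketch of that split-point selection, modelled on RegionSplitPolicy.getSplitPoint() in 0.98 (the Store stub below stands in for HBase's Store interface; forced/explicit split points are ignored here):

import java.util.Collection;

class SplitPointSketch {
  /** Minimal stand-in for the parts of HBase's Store interface used here. */
  interface Store {
    byte[] getSplitPoint(); // roughly the midkey of the store's largest HFile
    long getSize();
  }

  /** The region's split key is taken from its largest store. */
  static byte[] pickSplitPoint(Collection<Store> stores) {
    byte[] fromLargestStore = null;
    long largestStoreSize = 0;
    for (Store s : stores) {
      byte[] candidate = s.getSplitPoint();
      if (candidate != null && s.getSize() > largestStoreSize) {
        fromLargestStore = candidate;
        largestStoreSize = s.getSize();
      }
    }
    return fromLargestStore;
  }
}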

 

 1. createDaughters creates the two daughter regions, taking the parent region's write lock:

      1. create an ephemeral SPLITTING znode in ZooKeeper;

      2. wait until the master has transitioned the region into the splitting state;

      3. create the splits directory;

      4. wait for the region's flushes and compactions to finish, then close the region;

      5. remove it from the HRegionServer's online regions and add it to the offlined regions;

      6. perform the actual split: a thread pool of StoreFileSplitter tasks splits every HFile (StoreFile) under the region

      (the HFile contents around the split row are not rewritten; every file is simply referenced, and the reference files are written under the daughter regions; see the sketch after this list);

      7. create the left and right daughter regions, delete the parent from meta, build each daughter's regioninfo from the reference files, and write it to HDFS.
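
As a sketch of step 6, this is roughly what each StoreFileSplitter task does for one parent store file (cf. SplitTransaction.splitStoreFiles in 0.98; the directory layout and file naming below are simplified assumptions, not the exact on-disk format):

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.Reference;

class SplitStoreFileSketch {
  /** Writes one "bottom" and one "top" reference for a single parent HFile. */
  static void splitOneStoreFile(FileSystem fs, Path parentHFile, byte[] splitRow,
      Path daughterAFamilyDir, Path daughterBFamilyDir) throws IOException {
    // Daughter A gets the bottom half: keys below the split row.
    Reference bottom = Reference.createBottomReference(splitRow);
    bottom.write(fs, new Path(daughterAFamilyDir, parentHFile.getName()));

    // Daughter B gets the top half: keys at or above the split row.
    Reference top = Reference.createTopReference(splitRow);
    top.write(fs, new Path(daughterBFamilyDir, parentHFile.getName()));

    // No HFile data is copied; each reference only records the split key and
    // which half of the parent file it represents.
  }
}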

 2. stepsAfterPONR uses the DaughterOpener threads to open the two daughter regions, calling initialize on each:

     a) write the .regioninfo file to HDFS so the region can be recovered if meta is lost;

     b) initialize its HStores, mainly through the loadStoreFiles path: for each store it builds one StoreFile object per file found on HDFS; because what the daughter currently holds are reference files pointing at the corresponding files of the parent region, reading them goes through a HalfStoreFileReader that operates on the parent's files, so reads of the daughter are effectively served from the parent region's data (a conceptual sketch follows below).

    Finally the daughter regions are added to the region server's online-region list and inserted into the meta table.
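
A conceptual sketch of what the half-file reads amount to (this is not the real HalfStoreFileReader API, just the key predicate it enforces): the reader scans the parent HFile and only returns keys that fall on this daughter's side of the split row.

import org.apache.hadoop.hbase.util.Bytes;

class HalfReaderSketch {
  /** top == true means the daughter that owns keys >= splitRow. */
  static boolean belongsToDaughter(byte[] rowKey, byte[] splitRow, boolean top) {
    int cmp = Bytes.compareTo(rowKey, splitRow);
    return top ? cmp >= 0 : cmp < 0;
  }
}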

 

 

 
