HBase snapshot source code analysis

On-disk layout of the snapshot directories:

/hbase/.snapshots
       /.tmp                <---- working directory
       /[snapshot name]     <----- completed snapshot

Layout of a completed snapshot:

     /hbase/.snapshots/[snapshot name]
                .snapshotinfo          <--- Description of the snapshot
                .tableinfo             <--- Copy of the tableinfo
               /.logs
                     /[server_name]
                         /... [log files]
                      ...
                /[region name]           <---- All the region's information
                .regioninfo              <---- Copy of the HRegionInfo
                   /[column family name]
                       /[hfile name]     <--- name of the hfile in the real region
                       ...
                   ...
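
To make the layout concrete, here is a minimal sketch that builds these paths with `java.nio.file`. The class `SnapshotLayout` and its method names are hypothetical and not part of HBase. It also assumes that the working directory of a single snapshot is a subdirectory of `.tmp` named after the snapshot, which the listing above does not state explicitly.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of HBase: builds the paths shown in the layout above.
final class SnapshotLayout {
    static final String SNAPSHOT_DIR = ".snapshots";
    static final String TMP_DIR = ".tmp";

    // Directory of a completed snapshot: <root>/.snapshots/<name>
    static Path completedDir(Path root, String snapshotName) {
        return root.resolve(SNAPSHOT_DIR).resolve(snapshotName);
    }

    // Assumed per-snapshot working directory: <root>/.snapshots/.tmp/<name>
    static Path workingDir(Path root, String snapshotName) {
        return root.resolve(SNAPSHOT_DIR).resolve(TMP_DIR).resolve(snapshotName);
    }

    // File holding the snapshot description inside a snapshot directory.
    static Path snapshotInfo(Path snapshotDir) {
        return snapshotDir.resolve(".snapshotinfo");
    }

    public static void main(String[] args) {
        Path root = Paths.get("/hbase");
        System.out.println(workingDir(root, "s1"));
        System.out.println(snapshotInfo(completedDir(root, "s1")));
    }
}
```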

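Basic steps of a snapshot:

1. Acquire a lock before starting, so that delete and add operations are not allowed while the snapshot runs;

2. Create the designated directory on HDFS and write the relevant information into it;

3. Flush the data in the memstore to HFiles;

4. Create reference pointers to the HFiles.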
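
The steps above can be illustrated with a toy in-memory model. This is not HBase code: `ToyTable`, its fields and its methods are made up for illustration, and step 2 (writing the directory on HDFS) is left out. The point it tries to show is that a snapshot only records references to immutable flushed files and copies no data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Toy in-memory model, not HBase code: illustrates lock -> flush -> reference.
final class ToyTable {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> memstore = new ArrayList<>();
    // Each flushed "file" is an immutable list of rows.
    private final List<List<String>> hfiles = new ArrayList<>();

    void put(String row) {
        lock.lock();
        try {
            memstore.add(row);
        } finally {
            lock.unlock();
        }
    }

    // Returns the indexes of the files the snapshot refers to; no data is copied.
    List<Integer> snapshot() {
        lock.lock(); // step 1: block concurrent changes
        try {
            if (!memstore.isEmpty()) { // step 3: flush buffered rows to a new file
                hfiles.add(Collections.unmodifiableList(new ArrayList<>(memstore)));
                memstore.clear();
            }
            List<Integer> refs = new ArrayList<>();
            for (int i = 0; i < hfiles.size(); i++) {
                refs.add(i); // step 4: record one reference per file
            }
            return refs;
        } finally {
            lock.unlock();
        }
    }

    int fileCount() {
        return hfiles.size();
    }
}
```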

The overall code flow is as follows.

HBaseAdmin initiates the snapshot:

    public void snapshot(final String snapshotName, final TableName tableName, SnapshotDescription.Type type)
            throws IOException, SnapshotCreationException, IllegalArgumentException {
        SnapshotDescription.Builder builder = SnapshotDescription.newBuilder();
        builder.setTable(tableName.getNameAsString());
        builder.setName(snapshotName);
        builder.setType(type);
        snapshot(builder.build());
    }

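Take a snapshot and wait for the server to complete it (blocking). Only a single snapshot should be taken at a time on one HBase instance, otherwise the results may be undefined (you can tell several HBase clusters to snapshot at the same time, but a single cluster should run only one snapshot at a time).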

    public void snapshot(SnapshotDescription snapshot) throws IOException, SnapshotCreationException, IllegalArgumentException {
        // actually take the snapshot
        SnapshotResponse response = takeSnapshotAsync(snapshot);
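
The snippet above stops right after the asynchronous request. As a rough sketch of how a blocking call can be layered on top of an asynchronous one, the helper below polls a completion check until it succeeds or a deadline passes. `BlockingWait` and `waitUntilDone` are hypothetical names, and the polling strategy is an assumption for illustration, not the actual HBaseAdmin implementation.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not the HBaseAdmin implementation.
final class BlockingWait {
    // Polls isDone until it returns true (result: true) or until
    // maxWaitMillis has elapsed or the thread is interrupted (result: false).
    static boolean waitUntilDone(BooleanSupplier isDone, long maxWaitMillis, long pollMillis) {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        while (!isDone.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }
}
```

A caller would pass a check such as "ask the master whether the snapshot is done" as `isDone`; that check is outside the scope of this sketch.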

MasterRpcService: triggers the snapshot asynchronously:

        master.snapshotManager.takeSnapshot(snapshot);

SnapshotManager class: how the snapshot is taken depends on the table state, disabled or enabled:

    if (assignmentMgr.getTableStateManager().isTableState(snapshotTable, ZooKeeperProtos.Table.State.ENABLED)) {
        LOG.debug("Table enabled, starting distributed snapshot.");
        snapshotEnabledTable(snapshot);
        LOG.debug("Started snapshot: " + ClientSnapshotDescriptionUtils.toString(snapshot));
    }
    // For disabled table, snapshot is created by the master
    else if (assignmentMgr.getTableStateManager().isTableState(snapshotTable, ZooKeeperProtos.Table.State.DISABLED)) {
        LOG.debug("Table is disabled, running snapshot entirely on master.");
        snapshotDisabledTable(snapshot);
        LOG.debug("Started snapshot: " + ClientSnapshotDescriptionUtils.toString(snapshot));
    }

    private synchronized void snapshotEnabledTable(SnapshotDescription snapshot) throws HBaseSnapshotException {
        // setup the snapshot
        prepareToTakeSnapshot(snapshot);

        // Take the snapshot of the enabled table
        EnabledTableSnapshotHandler handler = new EnabledTableSnapshotHandler(snapshot, master, this);
        snapshotTable(snapshot, handler);
    }

Taking a snapshot of a table in the enabled state:

        // setup the snapshot (preparation)
        prepareToTakeSnapshot(snapshot);

        // Take the snapshot of the enabled table
        EnabledTableSnapshotHandler handler = new EnabledTableSnapshotHandler(snapshot, master, this);
        // start executing the snapshot
        snapshotTable(snapshot, handler);

Preparation before the snapshot starts: check whether a snapshot is already running and whether a restore is in progress on the same table.

        // make sure we aren't already running a snapshot 
        if (isTakingSnapshot(snapshot)) {
            SnapshotSentinel handler = this.snapshotHandlers.get(snapshotTable);
            throw new SnapshotCreationException("Rejected taking " + ClientSnapshotDescriptionUtils.toString(snapshot) + " because we are already running another snapshot " + (handler != null ? ("on the same table " + ClientSnapshotDescriptionUtils.toString(handler.getSnapshot())) : "with the same name"), snapshot);
        }

        // make sure we aren't running a restore on the same table
        if (isRestoringTable(snapshotTable)) {
            SnapshotSentinel handler = restoreHandlers.get(snapshotTable);
            throw new SnapshotCreationException("Rejected taking " + ClientSnapshotDescriptionUtils.toString(snapshot) + " because we are already have a restore in progress on the same snapshot " + ClientSnapshotDescriptionUtils.toString(handler.getSnapshot()), snapshot);
        }

        try {
            // delete the working directory, since we aren't running the snapshot. Likely leftovers
            // from a failed attempt.
            fs.delete(workingDir, true);

            // recreate the working directory for the snapshot
            if (!fs.mkdirs(workingDir)) {
                throw new SnapshotCreationException("Couldn't create working directory (" + workingDir + ") for snapshot", snapshot);
            }
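
The two rejection checks can be reduced to a small standalone sketch. `SnapshotGuard` and its methods are hypothetical names, not SnapshotManager's API, and the sketch only tracks work per table, whereas the message in the code above suggests the real check also considers snapshots with the same name.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the checks above; names are hypothetical.
final class SnapshotGuard {
    private final Map<String, String> runningSnapshots = new HashMap<>(); // table -> snapshot name
    private final Map<String, String> runningRestores = new HashMap<>();  // table -> snapshot name

    synchronized void beginRestore(String table, String snapshotName) {
        runningRestores.put(table, snapshotName);
    }

    // Registers the snapshot, or throws if the table is already busy.
    synchronized void beginSnapshot(String table, String snapshotName) {
        String running = runningSnapshots.get(table);
        if (running != null) {
            throw new IllegalStateException("Rejected taking " + snapshotName
                + ": already running snapshot " + running + " on table " + table);
        }
        String restoring = runningRestores.get(table);
        if (restoring != null) {
            throw new IllegalStateException("Rejected taking " + snapshotName
                + ": restore of " + restoring + " in progress on table " + table);
        }
        runningSnapshots.put(table, snapshotName);
    }

    synchronized void finishSnapshot(String table) {
        runningSnapshots.remove(table);
    }
}
```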

Once the preparation is done, the snapshot is started with the specified handler:

            handler.prepare();
            this.executorService.submit(handler);
            this.snapshotHandlers.put(TableName.valueOf(snapshot.getTable()), handler);
            ...

TakeSnapshotHandler is where the snapshot is actually processed:

1. Write the snapshot description to the .snapshotinfo file:

    FsPermission perms = FSUtils.getFilePermissions(fs, fs.getConf(), HConstants.DATA_FILE_UMASK_KEY);
    Path snapshotInfo = new Path(workingDir, SnapshotDescriptionUtils.SNAPSHOTINFO_FILE);
    try {
        FSDataOutputStream out = FSUtils.create(fs, snapshotInfo, perms, true);
        try {
            snapshot.writeTo(out);
        } finally {
            out.close();
        }
    }

2. Copy the table information:

    snapshotManifest.addTableDescriptor(this.htd);

3. Get the regions and their locations on the region servers:

    List<Pair<HRegionInfo, ServerName>> regionsAndLocations;
    if (TableName.META_TABLE_NAME.equals(snapshotTable)) {
        regionsAndLocations = new MetaTableLocator().getMetaRegionsAndLocations(server.getZooKeeper());
    } else {
        regionsAndLocations = MetaTableAccessor.getTableRegionsAndLocations(server.getZooKeeper(), server.getConnection(), snapshotTable, false);
    }

4. Run the snapshot using the region and location information obtained above:

    // run the snapshot
    snapshotRegions(regionsAndLocations);

Starting the snapshot procedure:

The snapshot is started on the region servers, which is where the details of the snapshot operation are carried out:

    // start the snapshot on the RS
    Procedure proc = coordinator.startProcedure(this.monitor, this.snapshot.getName(), this.snapshot.toByteArray(),
        Lists.newArrayList(regionServers));
    if (proc == null) {
        String msg = "Failed to submit distributed procedure for snapshot '" + snapshot.getName() + "'";
        LOG.error(msg);
        throw new HBaseSnapshotException(msg);
    }

Wait for the snapshot to complete:

    proc.waitForCompleted();

Offline regions are treated as disabled:

    // Take the offline regions as disabled
    for (Pair<HRegionInfo, ServerName> region : regions) {
        HRegionInfo regionInfo = region.getFirst();
        if (regionInfo.isOffline() && (regionInfo.isSplit() || regionInfo.isSplitParent())) {
            LOG.info("Take disabled snapshot of offline region=" + regionInfo);
            snapshotDisabledRegion(regionInfo);
        }
    }

5. Collect the region information and server names, which are used to verify the snapshot:

    // extract each pair to separate lists
    Set<String> serverNames = new HashSet<String>();
    for (Pair<HRegionInfo, ServerName> p : regionsAndLocations) {
        if (p != null && p.getFirst() != null && p.getSecond() != null) {
            HRegionInfo hri = p.getFirst();
            if (hri.isOffline() && (hri.isSplit() || hri.isSplitParent()))
                continue;
            serverNames.add(p.getSecond().toString());
        }
    }
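
The filtering in this loop can be tried out in isolation with the sketch below. `ServerNameCollector` and `RegionLocation` are hypothetical stand-ins for the HBase types, with one boolean standing in for the `isSplit() || isSplitParent()` check. It shows that duplicates collapse into the set and that offline split regions and entries without a location are skipped.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the loop above; not HBase code.
final class ServerNameCollector {
    // Minimal (region, server) pair carrying only the fields the loop inspects.
    static final class RegionLocation {
        final String server;   // null when the region has no known location
        final boolean offline;
        final boolean split;   // stands in for isSplit() || isSplitParent()

        RegionLocation(String server, boolean offline, boolean split) {
            this.server = server;
            this.offline = offline;
            this.split = split;
        }
    }

    // Collects the distinct server names of all regions that are not offline split regions.
    static Set<String> serverNames(List<RegionLocation> locations) {
        Set<String> names = new LinkedHashSet<>();
        for (RegionLocation loc : locations) {
            if (loc == null || loc.server == null) {
                continue; // nothing to record for this entry
            }
            if (loc.offline && loc.split) {
                continue; // offline split regions are skipped, as in the loop above
            }
            names.add(loc.server);
        }
        return names;
    }
}
```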

6. Flush the in-memory state and write the snapshot manifest to the directory:

    // flush the in-memory state, and write the single manifest
    status.setStatus("Consolidate snapshot: " + snapshot.getName());
    snapshotManifest.consolidate();

7. Verify that the snapshot is valid:

    // verify the snapshot is valid
    status.setStatus("Verifying snapshot: " + snapshot.getName());
    verifier.verifySnapshot(this.workingDir, serverNames);

8. Complete the snapshot by moving the working directory into place:

    // complete the snapshot, atomically moving from tmp to .snapshot dir.
    completeSnapshot(this.snapshotDir, this.workingDir, this.fs);
    msg = "Snapshot " + snapshot.getName() + " of table " + snapshotTable + " completed";
    status.markComplete(msg);
    LOG.info(msg);
    metricsSnapshot.addSnapshot(status.getCompletionTimestamp() - status.getStartTime());
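
The idea behind this last step, building everything in a temporary directory and publishing it with one rename, can be tried on a local filesystem. The sketch below is an analogy using `java.nio.file`, not the HBase `completeSnapshot` implementation, which works against HDFS. `CompleteByRename` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Local-filesystem analogy, not the HBase completeSnapshot implementation.
final class CompleteByRename {
    // Publishes workingDir as finalDir with a single atomic move, so readers
    // see either no snapshot directory or a fully written one.
    static void complete(Path finalDir, Path workingDir) {
        try {
            Files.createDirectories(finalDir.getParent());
            Files.move(workingDir, finalDir, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to move " + workingDir + " to " + finalDir, e);
        }
    }
}
```

Both directories need to live on the same filesystem for the move to be atomic; otherwise `Files.move` fails with `AtomicMoveNotSupportedException`.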
