Let's start exploring HBase internals with table creation. Assume the HBase root directory (hbase.rootdir) is /new and the test table is named t1.
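Before diving into the internals, here is a minimal client-side sketch of creating t1 through the client API of this era; the column family name 'cf' is an illustrative assumption, not part of the original setup:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateT1 {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("t1");
    desc.addFamily(new HColumnDescriptor("cf")); // 'cf' is an assumed family name
    // No split keys are passed, so a single empty region is created;
    // createTable(desc, splitKeys) would pre-split the table instead.
    admin.createTable(desc);
    admin.close();
  }
}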
The client creates the table through HBaseAdmin's createTable interface; the process is as follows.
1. Establish an RPC connection to the HMaster and invoke it. Table creation on the HMaster side is asynchronous, so this call is an asynchronous operation. If no split keys are specified, a single empty region is created by default.
getMaster().createTable(desc, splitKeys);
2. The client thread does a full scan of the META table to check whether all of t1's regions have been assigned. By default it retries 100 times, sleeping after each failed attempt.
MetaScannerVisitor visitor = new MetaScannerVisitorBase() {
  @Override
  public boolean processRow(Result rowResult) throws IOException {
    HRegionInfo info = Writables.getHRegionInfoOrNull(
        rowResult.getValue(HConstants.CATALOG_FAMILY,
            HConstants.REGIONINFO_QUALIFIER));
    ......
    // Read the 'server' column; if it has a value, the region is considered assigned.
    byte [] value = rowResult.getValue(HConstants.CATALOG_FAMILY,
        HConstants.SERVER_QUALIFIER);
    // Make sure that regions are assigned to server
    if (value != null && value.length > 0) {
      hostAndPort = Bytes.toString(value);
    }
    if (!(info.isOffline() || info.isSplit()) && hostAndPort != null) {
      actualRegCount.incrementAndGet();
    }
    return true;
  }
};
MetaScanner.metaScan(conf, visitor, desc.getName());
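The retry logic around this scan is not shown above. A simplified sketch of the client-side wait, assuming the visitor, conf, desc and actualRegCount from the snippet, plus hypothetical numRegs (expected region count) and pause (sleep interval, normally derived from hbase.client.pause) values:

int numRetries = 100;                       // default retry count described above
for (int tries = 0; tries < numRetries; ++tries) {
  actualRegCount.set(0);
  // Full scan of META, counting t1 regions that already have a 'server' value.
  MetaScanner.metaScan(conf, visitor, desc.getName());
  if (actualRegCount.get() == numRegs) {
    break;                                  // all regions are assigned
  }
  Thread.sleep(pause);                      // back off before the next scan
}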
Now let's look at the HMaster's create table RPC interface.
1. Construct the CreateTableHandler
1.1 Wait for the META table to come online. Once it is available, get the location of META's first region and establish an RPC connection to it.
public ServerName waitForMeta(long timeout)
    throws InterruptedException, IOException, NotAllMetaRegionsOnlineException {
  long stop = System.currentTimeMillis() + timeout;
  long waitTime = Math.min(50, timeout);
  synchronized (metaAvailable) {
    while (!stopped && (timeout == 0 || System.currentTimeMillis() < stop)) {
      if (getMetaServerConnection() != null) {
        return metaLocation;
      }
      // perhaps -ROOT- region isn't available, let us wait a bit and retry.
      metaAvailable.wait(waitTime);
    }
    if (getMetaServerConnection() == null) {
      throw new NotAllMetaRegionsOnlineException("Timed out (" + timeout + "ms)");
    }
    return metaLocation;
  }
}
1.2 Check whether table t1 already exists.
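A hedged sketch of what this check amounts to, assuming the catalog helper MetaReader.tableExists from this code base (the exact helper may differ by version):

// Sketch: refuse to proceed if t1 is already recorded in META.
if (MetaReader.tableExists(catalogTracker, this.hTableDescriptor.getNameAsString())) {
  throw new TableExistsException(this.hTableDescriptor.getNameAsString());
}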
1.3 Create table t1's znode in ZooKeeper and set its state to 'enabling'; the znode path is /hbase/table/t1.
private void setTableState(final String tableName, final TableState state)
    throws KeeperException {
  String znode = ZKUtil.joinZNode(this.watcher.tableZNode, tableName);
  if (ZKUtil.checkExists(this.watcher, znode) == -1) {
    ZKUtil.createAndFailSilent(this.watcher, znode);
  }
  synchronized (this.cache) {
    ZKUtil.setData(this.watcher, znode, Bytes.toBytes(state.toString()));
    this.cache.put(tableName, state);
  }
}
2. Submit the CreateTableHandler asynchronously
this.executorService.submit(new CreateTableHandler(this,
    this.fileSystemManager, this.serverManager, hTableDescriptor, conf,
    newRegions, catalogTracker, assignmentManager));
3. The CreateTableHandler runs
3.1 Write the table's metadata into a .tableinfo file on HDFS, here /new/t1/.tableinfo.0000000001.
private static Path writeTableDescriptor(final FileSystem fs,
    final HTableDescriptor hTableDescriptor, final Path tableDir,
    final FileStatus status) throws IOException {
  // Get temporary dir into which we'll first write a file to avoid
  // half-written file phenomeon.
  // Write to the .tmp directory first.
  Path tmpTableDir = new Path(tableDir, ".tmp");
  // Sequence number, starting from 0.
  int currentSequenceid = status == null ? 0 : getTableInfoSequenceid(status.getPath());
  int sequenceid = currentSequenceid;
  // Put arbitrary upperbound on how often we retry
  int retries = 10;
  int retrymax = currentSequenceid + retries;
  Path tableInfoPath = null;
  do {
    sequenceid += 1;
    // HDFS file name, something like .tableinfo.0000000001
    Path p = getTableInfoFileName(tmpTableDir, sequenceid);
    if (fs.exists(p)) {
      LOG.debug(p + " exists; retrying up to " + retries + " times");
      continue;
    }
    try {
      // Write the descriptor content.
      writeHTD(fs, p, hTableDescriptor);
      tableInfoPath = getTableInfoFileName(tableDir, sequenceid);
      // Rename into place as the final file.
      if (!fs.rename(p, tableInfoPath)) {
        throw new IOException("Failed rename of " + p + " to " + tableInfoPath);
      }
    }
    .......
    break;
  } while (sequenceid < retrymax);
  return tableInfoPath;
}
3.2 Create the region
HRegion region = HRegion.createHRegion(newRegion,
    this.fileSystemManager.getRootDir(), this.conf,
    this.hTableDescriptor, null, false, true);
3.3 Add a new row to the META table, writing the regioninfo column.
private static Put makePutFromRegionInfo(HRegionInfo regionInfo) throws IOException {
  Put put = new Put(regionInfo.getRegionName());
  put.add(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER,
      Writables.getBytes(regionInfo));
  return put;
}
3.4 Close the region
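The region was only created to lay its files down on HDFS and register itself in META, so the in-process HRegion from step 3.2 is closed again. A minimal sketch, assuming the HRegion.close() API of this era:

// Sketch: release the in-process HRegion created in step 3.2.
region.close();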
3.5 Get the live region servers from ZooKeeper
// Fetch the servers registered under /hbase/rs and filter out the dead ones.
List<ServerName> servers = serverManager.getOnlineServersList();
// Remove the deadNotExpired servers from the server list.
assignmentManager.removeDeadNotExpiredServers(servers);
3.6 Assign the regions. By default they are distributed randomly and evenly, using multiple threads for batch assignment; the calling thread blocks until all regions have been assigned. The detailed assignment process will be covered in the next post.
this.assignmentManager.assignUserRegions(Arrays.asList(newRegions), servers);
3.7 Set table t1's znode state in ZooKeeper to 'enabled'; the znode path is /hbase/table/t1.
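This reuses the same ZooKeeper table-state mechanism shown in step 1.3. A hedged sketch of the final step, assuming the ZKTable accessor exposed by the assignment manager in this code base:

// Sketch: flip /hbase/table/t1 from 'enabling' to 'enabled'
// once every region has been assigned.
assignmentManager.getZKTable().setEnabledTable(this.hTableDescriptor.getNameAsString());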
Summary
Creating a table mainly involves writing the table metadata, assigning regions, updating table state in ZooKeeper, and modifying and verifying the META table.