Lease management is a mechanism used when a client writes to a file. Before opening a file for writing, the client must first acquire a lease, which is managed by the LeaseManager. Put simply, a client cannot hold a file open for writing indefinitely: keeping a file open too long would block other users, so a limiting mechanism is needed, and HDFS uses leases for this. A look at a Lease's member variables makes the idea clear:
```java
private final String holder;     // the lease holder, a string such as "DFSClient_1960866591"
private long lastUpdate;         // timestamp of the last renewal
private final Collection<String> paths = new TreeSet<String>();  // files covered by this lease, e.g. "/a.txt"
```
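To make the bookkeeping concrete, here is a minimal standalone sketch of this structure. `LeaseSketch` and its methods are hypothetical names for illustration; only the three fields mirror the HDFS source:

```java
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for the HDFS Lease class described above.
public class LeaseSketch {
    private final String holder;              // e.g. "DFSClient_1960866591"
    private long lastUpdate;                  // timestamp of the last renewal
    private final Collection<String> paths = new TreeSet<String>();  // open files

    public LeaseSketch(String holder) {
        this.holder = holder;
        renew();
    }

    // Renewing a lease only resets its timestamp.
    public void renew() { this.lastUpdate = System.currentTimeMillis(); }

    public void addPath(String path) { paths.add(path); }

    public String getHolder() { return holder; }
    public Collection<String> getPaths() { return paths; }

    public static void main(String[] args) {
        LeaseSketch lease = new LeaseSketch("DFSClient_1960866591");
        lease.addPath("/a.txt");
        System.out.println(lease.getHolder() + " holds " + lease.getPaths());
    }
}
```

One lease covers all files a single client holds open for writing, which is why the real class keeps a sorted collection of paths rather than a single file name.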
When LeaseManager$Monitor periodically checks the leases, it tests whether the hard limit has been exceeded. If not, it simply returns without doing anything else; if it has, the lease is released internally via fsnamesystem.internalReleaseLeaseOne(oldest, p).
To get a concrete feel for this, let's look at some lease-related log output. The following lines are from the Namenode log:
```
13/08/26 12:11:00 INFO hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_5158458134414014528 is added to invalidSet of 192.168.0.43:50010
13/08/26 12:11:00 INFO namenode.FSNamesystem: Number of transactions: 1 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 0
13/08/26 12:11:00 INFO hdfs.StateChange: BLOCK* ask 192.168.0.43:50010 to delete blk_5158458134414014528_1018
13/08/26 13:12:40 INFO namenode.LeaseManager: Lease [Lease. Holder: DFSClient_1960866591, pendingcreates: 1] has expired hard limit
13/08/26 13:13:42 INFO namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_1960866591, pendingcreates: 1], src=/cat1.txt
13/08/26 13:14:21 INFO hdfs.StateChange: Removing lease on file /cat1.txt from client DFSClient_1960866591
13/08/26 13:14:29 WARN hdfs.StateChange: BLOCK* internalReleaseLease: No blocks found, lease removed for /cat1.txt
13/08/26 13:15:44 INFO hdfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.0.43:50010
13/08/26 13:15:44 INFO net.NetworkTopology: Removing a node: /default-rack/192.168.0.43:50010
```
At 13/08/26 13:12:40 the lease's hard limit was triggered. The subsequent lines show an attempt to recover the lease, but the check found that the open file had zero blocks (in my test I created the file and then simply waited for the timeout), so recovery failed and the lease was removed. The last two lines are a datanode heartbeat timeout, which we can ignore here.
Now let's look at the client log:
```
13/08/26 13:15:44 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /cat1.txt File is not open for writing. Holder DFSClient_1960866591 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1639)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1622)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1538)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1383)
```
As you can see, when the client woke up and tried to write to the file again, it got a lease-expired exception, and the write ultimately failed. With these logs in mind, let's look at the source code.
Monitor is an inner class of LeaseManager that periodically checks lease state and removes any lease that has expired:
```java
class Monitor implements Runnable {
    final String name = getClass().getSimpleName();

    /** Check leases periodically. */
    public void run() {
        for (; fsnamesystem.isRunning(); ) {
            synchronized (fsnamesystem) {
                checkLeases();
            }
            try {
                Thread.sleep(2000);  // check interval: 2 seconds
            } catch (InterruptedException ie) {
                if (LOG.isDebugEnabled()) {
                    LOG.debug(name + " is interrupted", ie);
                }
            }
        }
    }
}
```
The log above mentioned a hard limit; there are actually two limits involved in the check:
```java
public static final long LEASE_SOFTLIMIT_PERIOD = 60 * 1000;                    // 1 minute
public static final long LEASE_HARDLIMIT_PERIOD = 60 * LEASE_SOFTLIMIT_PERIOD;  // 1 hour
```
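The expiry test is just a comparison of elapsed time since the last renewal against these two constants. A minimal sketch, where `LeaseExpiry` and its static methods are hypothetical helpers (only the constants mirror the HDFS source):

```java
// Hypothetical helper showing how a soft/hard limit check works.
public class LeaseExpiry {
    public static final long LEASE_SOFTLIMIT_PERIOD = 60 * 1000L;                    // 1 minute
    public static final long LEASE_HARDLIMIT_PERIOD = 60 * LEASE_SOFTLIMIT_PERIOD;   // 1 hour

    /** True if more than the soft limit has elapsed since the last renewal. */
    public static boolean expiredSoftLimit(long lastUpdate, long now) {
        return now - lastUpdate > LEASE_SOFTLIMIT_PERIOD;
    }

    /** True if more than the hard limit has elapsed since the last renewal. */
    public static boolean expiredHardLimit(long lastUpdate, long now) {
        return now - lastUpdate > LEASE_HARDLIMIT_PERIOD;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long renewedTwoMinutesAgo = now - 2 * 60 * 1000L;
        System.out.println(expiredSoftLimit(renewedTwoMinutesAgo, now));  // true: past the soft limit
        System.out.println(expiredHardLimit(renewedTwoMinutesAgo, now));  // false: hard limit not reached
    }
}
```

A lease that is past the soft limit can be taken over by another client; only past the hard limit does the Monitor itself force a release, which is exactly what the log above shows.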
The main checking function in Monitor is checkLeases(); let's step into its source:
```java
synchronized void checkLeases() {
    for (; sortedLeases.size() > 0; ) {
        // Check the oldest lease first: sortedLeases is sorted, so if the
        // oldest lease has not expired, none of the others have either.
        final Lease oldest = sortedLeases.first();
        if (!oldest.expiredHardLimit()) {
            return;
        }

        LOG.info("Lease " + oldest + " has expired hard limit");

        final List<String> removing = new ArrayList<String>();
        // need to create a copy of the oldest lease paths, because
        // internalReleaseLease() removes paths corresponding to empty files,
        // i.e. it needs to modify the collection being iterated over
        // causing ConcurrentModificationException
        String[] leasePaths = new String[oldest.getPaths().size()];
        oldest.getPaths().toArray(leasePaths);
        for (String p : leasePaths) {
            try {
                // Release the lease internally. Note that this method lives
                // in FSNamesystem; we will analyze it below.
                fsnamesystem.internalReleaseLeaseOne(oldest, p);
            } catch (IOException e) {
                LOG.error("Cannot release the path " + p + " in the lease " + oldest, e);
                removing.add(p);
            }
        }

        // The actual removal: the mapping from file path to lease is kept
        // in sortedLeasesByPath, a SortedMap<String, Lease>.
        for (String p : removing) {
            removeLease(oldest, p);
        }
    }
}
```
Next, let's analyze the internal release function fsnamesystem.internalReleaseLeaseOne(oldest, p):
```java
void internalReleaseLeaseOne(Lease lease, String src) throws IOException {
    assert Thread.holdsLock(this);

    LOG.info("Recovering lease=" + lease + ", src=" + src);

    INodeFile iFile = dir.getFileINode(src);
    // Case 1: the file does not exist
    if (iFile == null) {
        final String message = "DIR* NameSystem.internalReleaseCreate: "
            + "attempt to release a create lock on " + src
            + " file does not exist.";
        NameNode.stateChangeLog.warn(message);
        throw new IOException(message);
    }
    // Case 2: the file is already closed
    if (!iFile.isUnderConstruction()) {
        final String message = "DIR* NameSystem.internalReleaseCreate: "
            + "attempt to release a create lock on " + src
            + " but file is already closed.";
        NameNode.stateChangeLog.warn(message);
        throw new IOException(message);
    }

    INodeFileUnderConstruction pendingFile = (INodeFileUnderConstruction) iFile;

    // Try to recover the lease: if the file has zero blocks, recovery cannot
    // succeed and the lease is reclaimed; otherwise the lease is reassigned.
    if (pendingFile.getTargets() == null || pendingFile.getTargets().length == 0) {
        if (pendingFile.getBlocks().length == 0) {
            // Reclaim the lease and log it; this method is worth reading too.
            finalizeINodeFileUnderConstruction(src, pendingFile);
            NameNode.stateChangeLog.warn("BLOCK*"
                + " internalReleaseLease: No blocks found, lease removed for " + src);
            return;
        }
        // setup the Inode.targets for the last block from the blocksMap
        //
        Block[] blocks = pendingFile.getBlocks();
        Block last = blocks[blocks.length - 1];
        DatanodeDescriptor[] targets = new DatanodeDescriptor[blocksMap.numNodes(last)];
        Iterator<DatanodeDescriptor> it = blocksMap.nodeIterator(last);
        for (int i = 0; it != null && it.hasNext(); i++) {
            targets[i] = it.next();
        }
        pendingFile.setTargets(targets);
    }

    // Actually start recovering the lease
    pendingFile.assignPrimaryDatanode();
    Lease reassignedLease = reassignLease(
        lease, src, HdfsConstants.NN_RECOVERY_LEASEHOLDER, pendingFile);
    leaseManager.renewLease(reassignedLease);
}
```
Reassigning the lease is in fact quite simple: it just updates the lease's timestamp:
```java
private void renew() {
    this.lastUpdate = FSNamesystem.now();
}
```
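This is also why a healthy client never hits either limit: each renewal restarts the countdown. A toy illustration of that behavior (`RenewDemo` is hypothetical, not HDFS code; it reuses the 1-minute soft limit from above):

```java
// Hypothetical demo: renewing within the soft-limit window keeps the lease
// alive, precisely because renew() only resets the timestamp.
public class RenewDemo {
    static final long SOFT_LIMIT = 60 * 1000L;  // 1 minute, as in HDFS
    static long lastUpdate = 0;

    public static void renew(long now) { lastUpdate = now; }

    public static boolean expired(long now) { return now - lastUpdate > SOFT_LIMIT; }

    public static void main(String[] args) {
        renew(0);
        // 59 s later the client renews again, restarting the countdown.
        renew(59_000);
        System.out.println(expired(100_000));  // false: only 41 s since last renewal
        System.out.println(expired(120_000));  // true: 61 s since last renewal
    }
}
```

In the failed write we saw in the logs, the client slept past both windows without renewing, so by the time it woke up the Namenode had already reclaimed the lease.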