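When the NameNode starts, one important step is loading the fsimage file. This post walks through that flow.
NameNode.main -> NameNode(conf) -> NameNode.initialize(conf) -> FSNamesystem(this, conf) -> FSNamesystem.initialize(nn, conf) -> FSNamesystem.dir.loadFSImage(getNamespaceDirs(conf), getNamespaceEditsDirs(conf), startOpt)
The function to focus on is the last one, loadFSImage. After a series of checks it loads the FSImage, merging edits into the FSImage along the way. It takes three arguments: the first two are the return values of getNamespaceDirs and getNamespaceEditsDirs, and startOpt is an enum whose value is REGULAR on a normal startup. Let's look at getNamespaceDirs first.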
public static Collection<File> getNamespaceDirs(Configuration conf) {
  // Read the FSImage directories from the configuration. There may be several,
  // so they are kept in a collection. The property is set in hdfs-site.xml.
  Collection<String> dirNames = conf.getStringCollection("dfs.name.dir");
  // If no directory is configured, fall back to the default /tmp/hadoop/dfs/name
  if (dirNames.isEmpty())
    dirNames.add("/tmp/hadoop/dfs/name");
  Collection<File> dirs = new ArrayList<File>(dirNames.size());
  // Wrap each directory name in a File, collect them in the ArrayList, and return it
  for (String name : dirNames) {
    dirs.add(new File(name));
  }
  return dirs;
}
// Get the edits directories. They are usually the same as the FSImage directories,
// but if the namespace is updated frequently and the NameNode sees too much IO,
// consider storing FSImage and edits separately to balance the IO load.
public static Collection<File> getNamespaceEditsDirs(Configuration conf) {
  Collection<String> editsDirNames = conf.getStringCollection("dfs.name.edits.dir");
  if (editsDirNames.isEmpty())
    editsDirNames.add("/tmp/hadoop/dfs/name");
  Collection<File> dirs = new ArrayList<File>(editsDirNames.size());
  for (String name : editsDirNames) {
    dirs.add(new File(name));
  }
  return dirs;
}
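To make the fallback behaviour of these two functions easy to try without a Hadoop classpath, here is a minimal sketch. It assumes java.util.Properties as a stand-in for Hadoop's Configuration and does its own comma splitting; the class and method names (NameDirsSketch, resolveDirs) are made up for illustration and are not part of Hadoop.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative only: Properties replaces Hadoop's Configuration here.
class NameDirsSketch {
    // Same default the article's code falls back to.
    static final String DEFAULT_DIR = "/tmp/hadoop/dfs/name";

    // Splits a comma-separated property value into File objects and
    // falls back to DEFAULT_DIR when the key is missing or empty.
    static List<File> resolveDirs(Properties conf, String key) {
        List<File> dirs = new ArrayList<File>();
        String raw = conf.getProperty(key, "");
        for (String name : raw.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(new File(trimmed));
            }
        }
        if (dirs.isEmpty()) {
            dirs.add(new File(DEFAULT_DIR));
        }
        return dirs;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("dfs.name.dir", "/data1/name,/data2/name");
        // Two configured image directories.
        System.out.println(resolveDirs(conf, "dfs.name.dir"));
        // dfs.name.edits.dir is not set, so the default is used.
        System.out.println(resolveDirs(conf, "dfs.name.edits.dir"));
    }
}
```

Because both properties fall back to the same default path, an unconfigured NameNode ends up with image and edits in one directory under /tmp, which is worth overriding on any real cluster.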
// Once both functions have returned, the actual loading begins. The important calls are:
// 1. fsImage.recoverTransitionRead
// 2. fsImage.saveNamespace
// 3. fsImage.setCheckpointDirectories
void loadFSImage(Collection<File> dataDirs,
                 Collection<File> editsDirs,
                 StartupOption startOpt) throws IOException {
  // Format first if formatting was requested
  if (startOpt == StartupOption.FORMAT) {
    fsImage.setStorageDirectories(dataDirs, editsDirs);
    fsImage.format();
    startOpt = StartupOption.REGULAR;
  }
  try {
    if (fsImage.recoverTransitionRead(dataDirs, editsDirs, startOpt)) {
      fsImage.saveNamespace(true);
    }
    FSEditLog editLog = fsImage.getEditLog();
    assert editLog != null : "editLog must be initialized";
    if (!editLog.isOpen())
      editLog.open();
    fsImage.setCheckpointDirectories(null, null);
  } catch (IOException e) {
    fsImage.close();
    throw e;
  }
  synchronized (this) {
    this.ready = true;
    this.nameCache.initialized();
    this.notifyAll();
  }
}
fsImage.recoverTransitionRead checks whether the directories under dfs.name.dir are in a healthy state, using the analyzeStorage function. For a normal startup the code is as follows:
1. First check what exists on disk, for example whether the VERSION file is present and whether the directory contains any temporary directories. The main code is:
analyzeStorage function:
// Check whether the VERSION file exists
File versionFile = getVersionFile();
boolean hasCurrent = versionFile.exists();
// Check for the various temporary directories
boolean hasPrevious = getPreviousDir().exists();
boolean hasPreviousTmp = getPreviousTmp().exists();
boolean hasRemovedTmp = getRemovedTmp().exists();
boolean hasFinalizedTmp = getFinalizedTmp().exists();
boolean hasCheckpointTmp = getLastCheckpointTmp().exists();
// In the normal case NORMAL is returned; the condition is that none of
// these temporary directories exist and the VERSION file is present
if (!(hasPreviousTmp || hasRemovedTmp || hasFinalizedTmp || hasCheckpointTmp)) {
  // no temp dirs - no recovery
  if (hasCurrent)
    return StorageState.NORMAL;
  if (hasPrevious)
    throw new InconsistentFSStateException(root,
        "version file in current directory is missing.");
  return StorageState.NOT_FORMATTED;
}
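The decision in this fragment depends only on a handful of booleans, so it can be restated as a small pure function. The sketch below is an illustration, not Hadoop code: the enum values INCONSISTENT and NEEDS_RECOVERY are hypothetical (the real code throws InconsistentFSStateException in the first case and continues into recovery branches that this post does not show in the second).

```java
// Illustrative restatement of the "no temp dirs" branch of analyzeStorage.
class StorageStateSketch {
    // NORMAL and NOT_FORMATTED mirror the article; the other two are hypothetical.
    enum State { NORMAL, NOT_FORMATTED, INCONSISTENT, NEEDS_RECOVERY }

    static State classify(boolean hasCurrent, boolean hasPrevious, boolean hasAnyTmp) {
        if (hasAnyTmp) {
            // Some temporary directory is left over: recovery logic (not shown) applies.
            return State.NEEDS_RECOVERY;
        }
        if (hasCurrent) {
            return State.NORMAL;
        }
        if (hasPrevious) {
            // previous exists but current/VERSION is missing: the real code throws here.
            return State.INCONSISTENT;
        }
        return State.NOT_FORMATTED;
    }

    public static void main(String[] args) {
        // A healthy directory: VERSION file present, no temporary directories.
        System.out.println(classify(true, false, false)); // prints NORMAL
    }
}
```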
2. Validate the VERSION file: sd.read() -> read(getVersionFile()) -> getFields()
// Read the VERSION file into a Properties object and check it for consistency.
// If the check passes, the properties are used to initialize the member
// variables of the parent class StorageInfo.
protected void getFields(Properties props, StorageDirectory sd) throws IOException {
  String sv, st, sid, sct;
  sv = props.getProperty("layoutVersion");
  st = props.getProperty("storageType");
  sid = props.getProperty("namespaceID");
  sct = props.getProperty("cTime");
  // Property validation starts here
  if (sv == null || st == null || sid == null || sct == null)
    throw new InconsistentFSStateException(sd.root,
        "file " + STORAGE_FILE_VERSION + " is invalid.");
  int rv = Integer.parseInt(sv);
  NodeType rt = NodeType.valueOf(st);
  int rid = Integer.parseInt(sid);
  long rct = Long.parseLong(sct);
  if (!storageType.equals(rt) ||
      !((namespaceID == 0) || (rid == 0) || namespaceID == rid))
    throw new InconsistentFSStateException(sd.root,
        "is incompatible with others.");
  if (rv < FSConstants.LAYOUT_VERSION) // future version
    throw new IncorrectVersionException(rv, "storage directory "
        + sd.root.getCanonicalPath());
  // Initialize the StorageInfo member variables
  layoutVersion = rv;
  storageType = rt;
  namespaceID = rid;
  cTime = rct;
}
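The checks in getFields can be exercised in isolation. The following sketch keeps the same three rules (all four keys present, namespace IDs compatible, layout version not newer than supported) but uses plain Java types. The class name, the exception type, and the value -18 for the supported layout version are assumptions for illustration; the storageType comparison is left out because it needs Hadoop's NodeType enum.

```java
import java.util.Properties;

// Illustrative only: mirrors the validation rules of getFields with plain Java types.
class VersionCheckSketch {
    // Assumed value for the example; the real constant is FSConstants.LAYOUT_VERSION.
    static final int SUPPORTED_LAYOUT_VERSION = -18;

    // Returns the namespace ID read from props, or throws if the properties
    // are incomplete or incompatible with knownNamespaceId (0 means "not known yet").
    static int validate(Properties props, int knownNamespaceId) {
        String sv = props.getProperty("layoutVersion");
        String st = props.getProperty("storageType");
        String sid = props.getProperty("namespaceID");
        String sct = props.getProperty("cTime");
        if (sv == null || st == null || sid == null || sct == null) {
            throw new IllegalStateException("VERSION file is invalid");
        }
        int layout = Integer.parseInt(sv);
        int nsId = Integer.parseInt(sid);
        Long.parseLong(sct); // cTime must at least be a valid long
        // Compatible when either side is 0 or both IDs are equal.
        if (knownNamespaceId != 0 && nsId != 0 && knownNamespaceId != nsId) {
            throw new IllegalStateException("namespaceID is incompatible with others");
        }
        // Layout versions are negative and decrease over time,
        // so a smaller value means a newer (future) layout.
        if (layout < SUPPORTED_LAYOUT_VERSION) {
            throw new IllegalStateException("unsupported future layout version " + layout);
        }
        return nsId;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("layoutVersion", "-18");
        props.setProperty("storageType", "NAME_NODE");
        props.setProperty("namespaceID", "42");
        props.setProperty("cTime", "0");
        System.out.println(validate(props, 0)); // prints 42
    }
}
```

The comparison direction in the last check is the part that is easy to misread: "rv < LAYOUT_VERSION" rejects newer software layouts precisely because the numbers grow more negative with each release.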
3. Loop over the dfs.name.dir directories again and check whether any of them needs formatting
for (Iterator<StorageDirectory> it = dirIterator(); it.hasNext();) {
  StorageDirectory sd = it.next();
  StorageState curState = dataDirStates.get(sd);
  switch (curState) {
  case NON_EXISTENT:
    assert false : StorageState.NON_EXISTENT + " state cannot be here";
  case NOT_FORMATTED:
    LOG.info("Storage directory " + sd.getRoot() + " is not formatted.");
    LOG.info("Formatting ...");
    sd.clearDirectory(); // create empty current dir
    break;
  default:
    break;
  }
}
4. Check the startup option: upgrade? import? rollback? regular? Since this is a normal startup, what runs here is the FSImage load:
loadFSImage()
boolean loadFSImage() throws IOException {
  // Now check all curFiles and see which is the newest
  long latestNameCheckpointTime = Long.MIN_VALUE;
  long latestEditsCheckpointTime = Long.MIN_VALUE;
  StorageDirectory latestNameSD = null;
  StorageDirectory latestEditsSD = null;
  boolean needToSave = false;
  isUpgradeFinalized = true;
  Collection<String> imageDirs = new ArrayList<String>();
  Collection<String> editsDirs = new ArrayList<String>();
  // Loop over the directories given by dfs.name.dir, add the valid ones to the
  // collections, and read fstime to determine each checkpoint time. If there are
  // several directories, the newest checkpoint time wins, because
  // latestNameCheckpointTime keeps the latest timestamp seen in this loop.
  for (Iterator<StorageDirectory> it = dirIterator(); it.hasNext();) {
    StorageDirectory sd = it.next();
    if (!sd.getVersionFile().exists()) {
      needToSave |= true;
      continue; // some of them might have just been formatted
    }
    boolean imageExists = false, editsExists = false;
    if (sd.getStorageDirType().isOfType(NameNodeDirType.IMAGE)) {
      imageExists = getImageFile(sd, NameNodeFile.IMAGE).exists();
      imageDirs.add(sd.getRoot().getCanonicalPath());
    }
    if (sd.getStorageDirType().isOfType(NameNodeDirType.EDITS)) {
      editsExists = getImageFile(sd, NameNodeFile.EDITS).exists();
      editsDirs.add(sd.getRoot().getCanonicalPath());
    }
    checkpointTime = readCheckpointTime(sd);
    if ((checkpointTime != Long.MIN_VALUE) &&
        ((checkpointTime != latestNameCheckpointTime) ||
         (checkpointTime != latestEditsCheckpointTime))) {
      // Force saving of new image if checkpoint time
      // is not same in all of the storage directories.
      needToSave |= true;
    }
    // Determine the effective (latest) checkpoint times
    if (sd.getStorageDirType().isOfType(NameNodeDirType.IMAGE) &&
        (latestNameCheckpointTime < checkpointTime) && imageExists) {
      latestNameCheckpointTime = checkpointTime;
      latestNameSD = sd;
    }
    if (sd.getStorageDirType().isOfType(NameNodeDirType.EDITS) &&
        (latestEditsCheckpointTime < checkpointTime) && editsExists) {
      latestEditsCheckpointTime = checkpointTime;
      latestEditsSD = sd;
    }
    if (checkpointTime <= 0L)
      needToSave |= true;
    // set finalized flag
    isUpgradeFinalized = isUpgradeFinalized && !sd.getPreviousDir().exists();
  }
  // We should have at least one image and one edits dirs
  if (latestNameSD == null)
    throw new IOException("Image file is not found in " + imageDirs);
  if (latestEditsSD == null)
    throw new IOException("Edits file is not found in " + editsDirs);
  // If the image checkpoint time is later than the edits checkpoint time,
  // the image's time is treated as authoritative
  if (latestNameCheckpointTime > latestEditsCheckpointTime
      && latestNameSD != latestEditsSD
      && latestNameSD.getStorageDirType() == NameNodeDirType.IMAGE
      && latestEditsSD.getStorageDirType() == NameNodeDirType.EDITS) {
    // This is a rare failure when NN has image-only and edits-only
    // storage directories, and fails right after saving images,
    // in some of the storage directories, but before purging edits.
    // See -NOTE- in saveNamespace().
    LOG.error("This is a rare failure scenario!!!");
    LOG.error("Image checkpoint time " + latestNameCheckpointTime +
              " > edits checkpoint time " + latestEditsCheckpointTime);
    LOG.error("Name-node will treat the image as the latest state of " +
              "the namespace. Old edits will be discarded.");
  } else if (latestNameCheckpointTime != latestEditsCheckpointTime)
    throw new IOException("Inconsistent storage detected, " +
        "image and edits checkpoint times do not match." +
        "image checkpoint time = " + latestNameCheckpointTime +
        "edits checkpoint time = " + latestEditsCheckpointTime);
  // Recover from previous interrupted checkpoint if any
  needToSave |= recoverInterruptedCheckpoint(latestNameSD, latestEditsSD);
  long startTime = FSNamesystem.now();
  long imageSize = getImageFile(latestNameSD, NameNodeFile.IMAGE).length();
  //
  // Load in bits
  //
  latestNameSD.read(); // the VERSION file gets read yet again here, which is tedious
  // Note: this is where the fsimage file actually starts loading
  needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
  LOG.info("Image file of size " + imageSize + " loaded in "
      + (FSNamesystem.now() - startTime) / 1000 + " seconds.");
  // Load latest edits
  if (latestNameCheckpointTime > latestEditsCheckpointTime)
    // the image is already current, discard edits
    needToSave |= true;
  else // latestNameCheckpointTime == latestEditsCheckpointTime
    needToSave |= (loadFSEdits(latestEditsSD) > 0);
  return needToSave;
}
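The core of this method is "pick the newest image, pick the newest edits, then reconcile the two timestamps". The sketch below isolates that logic so it can be run on its own. Everything in it is hypothetical scaffolding: Dir is a simplified stand-in for StorageDirectory that folds "directory type" and "file exists" into two booleans, and the Action enum and exception type are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: the checkpoint selection and reconciliation from loadFSImage().
class CheckpointSelectSketch {
    // Simplified stand-in for a storage directory and its fstime value.
    static final class Dir {
        final String name;
        final boolean hasImage;
        final boolean hasEdits;
        final long checkpointTime;

        Dir(String name, boolean hasImage, boolean hasEdits, long checkpointTime) {
            this.name = name;
            this.hasImage = hasImage;
            this.hasEdits = hasEdits;
            this.checkpointTime = checkpointTime;
        }
    }

    enum Action { LOAD_IMAGE_THEN_EDITS, LOAD_IMAGE_DISCARD_EDITS }

    // Returns the directory with the newest checkpoint time that holds the wanted file,
    // or null if no directory holds it. On ties the first directory wins.
    static Dir newest(List<Dir> dirs, boolean wantImage) {
        Dir best = null;
        for (Dir d : dirs) {
            boolean present = wantImage ? d.hasImage : d.hasEdits;
            if (present && (best == null || d.checkpointTime > best.checkpointTime)) {
                best = d;
            }
        }
        return best;
    }

    static Action decide(List<Dir> dirs) {
        Dir image = newest(dirs, true);
        Dir edits = newest(dirs, false);
        if (image == null) throw new IllegalStateException("Image file is not found");
        if (edits == null) throw new IllegalStateException("Edits file is not found");
        boolean imageOnly = !image.hasEdits;
        boolean editsOnly = !edits.hasImage;
        // Rare case: image-only and edits-only directories, crash after saving the image.
        if (image.checkpointTime > edits.checkpointTime
                && image != edits && imageOnly && editsOnly) {
            return Action.LOAD_IMAGE_DISCARD_EDITS;
        }
        if (image.checkpointTime != edits.checkpointTime) {
            throw new IllegalStateException("Inconsistent storage detected");
        }
        return Action.LOAD_IMAGE_THEN_EDITS;
    }

    public static void main(String[] args) {
        List<Dir> dirs = Arrays.asList(
            new Dir("/data1/name", true, true, 100L),
            new Dir("/data2/name", true, true, 100L));
        System.out.println(decide(dirs)); // prints LOAD_IMAGE_THEN_EDITS
    }
}
```

Note how narrow the "image is newer" escape hatch is: a newer image is only accepted when it lives in an image-only directory and the newest edits live in an edits-only directory. Any other timestamp mismatch is treated as inconsistent storage.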
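After this long run of checks, the fsimage file finally starts loading at
needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
Note that the call chain at the top of this post contains FSNamesystem.dir.loadFSImage. The code for that function is in FSDirectory.java, while the code that actually loads the fsimage is in FSImage.java. Do not confuse the two. The detailed loading flow is covered in the next post.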