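In the architecture article, "HDFS Read/Write Separation (Overall Architecture Introduction)", we outlined how read/write separation can be implemented. We took a close look at the approach the community is currently implementing and refining: serving reads from the Standby Namenode (SBN) and writes from the Active Namenode (ANN), which separates HDFS reads from writes and improves the NameNode's overall read/write performance and throughput.
Starting with this chapter, we analyze the implementation at the source-code level. This is the first article in the source-analysis series, and it covers how HDFS supports basic read operations from the SBN.
We begin by introducing the concept of the ObserverNamenode, which is also a kind of SBN. Let's first look at the corresponding changes in the Namenode startup process.
The HdfsServerConstants class lists the startup options a Namenode accepts when it starts: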
/** Startup options */
enum StartupOption{
FORMAT ("-format"),
CLUSTERID ("-clusterid"),
GENCLUSTERID ("-genclusterid"),
REGULAR ("-regular"),
BACKUP ("-backup"),
CHECKPOINT("-checkpoint"),
UPGRADE ("-upgrade"),
ROLLBACK("-rollback"),
ROLLINGUPGRADE("-rollingUpgrade"),
IMPORT ("-importCheckpoint"),
BOOTSTRAPSTANDBY("-bootstrapStandby"),
INITIALIZESHAREDEDITS("-initializeSharedEdits"),
RECOVER ("-recover"),
FORCE("-force"),
NONINTERACTIVE("-nonInteractive"),
SKIPSHAREDEDITSCHECK("-skipSharedEditsCheck"),
RENAMERESERVED("-renameReserved"),
METADATAVERSION("-metadataVersion"),
UPGRADEONLY("-upgradeOnly"),
// The -hotswap constant should not be used as a startup option, it is
// only used for StorageDirectory.analyzeStorage() in hot swap drive scenario.
// TODO refactor StorageDirectory.analyzeStorage() so that we can do away with
// this in StartupOption.
HOTSWAP("-hotswap"),
// Startup the namenode in observer mode.
OBSERVER("-observer");
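Here the -observer option means the Namenode starts in observer mode; such a namenode is an ObserverNamenode.
Next we look at the corresponding changes in the namenode startup process. Ignoring everything else, we focus only on the changes related to read/write separation (in the Namenode class):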
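To make the flag-to-enum relationship concrete, here is a minimal sketch of how a command-line argument such as -observer could be matched against the enum. It is not the Hadoop parsing code: the class `StartupOptionSketch`, the trimmed three-value enum, and the `parse` helper are all hypothetical names introduced only for illustration.

```java
import java.util.Optional;

class StartupOptionSketch {
    /** Hypothetical, trimmed-down copy of the enum: only three of the options listed above. */
    enum StartupOption {
        REGULAR("-regular"),
        BOOTSTRAPSTANDBY("-bootstrapStandby"),
        OBSERVER("-observer");

        private final String name;

        StartupOption(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    /** Hypothetical helper: returns the option whose flag matches arg, ignoring case. */
    static Optional<StartupOption> parse(String arg) {
        for (StartupOption opt : StartupOption.values()) {
            if (opt.getName().equalsIgnoreCase(arg)) {
                return Optional.of(opt);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A known flag resolves to its enum constant; an unknown flag yields an empty Optional.
        System.out.println(parse("-observer"));
        System.out.println(parse("-unknown"));
    }
}
```

Returning an `Optional` here is a design choice of the sketch, so that an unrecognized flag is reported to the caller instead of silently falling back to a default.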
// The HA state (mode) this Namenode is currently in
private volatile HAState state;
// Switch: whether reads from the SBN are allowed
protected final boolean allowStaleStandbyReads;
// The HAState instance for each mode
public static final HAState ACTIVE_STATE = new ActiveState();
public static final HAState STANDBY_STATE = new StandbyState();
// HAState for the observer: the same class as standby, distinguished only by passing observer = true
public static final HAState OBSERVER_STATE = new StandbyState(true);
public NameNode(Configuration conf) throws IOException {
this(conf, NamenodeRole.NAMENODE);
}
protected NameNode(Configuration conf, NamenodeRole role)
throws IOException {
// Parameter initialization and other setup...
state = createHAState(getStartupOption(conf));
this.allowStaleStandbyReads = HAUtil.shouldAllowStandbyReads(conf);
// Enter the chosen state, starting services according to the HA context
state.enterState(haContext);
...
}
protected HAState createHAState(StartupOption startOpt) {
if (!haEnabled || startOpt == StartupOption.UPGRADE
|| startOpt == StartupOption.UPGRADEONLY) {
return ACTIVE_STATE;
} else if (startOpt == StartupOption.OBSERVER) {
return OBSERVER_STATE;
} else {
return STANDBY_STATE;
}
}
// HAUtil.shouldAllowStandbyReads decides, based on the dfs.ha.allow.stale.reads setting, whether reads from the Standby are allowed
public static boolean shouldAllowStandbyReads(Configuration conf) {
return conf.getBoolean("dfs.ha.allow.stale.reads", false);
}
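The branching in createHAState can be exercised in isolation. The following is a minimal sketch, not Hadoop code: `HaStateSelectionSketch`, the reduced `StartupOption` enum, and the `HaState` enum are hypothetical stand-ins, and `haEnabled` is passed as a parameter here rather than read from a field.

```java
class HaStateSelectionSketch {
    /** Hypothetical reduced set of startup options, enough to cover every branch. */
    enum StartupOption { REGULAR, UPGRADE, UPGRADEONLY, OBSERVER }

    /** Hypothetical stand-in for the three HAState singletons. */
    enum HaState { ACTIVE, STANDBY, OBSERVER }

    /** Mirrors the branch order of createHAState shown above. */
    static HaState createHaState(boolean haEnabled, StartupOption startOpt) {
        if (!haEnabled || startOpt == StartupOption.UPGRADE
                || startOpt == StartupOption.UPGRADEONLY) {
            return HaState.ACTIVE;
        } else if (startOpt == StartupOption.OBSERVER) {
            return HaState.OBSERVER;
        } else {
            return HaState.STANDBY;
        }
    }

    public static void main(String[] args) {
        // Print the selected state for every combination of HA setting and startup option.
        for (boolean haEnabled : new boolean[] {true, false}) {
            for (StartupOption opt : StartupOption.values()) {
                System.out.println("haEnabled=" + haEnabled + ", option=" + opt
                        + " -> " + createHaState(haEnabled, opt));
            }
        }
    }
}
```

One consequence of the branch order is worth noting: because the HA check comes first, starting with the observer option while HA is disabled still selects the active state in this sketch, just as the first condition in the original method implies.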
Next we turn to the key part of startup, the call state.enterState(haContext). For an SBN this resolves to StandbyState.enterState(haContext):
// Enter the state using the observer's HAContext
@Override
public void enterState(HAContext context) throws ServiceFailedException {
try {
context.startStandbyServices();
} catch (IOException e) {
throw new ServiceFailedException("Failed to start standby services", e);
}
}
Following the call further, context.startStandbyServices() -> NameNode.startStandbyServices():
// For an observer, state == NameNode.OBSERVER_STATE evaluates to true
@Override
public void startStandbyServices() throws IOException {
try {
namesystem.startStandbyServices(getConf(),
state == NameNode.OBSERVER_STATE);
} catch (Throwable t) {
doImmediateShutdown(t);
}
}
This in turn calls startStandbyServices(final Configuration conf, boolean isObserver) on the namesystem:
/**
* Start services required in standby or observer state
*
* @throws IOException
*/
void startStandbyServices(final Configuration conf, boolean isObserver)
throws IOException {
if (!getFSImage().editLog.isOpenForRead()) {
// During startup, we're already open for read.
getFSImage().editLog.initSharedJournalsForRead();
}
blockManager.setPostponeBlocksFromFuture(true);
// Disable quota checks while in standby.
dir.disableQuotaChecks();
editLogTailer = new EditLogTailer(this, conf);
editLogTailer.start();
// An observer does not take part in checkpointing
if (!isObserver && standbyShouldCheckpoint) {
standbyCheckpointer = new StandbyCheckpointer(conf, this);
standbyCheckpointer.start();
}
}
As the code shows, in this startup path an observer namenode differs from a traditional SBN in only one respect: it does not take part in checkpointing.
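The difference can be reduced to a small decision function. The sketch below is only a model of the method above, under the assumption that the two services can be represented by their names: `StandbyServicesSketch` and `servicesToStart` are hypothetical, and no real tailer or checkpointer thread is started.

```java
import java.util.ArrayList;
import java.util.List;

class StandbyServicesSketch {
    /**
     * Hypothetical model of startStandbyServices: returns the names of the
     * background services that would be started for the given flags.
     */
    static List<String> servicesToStart(boolean isObserver, boolean standbyShouldCheckpoint) {
        List<String> started = new ArrayList<>();
        // Both a standby and an observer tail the edit log to keep their namespace current.
        started.add("EditLogTailer");
        // Only a non-observer standby with checkpointing enabled starts the checkpointer.
        if (!isObserver && standbyShouldCheckpoint) {
            started.add("StandbyCheckpointer");
        }
        return started;
    }

    public static void main(String[] args) {
        System.out.println("standby:  " + servicesToStart(false, true));
        System.out.println("observer: " + servicesToStart(true, true));
    }
}
```

In this model the observer and the standby share the edit-log tailing step, and the checkpointer is added only when the node is not an observer and checkpointing is enabled.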