While the NodeManager (NM) starts a container, one step writes the current tokens to a local directory. By default this happens in the startLocalizer method of the DefaultContainerExecutor class:
public synchronized void startLocalizer(Path nmPrivateContainerTokensPath,
    InetSocketAddress nmAddr, String user, String appId, String locId,
    List<String> localDirs, List<String> logDirs)
    throws IOException, InterruptedException {

  ContainerLocalizer localizer = new ContainerLocalizer(
      lfs, user, appId, locId, getPaths(localDirs),
      RecordFactoryProvider.getRecordFactory(getConf()));

  // Initialize the local directories for a particular user:
  // create $local.dir/usercache/$user and its immediate parent
  createUserLocalDirs(localDirs, user);

  // Initialize the local cache directories for a particular user:
  // $local.dir/usercache/$user,
  // $local.dir/usercache/$user/appcache,
  // $local.dir/usercache/$user/filecache
  createUserCacheDirs(localDirs, user);

  // Initialize the local directories for a particular user:
  // $local.dir/usercache/$user/appcache/$appid
  createAppDirs(localDirs, user, appId);

  // Create application log directories on all disks:
  // create $log.dir/$appid
  createAppLogDirs(appId, logDirs);

  // TODO : Why pick first app dir. The same in LCE why not random?
  Path appStorageDir = getFirstApplicationDir(localDirs, user, appId);

  String tokenFn = String.format(ContainerLocalizer.TOKEN_FILE_NAME_FMT, locId);
  Path tokenDst = new Path(appStorageDir, tokenFn);
  lfs.util().copy(nmPrivateContainerTokensPath, tokenDst);
  LOG.info("Copying from " + nmPrivateContainerTokensPath + " to " + tokenDst);
  lfs.setWorkingDirectory(appStorageDir);
  LOG.info("CWD set to " + appStorageDir + " = " + lfs.getWorkingDirectory());

  // TODO : DO it over RPC for maintaining similarity?
  localizer.runLocalization(nmAddr);
}
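The part to focus on starts at getFirstApplicationDir(localDirs, user, appId): the code picks the application storage directory, builds the token file name, and then calls copy to place the token file in YARN's local working directory.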
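The name-then-copy step can be sketched outside of Hadoop with plain java.nio. This is only an illustration under stated assumptions: the "%s.tokens" format string is an assumed stand-in for ContainerLocalizer.TOKEN_FILE_NAME_FMT, and the class, method, and file names below are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class TokenCopySketch {
    // Assumption: stand-in for ContainerLocalizer.TOKEN_FILE_NAME_FMT.
    static final String TOKEN_FILE_NAME_FMT = "%s.tokens";

    // Hypothetical helper: builds the token file name from the localizer id
    // and copies the private token file into the application storage directory.
    static Path copyTokens(Path nmPrivateTokens, Path appStorageDir, String locId)
            throws IOException {
        Files.createDirectories(appStorageDir);
        Path tokenDst = appStorageDir.resolve(String.format(TOKEN_FILE_NAME_FMT, locId));
        return Files.copy(nmPrivateTokens, tokenDst, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("nm-sketch");
        Path src = Files.write(tmp.resolve("private.tokens"),
                "dummy".getBytes(StandardCharsets.UTF_8));
        Path dst = copyTokens(src, tmp.resolve("appcache").resolve("app_1"), "container_1");
        System.out.println("Copying from " + src + " to " + dst);
    }
}
```

The destination depends only on appStorageDir and locId, so whichever directory is chosen as appStorageDir decides which disk receives the token file.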
The first argument passed to getFirstApplicationDir is the list of directories where YARN writes its temporary data. It comes from the yarn.nodemanager.local-dirs setting ("List of directories to store localized files in."):
private Path getFirstApplicationDir(List<String> localDirs,
    String user, String appId) {
  return getApplicationDir(new Path(localDirs.get(0)), user, appId);
}
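To make the effect of localDirs.get(0) concrete, here is a minimal standalone sketch. It does not use the Hadoop classes; the class name, the string-based path handling, and the sample directories are hypothetical, and the directory layout simply follows the $local.dir/usercache/$user/appcache/$appid pattern quoted in the startLocalizer comments.

```java
import java.util.Arrays;
import java.util.List;

class FirstDirSketch {
    // Layout taken from the startLocalizer comments:
    // $local.dir/usercache/$user/appcache/$appid
    static String applicationDir(String base, String user, String appId) {
        return base + "/usercache/" + user + "/appcache/" + appId;
    }

    // Same selection rule as getFirstApplicationDir: always index 0.
    static String firstApplicationDir(List<String> localDirs, String user, String appId) {
        return applicationDir(localDirs.get(0), user, appId);
    }

    public static void main(String[] args) {
        // Hypothetical value of yarn.nodemanager.local-dirs, already split.
        List<String> localDirs = Arrays.asList("/data1/yarn", "/data2/yarn", "/data3/yarn");
        for (String appId : Arrays.asList("app_1", "app_2", "app_3")) {
            System.out.println(firstApplicationDir(localDirs, "alice", appId));
        }
    }
}
```

Every path printed starts with /data1/yarn; the second and third directories are never chosen, no matter which application is being localized.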
Note that it uses localDirs.get(0). Now look at where localDirs comes from.
It is obtained in the run method of LocalizerRunner, an inner class of ResourceLocalizationService:
private LocalDirsHandlerService dirsHandler;
....
List<String> localDirs = dirsHandler.getLocalDirs();
List<String> logDirs = dirsHandler.getLogDirs();
This calls into the LocalDirsHandlerService class:
/** Local dirs to store localized files in */
private DirectoryCollection localDirs = null;

/** storage for container logs*/
private DirectoryCollection logDirs = null;

localDirs = new DirectoryCollection(
    validatePaths(conf.getTrimmedStrings(YarnConfiguration.NM_LOCAL_DIRS)));
logDirs = new DirectoryCollection(
    validatePaths(conf.getTrimmedStrings(YarnConfiguration.NM_LOG_DIRS)));
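Here localDirs is obtained by parsing the value of the yarn.nodemanager.local-dirs configuration item. Because the configured value is fixed, localDirs is always the same List in the same order, so localDirs.get(0) always returns the same entry and the token is always written to the same directory. This is in fact a bug: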
https://issues.apache.org/jira/browse/YARN-2566
As a result, the token files of all containers are written to the same directory. The fix picks the directory using a random number; see the patch for details.
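As a rough illustration of the idea (not the actual patch), the sketch below replaces the fixed index 0 with a random index. The class and method names are hypothetical, and a uniform pick is an assumption made here for simplicity; the real patch may choose differently, so consult it for the exact logic.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class RandomDirSketch {
    // Hypothetical helper: picks one of the local dirs uniformly at random
    // instead of always returning localDirs.get(0).
    static String pickWorkingDir(List<String> localDirs, Random random) {
        if (localDirs.isEmpty()) {
            throw new IllegalArgumentException("no local dirs configured");
        }
        return localDirs.get(random.nextInt(localDirs.size()));
    }

    public static void main(String[] args) {
        // Hypothetical value of yarn.nodemanager.local-dirs, already split.
        List<String> localDirs = Arrays.asList("/data1/yarn", "/data2/yarn", "/data3/yarn");
        Random random = new Random();
        Set<String> used = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            used.add(pickWorkingDir(localDirs, random));
        }
        System.out.println("Distinct directories used: " + used.size());
    }
}
```

With a pick like this, token files from different containers are spread across the configured directories rather than all landing in the first one.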