This article explains it well: http://blog.csdn.net/jackydai987/article/details/6227365
A summary of the main flow:
1. JobClient.runJob()
Splits the input data according to the user-configured InputFormat class, writes the relevant information into three files (job.jar, job.split, and job.xml), and stores them in HDFS.
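The split computation in FileInputFormat boils down to a small formula: splitSize = max(minSize, min(maxSize, blockSize)). A self-contained sketch of that arithmetic (the demo class and the 64 MB/200 MB numbers are illustrative, not from the source above):

```java
public class SplitSizeDemo {
    // Sketch of FileInputFormat's split-size arithmetic:
    // splitSize = max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L << 20;   // 64 MB HDFS block (illustrative)
        long fileLen   = 200L << 20;  // 200 MB input file (illustrative)
        long splitSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        // ceil(200 MB / 64 MB) = 4 splits, hence 4 map TIPs for this file
        long numSplits = (fileLen + splitSize - 1) / splitSize;
        System.out.println("splitSize=" + splitSize + " numSplits=" + numSplits);
    }
}
```

With default min/max bounds the split size equals the block size, so each map task's input normally lines up with one HDFS block.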
2. JobTracker.submitJob()
Creates a new JIP (JobInProgress) object; during its initialization, the three files job.jar, job.split, and job.xml are copied into a temporary directory on the local file system. After a series of operations involving job listeners, JT.JobInitThread eventually calls the JIP's initTasks() method to initialize the job.
3. initTasks()
Four key data structures are involved here:
// Map of NetworkTopology Node to the set of non-running TIPs
Map<Node, List<TaskInProgress>> nonRunningMapCache;
// Map of NetworkTopology Node to set of running TIPs
Map<Node, Set<TaskInProgress>> runningMapCache;
// A list of non-local, non-running maps
final List<TaskInProgress> nonLocalMaps;
// A set of non-local running maps
Set<TaskInProgress> nonLocalRunningMaps;
First, a TIP (TaskInProgress) object is created for each of the splits obtained, and createCache initializes nonRunningMapCache to build the host-to-task mapping, i.e. the relationship between nodes and tasks. TIPs whose splits carry no data-locality information are placed in nonLocalMaps instead.
The mapping is built from near to far, from the host node up to its rack node: each task is registered both under its own node and under that node's parent (the rack node). Since maxLevel defaults to 2, nonRunningMapCache generally holds node-local and rack-local tasks.
private Map<Node, List<TaskInProgress>> createCache(
    TaskSplitMetaInfo[] splits, int maxLevel)
    throws UnknownHostException {
  Map<Node, List<TaskInProgress>> cache =
      new IdentityHashMap<Node, List<TaskInProgress>>(maxLevel);

  Set<String> uniqueHosts = new TreeSet<String>();
  for (int i = 0; i < splits.length; i++) {
    String[] splitLocations = splits[i].getLocations();
    // No locality information: this TIP goes to the non-local list
    if (splitLocations == null || splitLocations.length == 0) {
      nonLocalMaps.add(maps[i]);
      continue;
    }

    for (String host : splitLocations) {
      Node node = jobtracker.resolveAndAddToTopology(host);
      uniqueHosts.add(host);
      LOG.info("tip:" + maps[i].getTIPId() + " has split on node:" + node);
      // Walk up the topology: the host node first, then its rack (maxLevel = 2)
      for (int j = 0; j < maxLevel; j++) {
        List<TaskInProgress> hostMaps = cache.get(node);
        if (hostMaps == null) {
          hostMaps = new ArrayList<TaskInProgress>();
          cache.put(node, hostMaps);
          hostMaps.add(maps[i]);
        }
        //check whether the hostMaps already contains an entry for a TIP
        //This will be true for nodes that are racks and multiple nodes in
        //the rack contain the input for a tip. Note that if it already
        //exists in the hostMaps, it must be the last element there since
        //we process one TIP at a time sequentially in the split-size order
        if (hostMaps.get(hostMaps.size() - 1) != maps[i]) {
          hostMaps.add(maps[i]);
        }
        node = node.getParent();
      }
    }
  }

  // Calibrate the localityWaitFactor - Do not override user intent!
  if (localityWaitFactor == DEFAULT_LOCALITY_WAIT_FACTOR) {
    int jobNodes = uniqueHosts.size();
    int clusterNodes = jobtracker.getNumberOfUniqueHosts();
    if (clusterNodes > 0) {
      localityWaitFactor =
          Math.min((float) jobNodes / clusterNodes, localityWaitFactor);
    }
    LOG.info(jobId + " LOCALITY_WAIT_FACTOR=" + localityWaitFactor);
  }
  return cache;
}
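The caching logic above can be modeled in a few self-contained lines. This is a sketch, not Hadoop's API: plain strings stand in for Node and TaskInProgress, and the host-to-rack topology is a hard-coded assumption.

```java
import java.util.*;

public class CacheDemo {
    // Hypothetical topology: host -> parent rack (stands in for Node.getParent())
    static final Map<String, String> PARENT = Map.of(
            "host1", "rack1", "host2", "rack1", "host3", "rack2");

    // Simplified createCache: for each split location, register the task under
    // the host, then walk up maxLevel ancestors (here: just the rack).
    static Map<String, List<String>> createCache(Map<String, List<String>> splits,
                                                 int maxLevel,
                                                 List<String> nonLocalMaps) {
        Map<String, List<String>> cache = new HashMap<>();
        for (Map.Entry<String, List<String>> e : splits.entrySet()) {
            String tip = e.getKey();
            if (e.getValue().isEmpty()) {   // no locality info: non-local list
                nonLocalMaps.add(tip);
                continue;
            }
            for (String host : e.getValue()) {
                String node = host;
                for (int level = 0; level < maxLevel && node != null; level++) {
                    List<String> tips =
                            cache.computeIfAbsent(node, k -> new ArrayList<>());
                    // skip the duplicate when two replica hosts share a rack;
                    // a repeat can only be the last element (TIPs are processed
                    // one at a time), mirroring the check in the real code
                    if (tips.isEmpty() || !tips.get(tips.size() - 1).equals(tip)) {
                        tips.add(tip);
                    }
                    node = PARENT.get(node);
                }
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        Map<String, List<String>> splits = new LinkedHashMap<>();
        splits.put("tip0", List.of("host1", "host2")); // two replicas in rack1
        splits.put("tip1", List.of());                  // no locality info
        List<String> nonLocal = new ArrayList<>();
        Map<String, List<String>> cache = createCache(splits, 2, nonLocal);
        System.out.println(cache);    // tip0 under host1, host2, and rack1 (once)
        System.out.println(nonLocal); // [tip1]
    }
}
```

Note how tip0 appears under rack1 only once even though two of its replicas live in that rack; that is exactly what the "last element" check in the real createCache guards against.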
4. Task assignment
Locality is distinguished by the maxLevel argument to findNewMapTask: with maxLevel = 1 only node-local tasks are scheduled; with maxLevel = 2, node-local or rack-local tasks.
The main assignment function is scheduleMap. Its job is simply to take a qualifying TIP found in nonRunningMapCache (local or non-local) and record it in runningMapCache.
Likewise, a TIP without data locality is taken from nonLocalMaps and placed in nonLocalRunningMaps.
protected synchronized void scheduleMap(TaskInProgress tip) {
  if (runningMapCache == null) {
    LOG.warn("Running cache for maps is missing!! "
             + "Job details are missing.");
    return;
  }

  String[] splitLocations = tip.getSplitLocations();
  // Add the TIP to the list of non-local running TIPs
  if (splitLocations == null || splitLocations.length == 0) {
    nonLocalRunningMaps.add(tip);
    return;
  }

  for (String host : splitLocations) {
    Node node = jobtracker.getNode(host);
    for (int j = 0; j < maxLevel; ++j) {
      Set<TaskInProgress> hostMaps = runningMapCache.get(node);
      if (hostMaps == null) {
        // create a cache if needed
        hostMaps = new LinkedHashSet<TaskInProgress>();
        runningMapCache.put(node, hostMaps);
      }
      hostMaps.add(tip);
      node = node.getParent();
    }
  }
}
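The move into the running-side structures can be modeled in the same simplified terms as before (a sketch only: strings stand in for Node and TaskInProgress, and the host-to-rack map is a hard-coded assumption):

```java
import java.util.*;

public class ScheduleDemo {
    // Hypothetical topology: host -> parent rack
    static final Map<String, String> PARENT = Map.of("host1", "rack1");
    static final int MAX_LEVEL = 2;

    // Counterparts of runningMapCache and nonLocalRunningMaps in JobInProgress
    static final Map<String, Set<String>> runningMapCache = new HashMap<>();
    static final Set<String> nonLocalRunningMaps = new LinkedHashSet<>();

    // Simplified scheduleMap: record the tip as running under its host and
    // rack, or in nonLocalRunningMaps when it has no split locations.
    static void scheduleMap(String tip, List<String> splitLocations) {
        if (splitLocations.isEmpty()) {
            nonLocalRunningMaps.add(tip);
            return;
        }
        for (String host : splitLocations) {
            String node = host;
            for (int level = 0; level < MAX_LEVEL && node != null; level++) {
                runningMapCache
                        .computeIfAbsent(node, k -> new LinkedHashSet<>())
                        .add(tip);
                node = PARENT.get(node);
            }
        }
    }

    public static void main(String[] args) {
        scheduleMap("tip0", List.of("host1")); // data-local map
        scheduleMap("tip1", List.of());        // no locality info
        System.out.println(runningMapCache);     // tip0 under host1 and rack1
        System.out.println(nonLocalRunningMaps); // [tip1]
    }
}
```

Using a Set per node (rather than the List used on the non-running side) makes re-adding a running TIP idempotent, which is why the real code needs no duplicate check here.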
The above covers task creation and locality-aware assignment; the assignment flow for non-local tasks will be described later, time permitting.