GPS-Graph Processing System Graph Coloring算法分析 (三)

    HamaWhite 原创,转载请注明出处!欢迎大家加入Giraph 技术交流群: 228591158

    Graph coloring is the problem of assigning a color to each vertex of an undirected graph such that no two adjacent vertices have the same color. We implement the greedy algorithm from Scalable parallel graph coloring algorithms. The algorithm iteratively finds a maximal independent set (MIS) of vertices, i.e., a maximal set of vertices such that no pair of vertices are adjacent. The algorithm assigns the vertices in each MIS a new
color, then removes them from the graph, until there are no vertices left in the graph.

注:Maximal Independent Set参考 《找最大独立集问题-Finding a Maximal Independent Set》

1. 顶点状态分为五种:COLORED、IN_SET、NOT_IN_SET、SELECTED_AS_POSSIBLE_IN_SET 和 UNDECIDED。

COLORED:表示已染色;

IN_SET:加入到MIS中;

NOT_IN_SET:至少存在一条边指向 IN_SET 中的顶点。

SELECTED_AS_POSSIBLE_IN_SET:有可能(尝试性)地加入到 IN_SET 中,根据概率来选择。

UNDECIDED:不存在指向 IN_SET 中任意顶点的边,且不在IN_SET中。

2. 消息分为三种类型: NeighborSelectedAsPossibleMessage、RemoveNeighborMessage 和 DecrementNumNeighborsMessage。

3. 顶点的Value有三部分:1) color:存储顶点的颜色,初始为null;2)type:存储顶点的状态;3)numRemainingNeighbors:剩余的出度大小,初始为顶点的出度大小

4.  计算分为五个阶段:MIS_1、MIS_2、MIS_3、MIS_4 和 COLORING。

MIS_1 :选择阶段,只有UNDECIDED 状态的顶点参与。每个UNDECIDED 状态的顶点v按照1/(2* degree(v))的概率进入SELECTED_AS_POSSIBLE_IN_SET状态。若顶点的numRemainingNeighbors大于0,则向邻接顶点发送 NeighborSelectedAsPossibleMessage类型的消息,该消息中包含顶点ID。概率选择具体实现:根据每个顶点剩余的出度计算出概率,然后产生一随机数。该随机数若小于等于概率,则向邻接顶点发送消息。源码如下:

// NOT_IN_SET和IN_SET状态的顶点不参与计算,直接返回
if (ColoringVertexType.NOT_IN_SET == value.type
	|| ColoringVertexType.IN_SET == value.type) {
	return;
}
//计算顶点的加入概率
double probability = getNeighborsSize() > 0 ? 1.0 /
    ((double) 2*value.numRemainingNeighbors) : 1;
//生成的随机数若小于概率,则进入SELECTED_AS_POSSIBLE_IN_SET状态
if (Math.random() <= probability) {
	value.type = ColoringVertexType.SELECTED_AS_POSSIBLE_IN_SET;
	//后面的MIS_3阶段会减少numRemainingNeighbors的值,此处判断主要是为了减少不必要的消息发送。
	//假如顶点u的出度只有v和w,顶点u的numRemainingNeighbors初始值为2.当v和w都进入NOT_IN_SET
	//状态时,会向顶点u发送DecrementNumNeighborsMessage消息,在MIS_3阶段会把
	//numRemainingNeighbors减为0,此时顶点u就不必向v和w发送消息.
	if (value.numRemainingNeighbors > 0) {
	ColoringMessage newSelectedAsPossibleMessage = ColoringMessage
			.newNeighborSelectedAsPossibleMessage(getId());
		for (int neighborId : getNeighborIds()) {
			//因后面删除顶点时,只是把neighborId置为-1,并未真正删除.后面发送消息均有判断.
			if (neighborId >= 0) {
				sendMessage(neighborId, newSelectedAsPossibleMessage);
			}
		}
	}
}

MIS_2 :冲突解决阶段,只有SELECTED_AS_POSSIBLE_IN_SET状态的顶点参与。顶点接受MIS_1阶段发送的NeighborSelectedAsPossibleMessage消息,若顶点 v 收到的顶点ID都比顶点v自身大,则顶点v进入IN_SET状态,然后向邻接顶点发送removeNeighborMessage 消息(包含顶点ID),告诉邻接顶点删除指向该顶点的边;否则,v 重新置为 UNDECIDED状态。代码如下:

//只有SELECTED_AS_POSSIBLE_IN_SET状态的顶点参与
if (ColoringVertexType.SELECTED_AS_POSSIBLE_IN_SET != value.type) {
	break;
}
for (ColoringMessage message : messageValues) {
	//收到的消息值若比自身的VertexId小,则置为UNDECIDED状态。
	if (message.int1 < getId()) {
		value.type = ColoringVertexType.UNDECIDED;
		//numEdgesOfPotentiallyActiveVertices是为了判断找一个MIS
		//是否结束(由Master全局判断)
		numEdgesOfPotentiallyActiveVertices.value += getNeighborsSize();
		return;
	}
}
//收到的顶点ID都比自身大,则该顶点进入IN_SET状态,然后向邻接顶点
//发送removeNeighborMessage消息(消息包含顶点ID)
setValueToInSetAndNotifyNeighbors();

private void setValueToInSetAndNotifyNeighbors() {
	for (int neighborId : getNeighborIds()) {
		if (neighborId >= 0) {
			sendMessage(neighborId, ColoringMessage.removeNeighborMessage(getId()));
		}
	}
	value.type = ColoringVertexType.IN_SET;
}

MIS_3 :NOT_IN_SET发现和度数调整-1阶段。IN_SET状态的顶点不参与,UNDECIDED状态和NOT_IN_SET状态的顶点收到RemoveNeighborMessage 消息后删除指向IN_SET状态顶点的边。UNDECIDED状态的顶点进入NOT_IN_SET状态,然后向邻接顶点发送DecrementNumNeighborsMessage消息(不包含顶点ID)。

//IN_SET状态的顶点不参与
if (ColoringVertexType.IN_SET == value.type) {
	return;
}
//删除指向IN_SET顶点的边,并把UNDECIDED状态的顶点置为NOT_IN_SET
removeNeighborsAndPossiblySetTypeToNotInSet(messageValues);

private void removeNeighborsAndPossiblySetTypeToNotInSet(Iterable<ColoringMessage> messageValues) {
	if (messageValues.iterator().hasNext()) {
		//NOT_IN_SET和UNDECIDED状态的顶点删除指向IN_SET顶点的边
		removeNeighbors(messageValues);
		
		//把UNDECIDED状态的顶点置为NOT_IN_SET状态,并向邻接顶点发送DecrementNumNeighborsMessage
		if (value.type == ColoringVertexType.UNDECIDED) {
			value.type = ColoringVertexType.NOT_IN_SET;
			for (int neighborId : getNeighborIds()) {
				if (neighborId >= 0) {
					sendMessage(neighborId,
						ColoringMessage.newDecrementNumNeighborsMessage());
				}
			}
		}
	}
}

private void removeNeighbors(Iterable<ColoringMessage> messageValues) {
	removedNeighbors.clear();
	for (ColoringMessage message : messageValues) {
		removedNeighbors.add(message.int1);
	}
	int numNeighborsRemoved = 0;
	int neighborIdIndex = 0;
	for (int neighborId : getNeighborIds()) {
		if (neighborId >= 0 && removedNeighbors.contains(neighborId)) {
			numNeighborsRemoved++;
			//并未真正删除,只是把邻接顶点的值修改为-1
			relabelIdOfNeighbor(neighborIdIndex, -1);
		}
		neighborIdIndex++;
	}
	//减少numRemainingNeighbors的值
	value.numRemainingNeighbors -= numNeighborsRemoved;
}

MIS_4 :度数调整-2阶段,只有UNDECIDED 状态的顶点参与。每个UNDECIDED 状态的顶点把自己的numRemainingNeighbors减少收到的消息数目。如果还有UNDECIDED顶点,Master会告诉有Worker进入MIS_1 阶段,进行迭代;否则,一个MIS构造完成,Master通知所有Worker进入COLORING 阶段。

思考:进入IN_SET状态的顶点,会发送removeNeighborMessage 消息给邻接顶点,邻接顶点不管是NOT_IN_SET还是UNDECIDED状态,都会删除指向IN_SET顶点的边,即IN_SET顶点已处理完,无需与外界联系。某顶点进入NOT_IN_SET状态时,会告诉自己邻接的UNDECIDED顶点减少自己的 numRemainingNeighbors值(而不是删除),在下次迭代的MIS_1阶段,若顶点的numRemainingNeighbors小于0,说面其邻接顶点都进入NOT_IN_SET状态,无需向邻接顶点发送消息,减少了消息传递量。

if (ColoringVertexType.IN_SET == value.type
	|| ColoringVertexType.NOT_IN_SET == value.type) {
	return;
}
	
for (ColoringMessage message : messageValues) {
	value.numRemainingNeighbors--;
}

COLORING :染色阶段。进入此阶段已没有UNDECIDED状态的顶点,故只有IN_SET和NOT_IN_SET状态的顶点参与。IN_SET状态的顶点染色,变成Inactive状态。把所有NOT_IN_SET状态的顶点置为UNDECIDED状态,并重新计算numRemainingNeighbors值。

if (value.type == ColoringVertexType.IN_SET) {
	value.color = latestColor;
	value.type = ColoringVertexType.COLORED;
	removeEdges();
	voteToHalt();
} else {
	value.type = ColoringVertexType.UNDECIDED;
	//GOBJ_NUM_NOT_COLORED_VERTICES用于判断染色是否结束
	getGlobalObjectsMap().putOrUpdateGlobalObject(ColoringOptions.GOBJ_NUM_NOT_COLORED_VERTICES, 
		new IntSumGlobalObject(1));
	getGlobalObjectsMap().putOrUpdateGlobalObject(
		ColoringOptions.GOBJ_NUM_EDGES_OF_NOT_COLORED_VERTICES,
			new IntSumGlobalObject(value.numRemainingNeighbors));
	value.numRemainingNeighbors = countNumRemainingNeighbors();
}

完!

你可能感兴趣的:(Graph,Graph,processing,SYS,coloring)