Algorithms Part 1 - Question 4 - SCC (Strongly Connected Components)

Algorithms: Design and Analysis, Part 1 


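  This assignment is the hardest one in the algorithms course. I think the difficulty lies not only in the algorithm but also in the implementation, because many programming languages cannot handle extremely deep recursion.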

  Problem statement

Download the text file here. Zipped version here. (Right click and save link as)

The file contains the edges of a directed graph. Vertices are labeled as positive integers from 1 to 875714. Every row indicates an edge, the vertex label in first column is the tail and the vertex label in second column is the head (recall the graph is directed, and the edges are directed from the first column vertex to the second column vertex). So for example, the 11th row looks like: "2 47646". This just means that the vertex with label 2 has an outgoing edge to the vertex with label 47646.


Your task is to code up the algorithm from the video lectures for computing strongly connected components (SCCs), and to run this algorithm on the given graph. 


Output Format: You should output the sizes of the 5 largest SCCs in the given graph, in decreasing order of sizes, separated by commas (avoid any spaces). So if your algorithm computes the sizes of the five largest SCCs to be 500, 400, 300, 200 and 100, then your answer should be "500,400,300,200,100". If your algorithm finds less than 5 SCCs, then write 0 for the remaining terms. Thus, if your algorithm computes only 3 SCCs whose sizes are 400, 300, and 100, then your answer should be "400,300,100,0,0".
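
The output format described above can be sketched as a small helper. This is only an illustration: the name format_top_sizes and its parameter k are my own assumptions, not part of the assignment or of the code later in this post.

```python
def format_top_sizes(scc_sizes, k=5):
    """Return the k largest sizes, comma-separated, padded with zeros."""
    largest = sorted(scc_sizes, reverse=True)[:k]
    # Pad with zeros when fewer than k SCCs were found.
    largest += [0] * (k - len(largest))
    return ",".join(str(size) for size in largest)

print(format_top_sizes([100, 500, 300, 400, 200, 50]))  # 500,400,300,200,100
print(format_top_sizes([300, 100, 400]))                # 400,300,100,0,0
```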


WARNING: This is the most challenging programming assignment of the course. Because of the size of the graph you may have to manage memory carefully. The best way to do this depends on your programming language and environment, and we strongly suggest that you exchange tips for doing this on the discussion forums.

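  Algorithm implementation

  The algorithm itself is fairly simple to implement and has three steps. First, compute the transpose of the graph. Second, run DFS over the transposed graph and record the vertices by finishing time. Third, run DFS on the original graph, choosing start vertices in decreasing order of finishing time; the vertices reached by each such DFS form one SCC (strongly connected component).

  A first implementation in Python: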

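def firstdfs(vertexind):
    # First pass: DFS on the transposed graph (mapDictT).
    global fs,isexplored,visitordered,mapDictT
    for ind in mapDictT[vertexind]:
        if not isexplored[ind-1]:
            isexplored[ind-1]=True
            firstdfs(ind)
    # vertexind is finished: fill visitordered from the back, so that
    # visitordered ends up in decreasing order of finishing time.
    visitordered[fs-1]=vertexind
    #print(str(vertexind)+' fs: '+str(fs))
    fs=fs-1

def seconddfs(vertexind):
    # Second pass: DFS on the original graph (mapDict); s is the leader
    # of the current SCC and header[s-1] accumulates that SCC's size.
    global s,secisexplored,header,mapDict
    for ind in mapDict[vertexind]:
        if not secisexplored[ind-1]:
            secisexplored[ind-1]=True
            seconddfs(ind)
    # Count every visited vertex, including those with no outgoing edges.
    header[s-1]+=1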


maplength=875714
#maplength=8
mapDict={x:[] for x in range(1,maplength+1)}
mapDictT={x:[] for x in range(1,maplength+1)}
# Each line is "tail head": an edge from tail to head.
with open('SCC.txt','r') as f:
    for line in f:
        tmp=[int(x) for x in line.split()]
        mapDict[tmp[0]].append(tmp[1])
        mapDictT[tmp[1]].append(tmp[0])

fs=maplength
isexplored=[False for x in range(1,maplength+1)]
secisexplored=[False for x in range(1,maplength+1)]
visitordered=[0 for x in range(1,maplength+1)]
header=[0 for x in range(1,maplength+1)]

for ind in range(1,maplength+1):
    if not isexplored[ind-1]:
        #print('Begin from: '+str(ind))
        isexplored[ind-1]=True
        firstdfs(ind)
print('Second DFS')
for ind in visitordered:
    if not secisexplored[ind-1]:
        s=ind
        secisexplored[ind-1]=True
        seconddfs(ind)

header.sort(reverse=True)
print(header[0:20])
  The graph used for testing is stored in a text file with the following contents:
1 2
2 6
2 3
2 4
3 1
3 4
4 5
5 4
6 5
6 7
7 6
7 8
8 5
8 7

  Note that maplength must be changed to 8 for this test. The first five values of the output should be
3,3,2,0,0

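  Python's recursion limit

  Python limits the recursion depth, and the default limit is usually 1000. Exceeding it raises an error (RuntimeError: maximum recursion depth exceeded, reported as RecursionError since Python 3.5). The following code changes the limit and prints the new value: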

import sys
sys.setrecursionlimit(80000000)
print(sys.getrecursionlimit())

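  The following code tests the recursion depth that can actually be reached:
def f(d):
    # Print progress every 500 levels, then recurse one level deeper.
    if d%500==0:print(d)
    f(d+1)
f(1)
  Testing with the code above after raising the limit: on Windows 8 with Python 3.3 and 8 GB of RAM the maximum depth was about 4000, and on Debian 6 with Python 3.2 and 16 GB of RAM it was about 26000. Memory was not exhausted in either case. So even with the limit relaxed, something else still caps the depth. In this situation no recursion-depth error is reported, but the program crashes all the same.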
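
As a gentler variant of the probe above, the sketch below catches the exception and returns the depth reached instead of printing until failure. It assumes Python 3.5 or later (for RecursionError), and the name probe_depth is hypothetical. It only measures the interpreter's own limit: if the limit is set higher than the native stack allows, the process can still crash without any exception, so the example keeps the limit modest.

```python
import sys

def probe_depth(limit):
    """Recurse until Python refuses, and return the depth reached."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    depth = 0

    def descend():
        nonlocal depth
        depth += 1
        descend()

    try:
        descend()
    except RecursionError:
        # The interpreter's limit was hit; depth holds the level reached.
        pass
    finally:
        # Restore the previous limit so the rest of the program is unaffected.
        sys.setrecursionlimit(old_limit)
    return depth

print(probe_depth(2000))
```

The value printed is somewhat below the limit, because the frames already on the stack when the probe starts count towards it.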

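  So where is the problem? After reading other people's discussion on the forum, I realized that the stack size itself may also be too small.

  The improved program

  After setting the stack size to 64 MB, everything worked.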
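
In isolation, the trick looks roughly like the sketch below: threading.stack_size only affects threads created after the call, so the deep recursion has to run in a new thread. The helper run_with_big_stack, its parameters, and the toy function count_down are my own assumptions for illustration; the 64 MB figure is the one mentioned above.

```python
import sys
import threading

def run_with_big_stack(func, *args, stack_bytes=64 * 1024 * 1024,
                       recursion_limit=1000000):
    """Run func(*args) in a new thread with a larger stack; return its result."""
    result = {}

    def worker():
        result["value"] = func(*args)

    old_limit = sys.getrecursionlimit()
    # stack_size returns the previous setting and applies to new threads only.
    old_stack = threading.stack_size(stack_bytes)
    sys.setrecursionlimit(recursion_limit)
    try:
        thread = threading.Thread(target=worker)
        thread.start()
        thread.join()
    finally:
        # Restore the previous settings.
        threading.stack_size(old_stack)
        sys.setrecursionlimit(old_limit)
    return result.get("value")

def count_down(n):
    # Deliberately recursive: depth n, far beyond the default limit of 1000.
    return 0 if n == 0 else 1 + count_down(n - 1)

print(run_with_big_stack(count_down, 20000))  # 20000
```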

  The complete code:

import sys,threading
sys.setrecursionlimit(3000000)
threading.stack_size(67108864)

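def firstdfs(vertexind):
    # First pass: DFS on the transposed graph (mapDictT).
    global fs,isexplored,visitordered,mapDictT
    for ind in mapDictT[vertexind]:
        if not isexplored[ind-1]:
            isexplored[ind-1]=True
            firstdfs(ind)
    # vertexind is finished: fill visitordered from the back, so that
    # visitordered ends up in decreasing order of finishing time.
    visitordered[fs-1]=vertexind
    #print(str(vertexind)+' fs: '+str(fs))
    fs=fs-1

def seconddfs(vertexind):
    # Second pass: DFS on the original graph (mapDict); s is the leader
    # of the current SCC and header[s-1] accumulates that SCC's size.
    global s,secisexplored,header,mapDict
    for ind in mapDict[vertexind]:
        if not secisexplored[ind-1]:
            secisexplored[ind-1]=True
            seconddfs(ind)
    # Count every visited vertex, including those with no outgoing edges.
    header[s-1]+=1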

def sccmain():
    global mapDict,mapDictT,fs,isexplored,visitordered,s,secisexplored,header
    maplength=875714
    #maplength=11
    mapDict={x:[] for x in range(1,maplength+1)}
    mapDictT={x:[] for x in range(1,maplength+1)}
    # Each line is "tail head": an edge from tail to head.
    with open('SCC.txt','r') as f:
        for line in f:
            tmp=[int(x) for x in line.split()]
            mapDict[tmp[0]].append(tmp[1])
            mapDictT[tmp[1]].append(tmp[0])

    fs=maplength
    isexplored=[False for x in range(1,maplength+1)]
    secisexplored=[False for x in range(1,maplength+1)]
    visitordered=[0 for x in range(1,maplength+1)]
    header=[0 for x in range(1,maplength+1)]

    for ind in range(1,maplength+1):
        if not isexplored[ind-1]:
            #print('Begin from: '+str(ind))
            isexplored[ind-1]=True
            firstdfs(ind)
    print('Second DFS')
    for ind in visitordered:
        if not secisexplored[ind-1]:
            s=ind
            secisexplored[ind-1]=True
            seconddfs(ind)

    header.sort(reverse=True)
    print(header[0:20])

if __name__ =='__main__':
    thread=threading.Thread(target=sccmain)
    thread.start()
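
An alternative that this post does not use is to avoid recursion altogether by keeping an explicit stack, which sidesteps both the recursion limit and the thread stack size. The sketch below follows the same two passes (transposed graph first, then the original graph in decreasing finishing time). The function name scc_sizes and its signature are assumptions for illustration, and I have only checked it against the small 8-vertex test graph, not the full data set.

```python
def scc_sizes(num_vertices, edges):
    """Return SCC sizes (largest first) for vertices 1..num_vertices."""
    graph = [[] for _ in range(num_vertices + 1)]
    transposed = [[] for _ in range(num_vertices + 1)]
    for tail, head in edges:
        graph[tail].append(head)
        transposed[head].append(tail)

    # Pass 1: DFS on the transposed graph, recording vertices as they finish.
    finish_order = []
    explored = [False] * (num_vertices + 1)
    for start in range(1, num_vertices + 1):
        if explored[start]:
            continue
        explored[start] = True
        stack = [(start, iter(transposed[start]))]
        while stack:
            vertex, neighbours = stack[-1]
            for nxt in neighbours:
                if not explored[nxt]:
                    explored[nxt] = True
                    stack.append((nxt, iter(transposed[nxt])))
                    break
            else:
                # No unexplored neighbour left: the vertex is finished.
                finish_order.append(vertex)
                stack.pop()

    # Pass 2: DFS on the original graph in decreasing finishing time.
    sizes = []
    explored = [False] * (num_vertices + 1)
    for leader in reversed(finish_order):
        if explored[leader]:
            continue
        explored[leader] = True
        size = 0
        stack = [leader]
        while stack:
            vertex = stack.pop()
            size += 1
            for nxt in graph[vertex]:
                if not explored[nxt]:
                    explored[nxt] = True
                    stack.append(nxt)
        sizes.append(size)
    return sorted(sizes, reverse=True)

# The 8-vertex test graph from earlier in this post.
edges = [(1, 2), (2, 6), (2, 3), (2, 4), (3, 1), (3, 4), (4, 5),
         (5, 4), (6, 5), (6, 7), (7, 6), (7, 8), (8, 5), (8, 7)]
print(scc_sizes(8, edges))  # [3, 3, 2]
```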


  
