图-最小生成树, 2022-10-30

(2022.10.30 Sun)
生成树(Minimal Spanning Tree,MST)的概念针对连通图而提出。

概念和定理

  • 连通图(connected graph):无向图(undirected graph)中,如果任意两点有路径连接,则称其为连通图(connected graph)
  • 强连通图:在有向图(directed graph)中,任意两点,都有路径相连接,则称其为强连通图
  • 生成树:连通图的生成树指一个连通子图,含有图中的全部n个顶点,只有足以构成一棵树的n-1条边。在该树中再添加一条边,则必定成环
  • 最小生成树:设连通图的每个边有权重,在一个连通图的所有生成树中,权重之和最小的那课树定义为最小生成树

在无向图中查找MST,可通过greedy method实现,本文后面将要介绍的两种方法都是基于该方案。在介绍算法之前,先了解关于MST的一个重要事实。

Let G be a weighted connected graph, and let V1 and V2 be a
partition of the vertices of G into two disjoint nonempty sets. Furthermore, let e be an edge in G with minimum weight from among those with one endpoint in V1 and the other in V2. There is a minimum spanning tree T that has e as one of its edges.

加权连通图G,其中V1和V2分别是G的两个点集合,且V1和V2不连接。设边e是V1和V2之间的任意点连接的边中权重最小的边,则e是该图MST的一边。

证明:
令T是图G的最小生成树。如果T不包含边e,则在T中加入e会形成一个环(cycle)。因此在T中有一条边满足,且f的两个顶点分别位于V1和V2。考虑到权重值e最小,则有。如果从中移除f,就得到了权重比之前小的一棵MST。考虑到T是MST,则替换f为e的新树也必然为MST。

Let T be a minimum spanning tree of G. If T does not contain
edge e, the addition of e to T must create a cycle. Therefore, there is some edge of this cycle that has one endpoint in V1 and the other in V2. Moreover, by the choice of e, . If we remove f from , we obtain a spanning tree whose total weight is no more than before. Since T was a minimum spanning tree, this new tree must also be a minimum spanning tree.

Kruskal算法

Kruskal算法在最初将每个独立顶点当做一棵树,并重复将不在一棵树上的顶点融合成一个簇,最终该簇成为MST。

该算法对边做排序,根据权重升序(ascending)排序,从排序中依次考虑各边。如果一条选中的边连接了两颗树,则将该边加入到MST中,且这两颗树融合为一个簇。两个不同的簇可融合为一个簇。如果按排序选中的边e所连接的两个树/簇已经在同一个簇中,则抛弃该边e。当边的个数足够多,达到n-1,则过程终止,得到的簇即MST。

Complexity analysis

该算法的计算分为两部分:edge sorting和test whether 2 clusters are distinct。

边排序部分采用快速等方法,复杂度为,其中m表示边的个数。对于一般图来说,m为(?),因此等效于,所以排序部分的复杂度为。

第二部分,placeholder。

总复杂度是。

Codes

import logging

class Edge:
    """
    edge structure, two ends, i.e., u and v, and weight incoporated
    """
    def __init__(self, u, v, weight):
        self.u = u
        self.v = v
        self.weight = weight

class DisjointSet:
    def __init__(self, n):
        """
        n: number of vertices
        """
        self.parent = [None] * n # parenet of node i
        self.size = [1] * n # for merge
        for i in range(n):
            self.parent[i] = i # set each vertex as its only parent

    def merge_set(self, a, b):
        """
        merge two vertices, a and b, and the trees they belong to

        Arguments:
          a - index of a
          b - index of b
        Returns:
          if a and b share no the same root, then merge them
        """
        a = self.find_set(a)
        b = self.find_set(b)
        logging.info(f"""merging: vertex {a+1} and {b+1}""")

        if self.size[a] < self.size[b]:
            logging.info(f"""merging: size.{a+1} < size.{b+1}""")
            self.parent[a] = b # merge a to b
            self.size[b] += self.size[a] # add size of old set(a) to set(b)
        else:
            logging.info(f"""merging: size.{a+1} >= size.{b+1}""")
            self.parent[b] = a # merge b to a
            self.size[a] += self.size[b] # add size of old set(b) to set(a)

    def find_set(self, a):
        """
        find out root of node a, or the set which names after root
        """
        if self.parent[a] != a: 
            # find it out in a recurrsive way 
            self.parent[a] = self.find_set(self.parent[a])
        # return root
        return self.parent[a]


def k_algo(n, edges, ds):
    """
    Kruskal algorithm: 
      Rank all edges in an ascendent order of weights, find out top n-1 edges with no loop.
      Select a new edge from edge ranking, if the two ends of that edge are not in the disjoint set,
      then add the edge to MST. Otherwise, adding the edge leads to a loop. The final results contain 
      n-1 edges.

    Arguments:
      n - int, #vertices
      edges - list, graph edges
      ds - DisjointSet
    Returns:
      sum(weights) - MST weight summary
    """

    edges.sort(key=lambda x: x.weight) # sort edges by weights
    MST = [] 
    for edge in edges:
        set_u = ds.find_set(edge.u) # root of u 
        set_v = ds.find_set(edge.v) 
        if set_u != set_v: 
            logging.info(f"""Vertices {u+1} and {v+1} have different roots.""")
            # u and v have no same root, then merge
            ds.merge_set(set_u, set_v)
            MST.append(edge)
            if len(MST) == n-1: 
                # reach to the minimal number of edge, simply stop
                logging.info(f"""MST reaches to the end.""")
                break
    edges_selected = []
    for e in MST:
        edges_selected.append((e.u+1, e.v+1, e.weight))
    logging.info(f"""MST edges: {edges_selected}""")
    return sum([edge.weight for edge in MST])

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO)
    n = 6 # num of vertices in the weighted undirected graph
    # assume the vertex index are continuous integers starting from 1 
    e = [(1, 2, 10), (3, 5, 19), (4, 6, 7), 
         (1, 5, 3),  (2, 6, 31), (3, 6, 20)]
    m = len(e) # num of edges
    ds = DisjointSet(m)
    edges = [None] * m 

    # allocate edge info 
    for i in range(m):
        u, v, weight = e[i][0], e[i][1], e[i][2]
        u -= 1 # transform index, [1, n] -> [0, n-1]
        v -= 1 
        edges[i] = Edge(u, v, weight)

    print("MST weights sum:", k_algo(n, edges, ds))

Reference

1 Data Structure and Algorithm in Python, Goodrich and etc.

你可能感兴趣的:(图-最小生成树, 2022-10-30)