Disjoint Set Union (DSU) and Its Applications

My LeetCode solutions are available on GitHub: https://github.com/chenxiangcyr/leetcode-answers


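Disjoint Set Union (DSU)

A disjoint-set union (DSU), also known as a union-find structure, is a compact and practical data structure for maintaining a collection of disjoint sets that are merged over time.
Common uses include:

  • Finding the connected components of a graph
  • Kruskal's algorithm for minimum spanning trees
  • Finding lowest common ancestors (LCA)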
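
To make the second use case concrete, here is a minimal C++ sketch of Kruskal's algorithm built on a small DSU. The names Edge, MiniDsu and kruskalWeight are hypothetical and chosen only for this illustration; the DSU inside uses path compression only, and the rest of this article explains how it works.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical edge type for this sketch: an undirected edge u-v with weight w.
struct Edge {
    int u, v, w;
};

// Minimal DSU with path compression only, just enough for Kruskal.
struct MiniDsu {
    std::vector<int> parent;
    explicit MiniDsu(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);  // each vertex is its own root
    }
    int find(int x) {
        assert(0 <= x && x < static_cast<int>(parent.size()));
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }
    // Returns true if x and y were in different sets and have now been merged.
    bool unite(int x, int y) {
        x = find(x);
        y = find(y);
        if (x == y) return false;
        parent[x] = y;
        return true;
    }
};

// Total weight of a minimum spanning forest of a graph on vertices 0..n-1.
long long kruskalWeight(int n, std::vector<Edge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.w < b.w; });
    MiniDsu dsu(n);
    long long total = 0;
    for (const Edge& e : edges) {
        // Keep an edge only if it joins two different components (no cycle).
        if (dsu.unite(e.u, e.v)) total += e.w;
    }
    return total;
}
```

Edges are processed from lightest to heaviest, and the DSU answers the only question Kruskal needs: would this edge connect two components that are still separate?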

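A DSU starts from a dynamic collection of pairwise disjoint sets S = {S1, S2, ⋯, Sk}.

Each set contains one or more elements, and one element of each set is chosen as its representative. We usually do not care exactly which elements a set contains, nor which element is chosen as the representative.

What we do care about is that, given an element, we can quickly find (the representative of) the set containing it, and that we can quickly merge the sets containing two given elements. With the optimizations described below, both operations run in nearly constant amortized time.

A DSU supports three basic operations:

  • makeSet(s): create a new DSU containing s singleton sets.
  • unionSet(x, y): merge the set containing x with the set containing y. The two sets must be different; if x and y are already in the same set, nothing is merged.
  • find(x): return the representative of the set containing x. This also tells us whether two elements belong to the same set: just compare their representatives.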

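The implementation idea is simple: each set is represented by a tree, each tree node is one element of the set, and the element at the root is the representative of the set, as shown in the figure:

[Figure 1: Tree representation of a DSU]

The figure contains two trees, one per set. The first set is {a,b,c,d}

Each node represents an element of a set, and each arrow is a pointer to the node's parent. The root's pointer points to itself, which means it has no parent. Following parent pointers upward from any node eventually reaches the root of its tree, which is the representative of its set.

With this picture in mind, makeSet and find are easy to write. Assuming the tree nodes are stored in a sufficiently long array, makeSet only has to build the forest shown in the figure, in which every element is a singleton set, that is, its own parent.

[Figure 2: Initial state of the DSU]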
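const int MAXSIZE = 500;
int uset[MAXSIZE];  // uset[i] is the parent of element i

// Initialize `size` singleton sets (requires size <= MAXSIZE).
void makeSet(int size) {
    for (int i = 0; i < size; i++) uset[i] = i;  // every element is its own root
}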
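
Before adding any optimization, it may help to see the plain lookup that simply follows parent pointers. The sketch below is only an illustration: the names plainParent, plainMakeSet, plainFind and sameSet are hypothetical and were picked so they do not clash with uset and find in the article's own snippets.

```cpp
#include <cassert>

const int PLAIN_MAXSIZE = 500;
int plainParent[PLAIN_MAXSIZE];  // plainParent[i] is the parent of element i

void plainMakeSet(int size) {
    assert(size <= PLAIN_MAXSIZE);
    for (int i = 0; i < size; i++) plainParent[i] = i;
}

// Follow parent pointers until reaching a node that is its own parent (the root).
// No path compression: the cost is the depth of x in its tree.
int plainFind(int x) {
    while (plainParent[x] != x) x = plainParent[x];
    return x;
}

// Two elements are in the same set exactly when their representatives are equal.
bool sameSet(int x, int y) {
    return plainFind(x) == plainFind(y);
}
```

On a tree that has degenerated into a chain, plainFind visits every node on the way up, which is the cost that path compression is meant to remove.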

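Next comes find. If every call simply walks up the parent pointers, its cost is proportional to the height of the tree, which can be as large as the number of elements and is nowhere near constant. A simple and effective remedy is path compression.

Path compression: on every find, make each node on the search path point directly to the root, as shown in the figure.

[Figure 3: Path compression]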
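// Recursive version
int find(int x) {
    if (x != uset[x]) uset[x] = find(uset[x]);  // re-point x directly at the root
    return uset[x];
}

// Iterative version (an alternative to the recursive one; keep only one of the two)
int find(int x) {
    int p = x, t;
    while (uset[p] != p) p = uset[p];                    // first pass: locate the root
    while (x != p) { t = uset[x]; uset[x] = p; x = t; }  // second pass: compress the path
    return x;
}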
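
The effect of path compression is easiest to see on a worst-case chain. The following sketch is an illustration under stated assumptions: findCompress, depthOf and makeChain are hypothetical helper names, and the parent array is passed in as a std::vector rather than stored in the global uset.

```cpp
#include <cassert>
#include <vector>

// Recursive find with path compression on a caller-supplied parent array.
int findCompress(std::vector<int>& parent, int x) {
    if (parent[x] != x) parent[x] = findCompress(parent, parent[x]);
    return parent[x];
}

// Depth of x: number of parent links between x and its root (read-only).
int depthOf(const std::vector<int>& parent, int x) {
    int d = 0;
    while (parent[x] != x) { x = parent[x]; ++d; }
    return d;
}

// Builds the chain 0 <- 1 <- 2 <- ... <- n-1, where 0 is the root.
std::vector<int> makeChain(int n) {
    assert(n > 0);
    std::vector<int> parent(n);
    parent[0] = 0;
    for (int i = 1; i < n; ++i) parent[i] = i - 1;
    return parent;
}
```

For a chain of five elements, node 4 starts at depth 4. After a single findCompress call on node 4, every node on the path points straight at the root 0, so later lookups on any of them take one step.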

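The last operation is unionSet. Merging is also simple: make the root of one tree point to the root of the other, as shown in the figure.

[Figure 4: Merging two sets]

A simple heuristic applies here too: union by rank. The rank is an upper bound on the height of a tree, and when merging we always point the root with the smaller rank at the root with the larger rank. In short, the shorter tree always becomes a subtree of the taller one. Storing the ranks takes an extra array of the same length as uset, with every entry initialized to 0.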

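// rank[i]: upper bound on the height of the tree rooted at i (globals start at 0).
// Rename it if you use `using namespace std;`, where it can collide with std::rank.
int rank[MAXSIZE];

void unionSet(int x, int y) {
    if ((x = find(x)) == (y = find(y))) return;  // already in the same set
    if (rank[x] > rank[y]) uset[y] = x;          // attach the shorter tree under the taller one
    else {
        uset[x] = y;
        if (rank[x] == rank[y]) rank[y]++;       // equal ranks: the merged tree may grow by one
    }
}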
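
Putting path compression and union by rank together, a self-contained version could look like the sketch below. RankDsu, height and unite are hypothetical names used only here (the article itself works with global arrays), and on equal ranks this sketch attaches y under x, which is the mirror image of the snippet above but equally valid.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical wrapper type; the article itself uses global arrays.
struct RankDsu {
    std::vector<int> parent;
    std::vector<int> height;  // the "rank": an upper bound on each root's tree height

    explicit RankDsu(int n) : parent(n), height(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);  // n singleton sets
    }

    int find(int x) {
        assert(0 <= x && x < static_cast<int>(parent.size()));
        if (parent[x] != x) parent[x] = find(parent[x]);  // path compression
        return parent[x];
    }

    // Merges the sets of x and y; returns false if they were already together.
    bool unite(int x, int y) {
        x = find(x);
        y = find(y);
        if (x == y) return false;
        if (height[x] < height[y]) std::swap(x, y);  // make x the taller root
        parent[y] = x;
        if (height[x] == height[y]) ++height[x];     // equal ranks: bound grows by one
        return true;
    }
};
```

Returning a bool from unite is a design choice of this sketch: callers such as cycle detection can then test and merge in a single call.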

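Besides union by rank, another common strategy is union by size: point the root of the tree with fewer nodes at the root of the tree with more nodes. It speeds up the DSU in much the same way as union by rank, and it needs no separate rank array.

This variant uses a slightly different encoding. If uset[i] is non-negative, it is (the index of) the parent of element i; if it is negative, element i is the representative (root) of its set, and the negated value is the number of elements in that set.
The number of elements in the set containing x is therefore -uset[find(x)]. The corresponding code is: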

const int MAXSIZE = 500;
int uset[MAXSIZE];  // >= 0: parent index; < 0: root, and -uset[i] is the set size

void makeSet(int size) {
    for (int i = 0; i < size; i++) uset[i] = -1;  // every element is a root of size 1
}
int find(int x) {
    if (uset[x] < 0) return x;   // x is a root
    uset[x] = find(uset[x]);     // path compression
    return uset[x];
}
void unionSet(int x, int y) {
    if ((x = find(x)) == (y = find(y))) return;
    if (uset[x] < uset[y]) {     // x's set is larger (its value is more negative)
        uset[x] += uset[y];
        uset[y] = x;
    } else {
        uset[y] += uset[x];
        uset[x] = y;
    }
}
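
As a usage example for the negative-size encoding, the sketch below counts the connected components of an undirected graph and reads set sizes back from the roots. It is an illustration only: findRoot, uniteBySize and countComponents are hypothetical names, and the array is a local std::vector instead of the global uset.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// A negative entry marks a root and stores minus the size of its set;
// a non-negative entry is the index of the parent.
int findRoot(std::vector<int>& uset, int x) {
    if (uset[x] < 0) return x;
    return uset[x] = findRoot(uset, uset[x]);  // path compression
}

void uniteBySize(std::vector<int>& uset, int x, int y) {
    x = findRoot(uset, x);
    y = findRoot(uset, y);
    if (x == y) return;
    if (uset[x] > uset[y]) std::swap(x, y);  // make x the root of the larger set
    uset[x] += uset[y];                      // sizes add up (both are negative)
    uset[y] = x;
}

// Number of connected components of an undirected graph on vertices 0..n-1.
// `uset` is filled in so the caller can query set sizes afterwards.
int countComponents(int n, const std::vector<std::pair<int, int>>& edges,
                    std::vector<int>& uset) {
    uset.assign(n, -1);
    for (const auto& e : edges) uniteBySize(uset, e.first, e.second);
    int roots = 0;
    for (int i = 0; i < n; ++i) {
        if (uset[i] < 0) ++roots;  // each root corresponds to one component
    }
    return roots;
}
```

Because each root carries its set's size, counting components is just counting negative entries, and the size of the component containing x is -uset[findRoot(uset, x)].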

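Time Complexity

Statement: if m operations, each either Union or Find, are applied to n elements using both path compression and union by rank, the total running time is O(m log* n), where log* is the iterated logarithm.
For the proof, see: https://en.wikipedia.org/wiki/Proof_of_O(log*n)_time_complexity_of_union%E2%80%93find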
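
For readers unfamiliar with the notation used in the linked proof, the iterated logarithm counts how many times log2 has to be applied before the value drops to at most 1. The definition below is the standard one and is not taken from the original article:

```latex
\log^{*} n =
\begin{cases}
0 & \text{if } n \le 1, \\
1 + \log^{*}(\log_2 n) & \text{if } n > 1.
\end{cases}
```

For example, log* 16 = 3 and log* 65536 = 4, so the factor is at most 5 for every n up to 2^65536, which is why the per-operation cost is often described as practically constant. A tighter bound of O(m α(n)), where α is the inverse Ackermann function, is also known for the same combination of heuristics.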

LeetCode Problems

LeetCode 684. Redundant Connection
In this problem, a tree is an undirected graph that is connected and has no cycles.
The given input is a graph that started as a tree with N nodes (with distinct values 1, 2, ..., N), with one additional edge added. The added edge has two different vertices chosen from 1 to N, and was not an edge that already existed.

The resulting graph is given as a 2D-array of edges. Each element of edges is a pair [u, v] with u < v, that represents an undirected edge connecting nodes u and v.

Return an edge that can be removed so that the resulting graph is a tree of N nodes. If there are multiple answers, return the answer that occurs last in the given 2D-array. The answer edge [u, v] should be in the same format, with u < v.

Example 1:
Input: [[1,2], [1,3], [2,3]]
Output: [2,3]
Explanation: The given undirected graph will be like this:


[Figure 5: Example 1]

Example 2:
Input: [[1,2], [2,3], [3,4], [1,4], [1,5]]
Output: [1,4]
Explanation: The given undirected graph will be like this:


[Figure: Example 2]

Note:

  • The size of the input 2D-array will be between 3 and 1000.
  • Every integer represented in the 2D-array will be between 1 and N, where N is the size of the input array.
class Solution {
    public int[] findRedundantConnection(int[][] edges) {
        // Nodes are labeled 1..N and N == edges.length, so N + 1 slots are enough
        int[] parent = new int[edges.length + 1];
        
        // makeSet: every node starts as its own singleton set
        for (int i = 0; i < parent.length; i++) parent[i] = i;
        
        for (int[] edge: edges){
            int f = edge[0], t = edge[1];
            
            // Same representative means f and t are already connected,
            // so this edge closes a cycle
            if (find(parent, f) == find(parent, t)) {
                return edge;
            }
            else {
                unionSet(parent, f, t);
            }
        }
        
        return new int[2];
    }
    
    // find(x): return the representative of the set containing x
    private int find(int[] parent, int f) {
        // path compression
        if (f != parent[f]) {
          parent[f] = find(parent, parent[f]);
        }
        
        return parent[f];
    }
    
    // unionSet(x, y): merge the sets containing x and y; does nothing if they are already the same set
    private void unionSet(int[] parent, int x, int y) {
        if ((x = find(parent, x)) == (y = find(parent, y))) return;
        
        parent[x] = y;
    }
}
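
For readers following the C++ snippets from the first half of the article, the same idea can be sketched in C++ and checked against the two examples from the problem statement. This is an illustrative port, not the author's code; it uses an iterative two-pass find and assumes, as the problem states, that nodes are labeled 1..N with N equal to the number of edges.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Returns the first edge (in input order) whose endpoints are already connected.
// Assumes nodes are labeled 1..N with N == edges.size(), as in the problem.
std::vector<int> findRedundantConnection(const std::vector<std::vector<int>>& edges) {
    std::vector<int> parent(edges.size() + 1);
    std::iota(parent.begin(), parent.end(), 0);  // every node is its own root

    auto find = [&parent](int x) {
        int root = x;
        while (parent[root] != root) root = parent[root];  // locate the root
        while (parent[x] != root) {                        // compress the path
            int next = parent[x];
            parent[x] = root;
            x = next;
        }
        return root;
    };

    for (const auto& e : edges) {
        assert(e.size() == 2);
        int a = find(e[0]);
        int b = find(e[1]);
        if (a == b) return e;  // endpoints already connected: this edge closes a cycle
        parent[a] = b;         // otherwise merge the two components
    }
    return {};  // not reached for valid inputs
}
```

The first edge that closes a cycle is also the last edge of that cycle in input order, which is why returning it immediately satisfies the requirement to return the answer that occurs last.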

LeetCode 685. Redundant Connection II
In this problem, a rooted tree is a directed graph such that there is exactly one node (the root) for which all other nodes are descendants of this node, plus every node has exactly one parent, except for the root node which has no parents.

The given input is a directed graph that started as a rooted tree with N nodes (with distinct values 1, 2, ..., N), with one additional directed edge added. The added edge has two different vertices chosen from 1 to N, and was not an edge that already existed.

The resulting graph is given as a 2D-array of edges. Each element of edges is a pair [u, v] that represents a directed edge connecting nodes u and v, where u is a parent of child v.

Return an edge that can be removed so that the resulting graph is a rooted tree of N nodes. If there are multiple answers, return the answer that occurs last in the given 2D-array.

Example 1:
Input: [[1,2], [1,3], [2,3]]
Output: [2,3]
Explanation: The given directed graph will be like this:


[Figure 6: Example 1]

Example 2:
Input: [[1,2], [2,3], [3,4], [4,1], [1,5]]
Output: [4,1]
Explanation: The given directed graph will be like this:


[Figure 7: Example 2]

Note:

  • The size of the input 2D-array will be between 3 and 1000.
  • Every integer represented in the 2D-array will be between 1 and N, where N is the size of the input array.
class Solution {
    public int[] findRedundantDirectedConnection(int[][] edges) {
        int[] parent = new int[edges.length];
        
        // makeSet: every node (0-based index) starts as its own singleton set
        for (int i = 0; i < edges.length; i++) parent[i] = i;

        int[] candidate1 = null, candidate2 = null;
        
        for (int[] edge: edges){
            int rootx = find(parent, edge[0] - 1);
            int rooty = find(parent, edge[1] - 1);
            
            if (rootx != rooty) {
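                // A child is only ever attached under its parent's root, so if edge[1]
                // is no longer its own root it already has a parent:
                // record the last edge which results in "multiple parents" issue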
                if (rooty != edge[1]-1) {
                    candidate1 = edge;
                }
                else {
                    unionSet(parent, edge[1] - 1, edge[0] - 1);
                }
            }
            else {
                // record last edge which results in "cycle" issue, if any.
                candidate2 = edge;
            }
                
        }

        // if there is only one issue, return this one.
        if (candidate1 == null) return candidate2; 
        if (candidate2 == null) return candidate1;
        
        // If both issues present, then the answer should be the first edge which results in "multiple parents" issue
        // Could use map to skip this pass, but will use more memory.
        for (int[] e : edges) {
            if (e[1] == candidate1[1]) {
                return e;
            }
        }

        return new int[2];
    }

    // find(x): return the representative of the set containing x
    private int find(int[] parent, int f) {
        // path compression
        if (f != parent[f]) {
          parent[f] = find(parent, parent[f]);
        }
        
        return parent[f];
    }
    
    // unionSet(x, y): merge the sets containing x and y; does nothing if they are already the same set
    private void unionSet(int[] parent, int x, int y) {
        if ((x = find(parent, x)) == (y = find(parent, y))) return;
        
        parent[x] = y;
    }
}

LeetCode 261. Graph Valid Tree
Given n nodes labeled from 0 to n - 1 and a list of undirected edges (each edge is a pair of nodes), write a function to check whether these edges make up a valid tree.

For example:
Given n = 5 and edges = [[0, 1], [0, 2], [0, 3], [1, 4]], return true.
Given n = 5 and edges = [[0, 1], [1, 2], [2, 3], [1, 3], [1, 4]], return false.

Note: you can assume that no duplicate edges will appear in edges. Since all edges are undirected, [0, 1] is the same as [1, 0] and thus will not appear together in edges.

class Solution {
    public boolean validTree(int n, int[][] edges) {
        // initialize n isolated islands
        int[] nums = new int[n];
        for(int i = 0; i < nums.length; i++) {
            nums[i] = i;
        }
        
        // perform union find
        for (int i = 0; i < edges.length; i++) {
            int x = find(nums, edges[i][0]);
            int y = find(nums, edges[i][1]);
            
            // if two vertices happen to be in the same set
            // then there's a cycle
            if (x == y) return false;
            
            // union
            nums[x] = y;
        }
        
        return edges.length == n - 1;
    }
    
    public int find(int nums[], int i) {
        if(i != nums[i]) {
            nums[i] = find(nums, nums[i]);
        }
        
        return nums[i];
    }
}

LeetCode 305. Number of Islands II
A 2d grid map of m rows and n columns is initially filled with water. We may perform an addLand operation which turns the water at position (row, col) into a land. Given a list of positions to operate, count the number of islands after each addLand operation. An island is surrounded by water and is formed by connecting adjacent lands horizontally or vertically. You may assume all four edges of the grid are all surrounded by water.

Example:
Given m = 3, n = 3, positions = [[0,0], [0,1], [1,2], [2,1]].
Initially, the 2d grid grid is filled with water. (Assume 0 represents water and 1 represents land).

0 0 0
0 0 0
0 0 0

Operation #1: addLand(0, 0) turns the water at grid[0][0] into a land.

1 0 0
0 0 0 Number of islands = 1
0 0 0

Operation #2: addLand(0, 1) turns the water at grid[0][1] into a land.

1 1 0
0 0 0 Number of islands = 1
0 0 0

Operation #3: addLand(1, 2) turns the water at grid[1][2] into a land.

1 1 0
0 0 1 Number of islands = 2
0 0 0

Operation #4: addLand(2, 1) turns the water at grid[2][1] into a land.

1 1 0
0 0 1 Number of islands = 3
0 1 0

We return the result as an array: [1, 1, 2, 3]

Challenge:

  • Can you do it in time complexity O(k log mn), where k is the length of the positions?
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Solution {
    int[][] dirs = {{0, 1}, {1, 0}, {-1, 0}, {0, -1}};

    public List<Integer> numIslands2(int m, int n, int[][] positions) {
        List<Integer> result = new ArrayList<>();
        if(m <= 0 || n <= 0) return result;

        int count = 0;
        
        // DSU over grid cells: one island = one tree; -1 marks water
        int[] roots = new int[m * n];
        Arrays.fill(roots, -1);

        for(int[] p : positions) {
            // flatten (row, col) into a 1D index
            int curIdx = n * p[0] + p[1];
            
            // a repeated position is already land: the island count does not change
            if(roots[curIdx] != -1) {
                result.add(count);
                continue;
            }
            
            // add new island
            roots[curIdx] = curIdx;
            
            count++;
            
            for(int[] dir : dirs) {
                // visit the four neighbours
                int x = p[0] + dir[0]; 
                int y = p[1] + dir[1];
                
                // bounds check (must come before computing the neighbour index)
                if(x < 0 || x >= m || y < 0 || y >= n) continue;

                int neighbourIdx = n * x + y;

                // skip neighbours that are water
                if(roots[neighbourIdx] == -1) continue;
                
                // root of the neighbouring island
                int neighbourRoot = find(roots, neighbourIdx);
                
                // if neighbor is in another island
                if(roots[curIdx] != neighbourRoot) {
                    // union two islands
                    roots[curIdx] = neighbourRoot;
                    
                    // current tree root = joined tree root
                    curIdx = neighbourRoot;
                    
                    count--;
                }
            }

            result.add(count);
        }
        
        return result;
    }

    public int find(int[] roots, int id) {
        // path compression
        if(id != roots[id]) {
            roots[id] = find(roots, roots[id]);
        }
        
        return roots[id];
    }
}
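
The walkthrough in the problem statement can be reproduced with a short C++ sketch of the same approach. This is an illustration rather than the author's code: positions are passed as (row, col) pairs, and one assumption is made explicit, namely that a position repeated in the input is treated as already land and leaves the count unchanged.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Island count after each addLand call on an m x n grid that starts as water.
std::vector<int> numIslands2(int m, int n,
                             const std::vector<std::pair<int, int>>& positions) {
    std::vector<int> result;
    if (m <= 0 || n <= 0) return result;

    std::vector<int> roots(m * n, -1);  // -1 marks water; otherwise a parent index
    const int dr[4] = {0, 1, -1, 0};
    const int dc[4] = {1, 0, 0, -1};
    int count = 0;

    auto find = [&roots](int x) {
        int root = x;
        while (roots[root] != root) root = roots[root];  // locate the root
        while (roots[x] != root) {                       // compress the path
            int next = roots[x];
            roots[x] = root;
            x = next;
        }
        return root;
    };

    for (const auto& p : positions) {
        int cur = p.first * n + p.second;  // flatten (row, col) into a 1D index
        if (roots[cur] != -1) {            // assumption: repeated position, already land
            result.push_back(count);
            continue;
        }
        roots[cur] = cur;  // new single-cell island
        ++count;
        for (int d = 0; d < 4; ++d) {
            int r = p.first + dr[d];
            int c = p.second + dc[d];
            if (r < 0 || r >= m || c < 0 || c >= n) continue;  // outside the grid
            int nb = r * n + c;
            if (roots[nb] == -1) continue;  // neighbour is water
            int a = find(cur);
            int b = find(nb);
            if (a != b) {  // two different islands touch: merge them
                roots[a] = b;
                --count;
            }
        }
        result.push_back(count);
    }
    return result;
}
```

Each new land cell starts as its own island, and every successful merge with a neighbouring island reduces the count by one, so the count is always the number of trees in the DSU.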

Reference:
并查集 (Disjoint Set)
