15. Disjoint set

Definition:
A disjoint set data structure maintains a collection S = {S1, S2, . . . , Sk} of disjoint sets of “objects”.

3 operation:

  1. Make Set(x): create a set whose only member is object x, where x is not in any of the sets yet.
  2. Find Set(x): identify the set containing object x.
    The value returned by Find Set(x) is a pointer to the representative of the set that contains x.
  3. Union(x,y): combine two sets containing objects x and y, say Sx and Sy, into a new set Sx ∪Sy, assuming that Sx ̸= Sy and Sx ∩Sy = empty.

To represent a set, a common technique is to choose one of objects in the set as “the representative of the set”. As long as a set has not been modified, its representative does not change.

1. Connected Components

The operations in the disjoint sets can be used to solve the connected component problem in an undirected graph G = (V , E ) which is to find all connected components in G.

for each v∈V 
    Make_Set(v)
for every edge(u,v)∈E
    if Find_Set(u) ̸= Find_Set(v)
    then Union(u,v)

Same components

if Find_Set(u) = Find_Set(v) 
    then return true
    else return false
15. Disjoint set_第1张图片
Connected component

Linked List Representation

Find Set(x) has an obvious O(1) implementation: just follow the back pointer.
Union(x,y) is more expensive, since many back pointers of one of the two merged sets need to be updated to (the new representative).

Using linked lists in an unfavorable way may require Θ(n^2) time. (n = the number of Make Set operations)

Improvement:
Weighted-unionheuristic: When two lists are merged into one,always link the shorter list to the end of the longer list.

Theorem: Using the weighted-union heuristic, any sequence of m Make Set, Union, and Find Set operations, n of which are Make Set, takes O(m + n log n) time.

------For any object x, whenever the back pointer of x is adjusted during Union, the size of the set containing x is at least doubled. Therefore, the back pointer of x is adjusted at most log n times, as there are at most n objects in the collection of all disjoint sets.

15. Disjoint set_第2张图片
Linked List Representation

Directed Forest Representation of Disjoint Sets

A faster implementation of disjoint sets uses directed rooted trees, (some referres it to as an inverted tree). Each set is represented by a tree, rather than a linked list. Every node in a tree has only a parent pointer, and the tree root’s pointer points itself.

15. Disjoint set_第3张图片

Operation:

  1. The Make Set operation simply creates a tree with a single node.
  2. The Find Set operation is performed by following parent pointers until the root of the tree is found. The object in the tree root is used to represent the set.
  3. The Union operation finds the roots of the two trees, then changes the parent pointer of one of the roots to point to the other.

To reduce the time spent in following these paths, two heuristics are used:
1. Union by Rank
When merging two trees, make the root pointer of the smaller “rank” tree point to the root of the larger rank tree.

We maintain a value called the rank for each node, which is defined as follows:
1.A node which is the only node in a tree has rank 0.

2.If the roots of two trees have the same rank k, the new root of their merge is given rank k+1. Otherwise, the rank of the new root is the larger rank between the two ranks.

Any tree with root rank k has : Height at most k; At least 2^k nodes.

2. Path Compression
When we identify the root of a tree containing some node a (which happens during Find Set and Union operations), we set the parent pointers of all the nodes on the path from a to the root directly. (The rank of the tree does not changed.)

Disjoint Sets using the two heuristics

15. Disjoint set_第4张图片
15. Disjoint set_第5张图片

你可能感兴趣的:(15. Disjoint set)