The problem that we consider is not a toy problem; it is a fundamental computational task, and the solution that we develop is of use in a variety of applications, from percolation in physical chemistry to connectivity in communications networks. We start with a simple solution, then seek to understand that solution’s performance characteristics, which help us to see how to improve the algorithm.
Dynamic connectivity We start with the following problem specification: The input is a sequence of pairs of integers, where each integer represents an object of some type and we are to interpret the pair p q as meaning "p is connected to q." We assume that "is connected to" is an equivalence relation, which means that it is:
■ Reflexive : p is connected to p.
■ Symmetric : If p is connected to q, then q is connected to p.
■ Transitive : If p is connected to q and q is connected to r, then p is connected to r.
An equivalence relation partitions the objects into equivalence classes. In this case, two objects are in the same equivalence class if and only if they are connected. Our goal is to write a program to filter out extraneous pairs (pairs where both objects are in the same equivalence class) from the sequence. In other words, when the program reads a pair p q from the input, it should write the pair to the output only if the pairs it has seen to that point do not imply that p is connected to q. If the previous pairs do imply that p is connected to q, then the program should ignore the pair p q and proceed to read in the next pair. The figure on the facing page gives an example of this process. To achieve the desired goal, we need to devise a data structure that can remember sufficient information about the pairs it has seen to be able to decide whether or not a new pair of objects is connected. Informally, we refer to the task of designing such a method as the dynamic connectivity problem. This problem arises applications such as the following:
Networks. The integers might represent computers in a large network, and the pairs might represent connections in the network. Then, our program determines whether we need to establish a new direct connection for p and q to be able to communicate or whether we can use existing connections to set up a communications path. Or, the integers might represent contact sites in an electrical circuit, and the pairs might represent wires connecting the sites. Or, the integers might represent people in a social network, and the pairs might represent friendships. In such applications, we might need to process millions of objects and billions of connections.
Variable-name equivalence. In certain programming environ- ments, it is possible to declare two variable names as being equivalent (references to the same object). After a sequence of such declarations, the system needs to be able to determine whether two given names are equivalent. This application is an early one (for the FORTRAN programming language) that motivated the development of the algorithms that we are about to consider.
Mathematical sets. On a more abstract level, you can think of the integers as belonging to mathematical sets. When we process a pair p q, we are asking whether they belong to the same set.If not, we unite p’s set and q’s set, putting them in the same set.
To specify the problem, we develop an API that encapsulates the basic operations that we need: initialize, add a connection between two sites, identify the component containing a site, determine whether two sites are in the same component, and count the number of components. Thus, we articulate the following API:
public class UF { public UF(int N) {} // initialize N sites with integer names (0 to N-1) void union(int p, int q) {} // add connection between p and q int find(int p) {} // component identifier for p (0 to N-1) // return true if p and q are in the same component boolean connected(int p, int q) {} int count() {} // number of components }
The union() operation merges two components if the two sites are in different com- ponents, the find() operation returns an integer component identifier for a given site, the connected() operation determines whether two sites are in the same component, and the count() method returns the number of components. We start with N components, and each union() that merges two different components decrements the number of components by 1.
public class UF { private int[] id; // id[i] = parent of i private int[] sz; // sz[i] = number of objects in subtree rooted at i private int count; // number of components /** * Create an empty union find data structure with N isolated sets. * * @throws java.lang.IllegalArgumentException * if N < 0 */ public UF(int N) { if (N < 0) throw new IllegalArgumentException(); count = N; id = new int[N]; sz = new int[N]; for (int i = 0; i < N; i++) { id[i] = i; sz[i] = 1; } } /** * Return the id of component corresponding to object p. * * @throws java.lang.IndexOutOfBoundsException * unless 0 <= p < N */ public int find(int p) { if (p < 0 || p >= id.length) throw new IndexOutOfBoundsException(); while (p != id[p]) { p = id[p]; } return p; } /** * Return the number of disjoint sets. */ public int count() { return count; } /** * Are objects p and q in the same set? * * @throws java.lang.IndexOutOfBoundsException * unless both 0 <= p < N and 0 <= q < N */ public boolean connected(int p, int q) { return find(p) == find(q); } /** * Replace sets containing p and q with their union. * * @throws java.lang.IndexOutOfBoundsException * unless both 0 <= p < N and 0 <= q < N */ public void union(int p, int q) { int i = find(p); int j = find(q); if (i == j) return; // make smaller root point to larger one if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; } else { id[j] = i; sz[i] += sz[j]; } count--; } }
Analysis of complexity: