In computer science, the Hopcroft–Karp algorithm (sometimes more accurately called the Hopcroft–Karp–Karzanov algorithm)[1] is an algorithm that takes a bipartite graph as input and produces a maximum-cardinality matching as output: a set of as many edges as possible with the property that no two edges share an endpoint. It runs in $O(|E|\sqrt{|V|})$ time in the worst case, where $E$ is the set of edges in the graph, $V$ is the set of vertices of the graph, and it is assumed that $|E| = \Omega(|V|)$. In the case of dense graphs the time bound becomes $O(|V|^{2.5})$, and for sparse random graphs it runs in time $O(|E|\log |V|)$ with high probability.[2]
The algorithm was discovered by John Hopcroft and Richard Karp (1973) and independently by Alexander Karzanov (1973).[3] As in previous methods for matching such as the Hungarian algorithm and the work of Edmonds (1965), the Hopcroft–Karp algorithm repeatedly increases the size of a partial matching by finding augmenting paths. These paths are sequences of edges of the graph that alternate between edges not in the partial matching and edges in it, and whose initial and final edges are not in the partial matching. Finding an augmenting path increases the size of the partial matching by one, simply by toggling the edges of the augmenting path (putting into the partial matching those that were not in it, and removing those that were). Simpler algorithms for bipartite matching, such as the Ford–Fulkerson algorithm, find one augmenting path per iteration; the Hopcroft–Karp algorithm instead finds a maximal set of shortest augmenting paths per iteration, so as to ensure that only $O(\sqrt{|V|})$ iterations are needed instead of $O(|V|)$ iterations. The same performance of $O(|E|\sqrt{|V|})$ can be achieved for maximum-cardinality matching in arbitrary graphs with the more complicated algorithm of Micali and Vazirani.[4]
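The phase structure described above (a breadth-first search that layers the graph by shortest alternating distance, followed by depth-first searches that extract a maximal set of vertex-disjoint shortest augmenting paths) can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name `hopcroft_karp`, the adjacency-dictionary input format, and the use of `None` as a sentinel for "free" are assumptions made for this example, and the recursive search is only suitable for small graphs.

```python
from collections import deque


def hopcroft_karp(adj):
    """Return a maximum matching as a dict {left vertex: right vertex}.

    `adj` maps every left-side vertex to a list of right-side neighbours.
    Assumption: no vertex is labelled None, because None is used below
    as the sentinel meaning "a free right-side vertex was reached".
    """
    INF = float("inf")
    pair_u = {u: None for u in adj}  # left vertex -> matched right vertex
    pair_v = {}                      # right vertex -> matched left vertex
    dist = {}                        # BFS layer of each left vertex

    def bfs():
        # Layer the left vertices by alternating-path distance from the
        # free left vertices; dist[None] becomes the shortest augmenting
        # path length (measured in left-side layers), if one exists.
        queue = deque()
        for u in adj:
            if pair_u[u] is None:
                dist[u] = 0
                queue.append(u)
            else:
                dist[u] = INF
        dist[None] = INF
        while queue:
            u = queue.popleft()
            if dist[u] < dist[None]:      # do not search past the shortest length
                for v in adj[u]:
                    w = pair_v.get(v)     # None if v is free
                    if dist[w] == INF:
                        dist[w] = dist[u] + 1
                        if w is not None:
                            queue.append(w)
        return dist[None] != INF

    def dfs(u):
        # Follow only edges that advance exactly one BFS layer, so every
        # path found is a shortest augmenting path.
        if u is None:
            return True
        for v in adj[u]:
            w = pair_v.get(v)
            if dist[w] == dist[u] + 1 and dfs(w):
                pair_u[u] = v             # toggle the edges along the path
                pair_v[v] = u
                return True
        dist[u] = INF                     # dead end: prune u for this phase
        return False

    while bfs():                          # one phase per successful BFS
        for u in adj:
            if pair_u[u] is None:
                dfs(u)
    return {u: v for u, v in pair_u.items() if v is not None}
```

For instance, with `adj = {"a": [1, 2], "b": [1], "c": [2, 3]}` the first phase matches `a` and `c` greedily, and a second phase reroutes them along a longer augmenting path so that `b` can also be matched.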
The Hopcroft–Karp algorithm can be seen as a special case of Dinic’s algorithm for the maximum flow problem.[5]
Hopcroft–Karp algorithm
Class: Graph algorithm
Data structure: Graph
Worst-case performance: $O(E\sqrt{V})$
Worst-case space complexity: $O(V)$
A vertex that is not the endpoint of an edge in some partial matching $M$ is called a free vertex. The basic concept that the algorithm relies on is that of an augmenting path: a path that starts at a free vertex, ends at a free vertex, and alternates between unmatched and matched edges along the way. It follows from this definition that, except for the endpoints, all other vertices (if any) in an augmenting path must be non-free vertices. An augmenting path could consist of only two vertices (both free) and a single unmatched edge between them.
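The definition can be made concrete with a small checker. This is a sketch under stated assumptions: the function name `is_augmenting_path` and the representation of a matching as a set of `frozenset` edges are choices made for this example, and the check does not verify that the path's edges actually exist in the graph.

```python
def is_augmenting_path(path, matching):
    """Return True if `path` (a list of vertices) is augmenting for `matching`.

    `matching` is a set of edges, each stored as frozenset({x, y}).
    Assumption: every consecutive pair in `path` is an edge of the graph.
    """
    # A path needs at least one edge and must not repeat a vertex.
    if len(path) < 2 or len(set(path)) != len(path):
        return False
    matched_vertices = {x for edge in matching for x in edge}
    # Both endpoints must be free.
    if path[0] in matched_vertices or path[-1] in matched_vertices:
        return False
    # Edges must alternate: unmatched, matched, unmatched, ...
    for i in range(len(path) - 1):
        edge = frozenset((path[i], path[i + 1]))
        should_be_matched = (i % 2 == 1)
        if (edge in matching) != should_be_matched:
            return False
    return True
```

With the matching `{frozenset(("u2", "v1"))}`, the path `["u1", "v1", "u2", "v2"]` passes the check, while `["u2", "v1"]` does not because its endpoints are not free.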
If $M$ is a matching, and $P$ is an augmenting path relative to $M$, then the symmetric difference of the two sets of edges, $M \oplus P$, forms a matching of size $|M| + 1$. Thus, by finding augmenting paths, an algorithm may increase the size of the matching.
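As a small worked illustration of the symmetric difference, the sketch below toggles the edges of one augmenting path. The helper name `augment` and the `frozenset` edge representation are assumptions made for this example.

```python
def augment(matching, path):
    """Return the symmetric difference of `matching` and the edges of `path`.

    `matching` is a set of frozenset({x, y}) edges and `path` is a list
    of vertices assumed to form an augmenting path for `matching`.
    """
    path_edges = {
        frozenset((path[i], path[i + 1])) for i in range(len(path) - 1)
    }
    # Python's ^ operator on sets is the symmetric difference.
    return matching ^ path_edges


M = {frozenset(("u2", "v1"))}
P = ["u1", "v1", "u2", "v2"]          # free, matched, matched, free
new_M = augment(M, P)
```

Here the matched edge between `u2` and `v1` leaves the matching, the two unmatched edges of the path enter it, and the matching grows from one edge to two.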
Conversely, suppose that a matching $M$ is not optimal, and let $P$ be the symmetric difference $M \oplus M^{*}$, where $M^{*}$ is an optimal matching. Because $M$ and $M^{*}$ are both matchings, every vertex has degree at most 2 in $P$. So $P$ must form a collection of vertex-disjoint cycles, of paths with an equal number of matched and unmatched edges with respect to $M$, of augmenting paths for $M$, and of augmenting paths for $M^{*}$; but the last is impossible because $M^{*}$ is optimal. Now, the cycles and the paths with equal numbers of matched and unmatched edges do not contribute to the difference in size between $M$ and $M^{*}$, so this difference is equal to the number of augmenting paths for $M$ in $P$. Thus, whenever there exists a matching $M^{*}$ larger than the current matching $M$, there must also exist an augmenting path. If no augmenting path can be found, an algorithm may safely terminate, since in this case $M$ must be optimal.
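The termination criterion can be illustrated with the simpler one-path-per-iteration method, which is not the Hopcroft–Karp algorithm itself: it augments along a single path at a time and stops as soon as no free vertex has an augmenting path. The function name and input format below are assumptions made for this sketch, and the recursive search is only suitable for small graphs.

```python
def maximum_matching_one_path_at_a_time(adj):
    """Return a maximum matching as a dict {left vertex: right vertex}.

    `adj` maps every left-side vertex to a list of right-side neighbours.
    """
    partner = {}  # right vertex -> matched left vertex

    def find_augmenting_path(u, seen):
        # Depth-first search along alternating paths starting at u.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # Either v is free, or its current partner can be rematched.
            if v not in partner or find_augmenting_path(partner[v], seen):
                partner[v] = u            # toggle the edges on the way back
                return True
        return False

    while True:
        matched_left = set(partner.values())
        # any() stops at the first success, so each round augments once.
        found = any(
            find_augmenting_path(u, set())
            for u in adj
            if u not in matched_left
        )
        if not found:
            # No augmenting path exists, so the matching is optimal.
            return {u: v for v, u in partner.items()}
```

On `{"u1": ["v1", "v2"], "u2": ["v1"]}` the first round matches `u1` to `v1`; the second round finds the augmenting path from `u2` through `v1` and `u1` to `v2`, after which no free left vertex remains and the loop stops.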
An augmenting path in a matching problem is closely related to the augmenting paths arising in maximum flow problems, which are paths along which one may increase the amount of flow between the terminals of the flow. It is possible to transform the bipartite matching problem into a maximum flow instance, such that the alternating paths of the matching problem become augmenting paths of the flow problem. Writing $U$ and $V$ here for the two sides of the bipartition, it suffices to insert two vertices, a source and a sink, insert edges of unit capacity from the source to each vertex in $U$ and from each vertex in $V$ to the sink, and let the edges from $U$ to $V$ have unit capacity.[6] A generalization of the technique used in the Hopcroft–Karp algorithm to find a maximum flow in an arbitrary network is known as Dinic’s algorithm.
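The construction of the flow network can be sketched as follows. The function name `matching_to_flow_network`, the arc-to-capacity dictionary, and the labels `"source"` and `"sink"` are assumptions made for this example; the labels are assumed not to clash with any vertex of the graph.

```python
def matching_to_flow_network(left, right, edges):
    """Build the unit-capacity flow network for a bipartite graph.

    `left` and `right` are the two sides of the bipartition and `edges`
    is an iterable of (u, v) pairs with u in `left` and v in `right`.
    Returns a dict mapping each directed arc (tail, head) to its capacity.
    """
    source, sink = "source", "sink"   # assumed unused as vertex labels
    capacity = {}
    for u in left:
        capacity[(source, u)] = 1     # source -> every left vertex
    for v in right:
        capacity[(v, sink)] = 1       # every right vertex -> sink
    for u, v in edges:
        capacity[(u, v)] = 1          # original edges, directed left to right
    return capacity


network = matching_to_flow_network(
    ["u1", "u2"], ["v1"], [("u1", "v1"), ("u2", "v1")]
)
```

Because every arc out of the source and into the sink has capacity one, an integral flow uses each vertex at most once, so the saturated left-to-right arcs of such a flow form a matching.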