Introduction to Analysis of Algorithms
Graph Theory
Graphs
A graph G consists of an ordered pair of sets = (, ) where ≠ ?, and ? (2) = {2-subsets of }.
In other words E consists of unordered pairs of elements of V. We call = () the vertex set, and =
() the edge set of G. In this handout, we consider only graphs in which both the vertex set and edge set
are finite. An edge {, }, typically denoted xy or yx, is said to join its two end vertices x and y, and these
ends are said to be incident with the edge xy. Two vertices are called adjacent if they are joined by an edge,
and two edges are said to be adjacent if they have a common end vertex. A graph will usually be depicted
as a collection of points in the plane (vertices), together with line segments (edges) joining the points.
Two graphs 1 and 2 are said to me isomorphic if there exists a bijection : (1) → (2) such that for
any , ∈ (1), the pair xy is an edge of 1 if and only if the pair ()() is an edge of 2. In other
words, the function must preserve all incidence relations amongst the vertices and edges in 1 . We write
1 ? 2 to mean that 1 and 2 are isomorphic.
Example 2 Let 1 be the graph from the previous example, and define 2 by (2) = {, , , , , },
(2) = {, , , , , , , , , }. 2 can be drawn as
Define a mapping : (1) → (2) by 1 → , 2 → , 3 → , 4 → , 5 → , 6 → . One checks easily
that is an isomorphism.
Isomorphic graphs are indistinguishable as far as graph theory is concerned. In fact, graph theory can be
defined to be the study of those properties of graphs that are preserved by isomorphisms. Thus a graph is
not a picture, in spite of the way we visualize it. Instead, a graph is a combinatorial object consisting of
two abstract sets, together with some incidence data relating those sets.
Notice that our definition of a graph does not allow for the existence of an edge joining a single vertex to
itself (sometimes called a loop), since an edge = {, } must be a 2-element subset of V, and therefore
must have distinct ends.
Neither does our definition allow two distinct edges to join the same pair of vertices (parallel edges), since
the edge set E, being a set of pairs of vertices, cannot contain the same pair twice.
When these types of edges are allowed, we call the resulting structure a multi-graph. Some authors use the
term “graph” to denote what we have designated as a multi-graph. Those authors would call our notion of
graph (i.e. a graph in which loops and parallel edges do not occur) a simple graph.
The degree of a vertex ∈ (), denoted deg (), is the number of edges incident with x, or equivalently,
the number of vertices adjacent to x. Referring to Example 1 above, we see that deg(1) = 2, deg(2) = 5
and deg(6) = 3. The degree sequence of a graph consists of its vertex degrees arranged in increasing order.
The graph in Example 1 has degree sequence (2, 3, 3, 3, 4, 5). Observe that the graph in Example 2 has the
same degree sequence. Clearly if : (1) → (2) is an isomorphism, then deg(()) = deg () for
any ∈ (), and hence isomorphic graphs must have the same degree sequence. This is our first example
of an isomorphism invariant, a graph property that is preserved by isomorphisms.
Observe that
∑ deg ()
∈()
= 2|()|
since each edge, having two distinct ends, contributes 2 to the sum on the left. This is sometimes known as
the Handshake Lemma for it says that the number of hands shaken at a party is exactly twice the number of
handshakes.
Exercise 1 Show that the number vertices of odd degree in any graph must be even. (Hint: suppose G
contains an odd number of odd vertices. Argue that the left hand side of the above equation is then odd,
while the right hand side is clearly even.)
Given any , ∈ (), an x-y walk in G, is a sequence of vertices = 0, 1, 2, … , ?1, = for which
?1 ∈ () for 1 ≤ ≤ . We call x the origin and y the terminus of the walk. Note these need not be
distinct. If = , the walk is said to be closed. The length of the walk is k, the number of edge traversals
performed in going from x to y along the sequence. Since the edges of a graph have no inherent direction,
we do not distinguish between the above sequence and its reversal: = , ?1, … , 2, 1, 0 = . Thus
the designation as to which vertex in a walk is the origin and which is the terminus is arbitrary. If G contains
an x-y walk, we say that y is reachable from x (equivalently x is reachable from y). A special case of a
closed walk is the trivial walk = 0 = of length 0, in which no edges are traversed.
A walk in which no edge is traversed more than once is called a trail, and a trail in which no vertex is visited
more than once (except possibly when origin=terminus) is called a path. Observe that the trivial walk is
both a trail and a path. A non-trivial closed path is called a cycle. Since our graphs have no loops or parallel
edges, the minimum possible length for a cycle is 3.
3
Example 3 Referring again to Example 1 we have:
a cycle of length 3: 2, 5, 6, 2
a cycle of length 6: 1, 2, 3, 6, 5, 4, 1
a 1-6 path of length 5: 1, 4, 2, 5, 3, 6
a 1-6 path of length 2: 1, 2, 6
a 3-1 trail that is not a path: 3, 2, 5, 6, 2, 1
a 3-1 walk that is not a trail: 3, 5, 2, 4, 5, 2, 1
the trivial 1-1 path: 1 (though this is a closed path, it is not a cycle)
Note that the reachability relation could just as well have been defined in terms of paths instead of walks
since, if G contains an x-y walk that is not a path, we can, by eliminating some cycles, replace it by an x-y
path. For instance the 3-1 walk in Example 1 above (3, 5, 2, 4, 5, 2, 1) contains the cycle (5, 2, 4, 5), and
can be replaced by the 3-1 path (3, 5, 2, 1).
The distance from x to y , denoted (, ), is defined as follows. If y is reachable from x, then (, ) is
the length of a shortest x-y path. If y is not reachable from x (i.e. if no x-y path exists), then (, ) is
infinity. The Single Source Shortest Path (SSSP) problem is this: given a distinguished vertex ∈ ()
called the source, determine (, ) for all ∈ (), and for each x that is reachable from s, determine a
shortest s-x path in G. We will learn two efficient algorithms that solve this problem.
A graph G is said to be connected iff y is reachable from x for every pair of vertices , ∈ (). If G is
not connected, it is called disconnected. Examples 1 and 2 above are clearly connected, while the following
graph is disconnected.
Example 4 = {1, 2, 3, 4, 5, 6, 7, 8, 9}, = {12, 15, 25, 26, 56, 37, 38, 78, 49}
A subgraph of a graph G is a graph H in which () ? (), and () ? (). In the above example,
({1, 2, 5}, {12, 15, 25}) is a connected subgraph, while ({2, 3, 6, 7}, {26, 37}) is a disconnected subgraph.
A subgraph H is called a connected component of G if it is (i) connected, and (ii) maximal with respect to
property (i), i.e. any other subgraph of G that properly contains H is disconnected. Observe Example 4 has
three connected components: ({1, 2, 5, 6}, {12, 15, 25, 26, 56}), ({3, 7, 8}, {37, 38, 78}) and ({4, 9}, {49}).
Obviously a graph is connected if and only if it has exactly one connected component.
Trees
A graph is called acyclic (or a forest) if it contains no cycles. A tree is a graph that is both connected and
acyclic. The connected components of an acyclic graph are obviously trees. The following example is a
forest with three connected components.
of vertices: 8 6 5
of edges: 7 5 4
Observe that in each tree of this forest, the number of edges is one less that the number of vertices. This
fact holds in general for all trees. The following lemmas demonstrate how the independent properties of
connectedness and acyclicity are related.
Lemma 1 If T is a tree with n vertices and m edges, then = ? 1.
Proof:
This result was proved in the handout on Induction Proofs by induction on n. We prove it here by induction
on m. If = 0 then T can have only one vertex, since T is connected. Thus = 1, and = ? 1,
establishing the base case. Now let > 0 and assume that any tree ′ with fewer than m edges satisfies
|(′)| = |(′)| ? 1. Pick an edge ∈ () and remove it. The resulting graph consists of two trees
1, 2, each having fewer than m edges. Suppose has edges and vertices ( = 1, 2). Then the
induction hypothesis gives = ? 1 ( = 1, 2). Also = 1 + 2 since no vertices were removed.
Therefore = 1 + 2 + 1 = (1 ? 1) + (2 ? 1) + 1 = 1 + 2 ? 1 = ? 1, as required. ■
Lemma 2 If G is an acyclic graph with n vertices, m edges, and k connected components, then = ? .
Proof:
Let the connected components of G (which are necessarily trees) be denoted 1, ?2, ?… ,?. Suppose
has edges and vertices respectively (1 ≤ ≤ ). By Lemma 1 we have = ? 1 (1 ≤ ≤ ).
T
Lemma 3 If G is a connected graph with n vertices and m edges, then ≥ ? 1.
Proof:
Our proof is a generalization of that of Lemma 1, again by induction on m. If = 0, then G, being
connected, can have only one vertex, hence = 1. Therefore ≥ ? 1 reduces to 0 ≥ 0, showing that
the base case is satisfied.
Let > 0, and assume for any connected graph ′ with fewer than m edges that |(′)| ? ≥ ? |(′)| ? 1.
Remove an edge ∈ () and let ? denote the resulting subgraph. We have two cases to consider.
Case 1: is connected. We note that has n vertices and 1 edges, so the induction hypothesis
gives 1 ≥ 1. Certainly then ≥ 1, as was claimed.
Case 2: ? is disconnected. In this case ? consists of two connected components. (**See the claim
and proof below.) Call them 1 and 2, and observe that each component contains fewer than m edges.
Suppose has edges and vertices ( = 1, ?2). The induction hypothesis gives ≥ ? 1 ( =
1, ?2). Also = 1 + 2 since no vertices were removed. Therefore
= 1 + 2 + 1 ≥ (1 ? 1) + (2 ? 1) + 1 = 1 + 2 ? 1 = ? 1
and therefore ≥ ? 1 as required. ■
Claim**: Let G be a connected graph and ∈ (), and suppose that ? is disconnected. (Such an
edge e is called a bridge). Then ? has exactly two connected components.
Proof:
Since ? is disconnected, it has at least two components. We must show that it also has at most two
components. Let e have end vertices u, and v. Let and be the connected components of ? that
contain u and v respectively. Choose ∈ () arbitrarily, and let P be an x-u path in G (note P exists since
G is connected.) Either P includes the edge e, or it does not. If P does not contain e, then P remains intact
after the removal of e, and hence P is an x-u path in ? , whence ∈ . If on the other hand P does
contain the edge e, then e must be the last edge along P from x to u.
In this case is an x-v path in ? , whence ∈ . Since x was arbitrary, every vertex in ?
belongs to either or , and therefore ? has at most two connected components. ■
Lemma 4 If G is a graph with n vertices, m edges, and k connected components, then ≥ ? .
Proof:
Let 1,?2, ?… ,? , be the connected components of G. Let and denote the number of vertices and
edges, respectively, of , for 1 ≤ ≤ . By Lemma 3 we have ≥ ? 1, for 1 ≤ ≤ , and therefore
Lemma 5 Let G be a connected graph with n vertices and m edges. Suppose also that = ? 1. Then
G is acyclic, and hence a tree.
Proof:
Suppose G is connected and = ? 1. Assume, to get a contradiction, that G is not acyclic. Let e be any
edge belonging to any cycle in G. Remove e from G, and denote the resultant graph by ? . Observe
that ? has ? 1 edges and n vertices, respectively. Since e is a cycle edge, its removal does not
disconnect G, and therefore ? is also connected. Lemma 3 above then gives ? 1 ≥ ? 1, whence
≥ . But then = ? 1 gives ? 1 ≥ , a contradiction. Therefore our original assumption was
false, and therefore G is acyclic, as claimed. Being connected, G is also a tree. ■
Lemma 6 Let G be an acyclic graph with n vertices and m edges. Suppose also that = ? 1. Then G
is connected, and hence a tree.
Proof:
Suppose G is acyclic and = ? 1. Let k be the number of connected components of G. By Lemma 2
we have = ? , whence ? 1 = ? , and hence = 1, showing that G is connected, as claimed.
■
Lemma 7 Let G be a connected graph with n vertices and m edges. Suppose also that = . Then G
contains exactly one cycle. (Such a graph is called unicyclic.)
Proof:
G contains at least one cycle since otherwise G is a tree, and hence = 1 (by Lemma 1), contrary to
hypothesis. If G contained two distinct cycles, say 1 and 2, we could find edges 1 ∈ (1) ? (2) and
2 ∈ (2) ? (1). Removing these two edges gives a connected graph = ? 1 ? 2 with |()| ? =
and |()| ? = ? ? 2, contradicting Lemma 3. ■
Consider the following three properties of a graph = (, ) in light of Lemmas 1, 5, and 6:
(i) G is connected,
(ii) G is acyclic
(iii) || = || ? 1.
We see that these properties are logically dependent in the sense that if any two hold, then the third must
also hold. Lemma 1 states that (i) and (ii) together imply (iii), Lemma 5 says that (i) and (iii) imply (ii),
and Lemma 6 says (ii) and (iii) imply (i). The following theorem summarizes these and other facts about
trees.
Theorem 1 (The Treeness Theorem) Let = (, ) be a graph with || = and || = . Then the
following statements are equivalent.
a) G is a tree (i.e. connected and acyclic).
b) G contains a unique x-y path for any , ∈ .
c) G is connected, but if any edge is removed, the resulting graph ? is disconnected.
d) G is connected, and = ? 1.
e) G is acyclic, and = ? 1.
f) G is acyclic, but if any edge is added (joining two non-adjacent vertices), then the resulting graph
- contains a unique cycle.
Note: This is theorem B.2 in Cormen, Leiserson, Rivest, & Stein (p.1085 in 2nd ed., p.1174 in 3rd ed.)
Proof: As mentioned in the preceding paragraph, Lemmas 1, 5, and 6 have already established the
equivalences (a)?(d)?(e).
7
(a)?(b): Suppose G is a tree and let , ∈ (). If = , then the trivial path is the only x-y path in G,
since being a tree, G has no cycles. Suppose ≠ . Since G is connected, there exists at least one - path
in G. Assume, to get a contradiction, that G contains two distinct - paths. Call them 1 and 2. By
traveling along 1 from x to y, then along 2 from y to x, we obtain a closed walk in G that begins and ends
at x. If no vertex (other than x) is repeated in this walk, then we have found a cycle in G. If some vertex is
repeated in this walk, we can obtain a cycle as follows. Travel along 1 from x to the first repeated vertex,
then back to x along 2. Therefore G contains a cycle, contradicting that it is a tree. This contradiction
shows our assumption was false, and hence two different - paths cannot exist in G, which therefore
contains a unique - path.
() ? (): Suppose G contains a unique x-y path for any , ∈ . Then G is certainly connected. Pick
any edge = {, } ∈ and remove it. Since e constitutes a path joining its two ends (x and y) and since
there are no other such paths by hypothesis, the resulting graph ? contains no x-y path, and is therefore
disconnected.
() ? (): Suppose G is connected, and if any edge is removed from G, the resulting graph is disconnected.
Assume, to get a contradiction, that G contains a cycle. Then the removal of any edge on that cycle would
not disconnect G, contrary to hypothesis. It follows that no such cycle can exist, and hence G is acyclic.
Since G is also connected, it is a tree.
() ? (): Suppose G is a tree (i.e. connected and acyclic). By Lemma 1 we also have = ? 1. Pick
any two non-adjacent vertices and join them with a new edge e. The resulting graph + is still connected,
and |( + )| = + 1 = = |( + )|. Lemma 7 now says that + contains exactly one cycle.
() ? (): Suppose G is acyclic, and if a new edge e is inserted between any two non-adjacent vertices,
then the resulting graph + is unicyclic. Pick any , ∈ with ≠ . If x and y are adjacent, then
certainly y is reachable from x. If x and y are non-adjacent, insert a new edge e joining x to y, and call the
resulting cycle C. Observe that the edges of ? constitute an x-y path in G, so in this case also, y is
reachable from x. Since x and y were arbitrary, we conclude that G is connected, and therefore a tree.
This completes the proof of the Treeness Theorem. ■
Directed Graphs
A Directed Graph (or Digraph) = (, ) is a pair of sets, where the vertex set = () is, as before,
finite and non-empty, and the edge set = () ? × , i.e. E consists of ordered pairs of vertices.
The directed edge (x, y) in the above example is said to have origin x and terminus y, and we say that x is
adjacent to y. The origin and terminus of a directed edge are said to be incident with that edge. Two edges
are called adjacent if they have a common end vertex, so for instance (x, y) in the above example is adjacent
to (u, x). The in degree of a vertex is the number of edges having that vertex as terminus, and it’s out degree
is the number of edges having that vertex as origin. The degree of a vertex is the sum if it’s in degree and
out degree. Thus in the above example id() = 1, od() = 2, and deg() = 3. The analog of the
handshake lemma for directed graphs is
As in the undirected case, there is a simple notion of isomorphism for directed graphs. Two digraphs 1
and 2 are said to me isomorphic if there exists a bijection : (1) → (2) such that for any , ∈
(1), the ordered pair (, ?) is a directed edge of 1 if and only if the ordered pair ((), ?()) is a
directed edge of 2. Thus preserves incidence relations and directionality amongst the vertices and edges
of 1. We write 1 ? 2 to mean that 1 and 2 are isomorphic.
A directed path P in a digraph is a finite sequence of vertices P: 0, 1, 2, … , ?1, such that (?1, ) ∈
for all 1 ≤ ≤ . As in the undirected case, we require that all vertices be distinct (except possibly 0
and ), and that no edge be traversed more than once. If it so happens that the initial and terminal vertices
are the same, 0 = 1, the path is called a directed cycle. The length of such a path is k, the number of
edges traversed. If = 0 ≠ = , we call P a directed x-y path. Notice that, unlike the undirected case,
a directed x-y path and a directed y-x path are not the same thing. We say that ∈ () is reachable from
∈ () if G contains a directed x-y path. Observe that for digraphs, the reachability relation is reflexive
(x is reachable from x via the trivial path with no edges), transitive (if y is reachable from x, and z is reachable
from y, then z is reachable from x), but not symmetric (y may be reachable from x without x being reachable
from y).
A digraph G is said to be strongly connected if for all , ?? ∈ (), both x is reachable from y, and y is
reachable from x. Notice that the digraph in Example 6 above is not strongly connected, since for instance,
u is not reachable from y. The following example is strongly connected.
More generally, a subset ? () is said to be strongly connected if for all , ? ∈ , both x is reachable
from y, and y is reachable from x. Furthermore, a subset ? () is said to be a strongly connected
component of the digraph G if it is (i) strongly connected, and (ii) maximal with respect to property (i), i.e.
any other subset of () that properly contains S is not strongly connected. Obviously G is strongly
connected iff it has just one strongly connected component, namely () itself. Going back to the digraph
in Example 6, we see that it has 2 strongly connected components: {, , } and {}.
If we replace each directed edge in a digraph G with an undirected edge, we obtain an (undirected) graph
known as the underlying undirected graph of G. Note that two non-isomorphic digraphs, such as Examples
6 and 7 above, can have the very same underlying graph.
Representations of Graphs and Digraphs
We discuss three methods for representing graphs and digraphs in terms of standard data structures available
in most computer languages. They are called the Incidence Matrix, the Adjacency Matrix, and the
Adjacency List representations respectively. In what follows we suppose = (, ?) to be a graph
(directed or undirected) with || = and || = .
The Incidence Matrix () requires that both the vertex set () and the edge set () be ordered. For
this purpose we suppose that = {1, 2, … , } and = {1, 2, 3, … , }. Then () is an ×
rectangular matrix. Row i corresponds to vertex , for 1 ≤ ≤ . Column j corresponds to edge (1 ≤
≤ ), and contains zeros everywhere except for the two rows corresponding to the ends of . If G is an
undirected graph, these two rows contain 1s. If G is a directed graph, the row corresponding to the origin
of contains ?1, while the row corresponding to the terminus of contains +1. Thus () = () where
in the undirected case: