1. Problem Definition:
a) Input: Strings X = x1 ... xm, Y = y1 ... yn over some alphabet Sigma (like {A,C,G,T})
b) Penalty Pgap for inserting a gap, Pab for matching a & b [presumably Pab = 0 of a = b]
c) Feasible solutions: Alignments - i.e., insert gaps to equalize lengths of the string
d) Goal: Alignment with minimum possible total penalty
e) Example :
2. A Dynamic Programming Approach
-- Key step: Identify subproblems. As usual, will look at structure of an optimal solution for clues.
[i.e., develop a recurrence + then reverse engineer the subproblems]
-- Structure of optimal solution: Consider an optimal alignment of X , Y and its final position:
-- Three cases :
a) Case 1: xm , yn matched
b) Case 2: xm matched with a gap
c) Case 3: yn matched with a gap
[Pointless to have 2 gaps]
-- Optimal substructure: Let X' = X - xm, Y' = Y - yn.
a) If case (1) holds, then induced alignment of X' & Y' is optimal.
b) If case (2) holds, then induced alignment of X' & Y is optimal.
c) If case (3) holds, then induced alignment of X & Y' is optimal.
3. Optimal Substructure Proof
-- Proof of Case 1, other cases are similar :
By contradiction. Suppose induced alignment of X' , Y' has penalty P while some other one has penalty P* < P.
==> Appending xm , yn to the latter, get an alignment of X and Y with penalty P + Pxmyn < P + Pxmyn ==> Contradicts optimality of original alignment of X & Y .
4. The Recurrence
-- Notation: Let Xi = 1st i letters of X , Yj = 1st j letters of Y. Pi,j = penalty of optimal alignment of Xi & Yj
-- Recurrence: For all i = 1, ... , m and j = 1, ... , n:
Pij = min{ (1) Pxi yj + Pi-1,j-1 , (2) Pgap + Pi-1,j (3) Pgap + Pi,j-1 }
-- Base case : Pi,0 = P0,i = i * Pgap
5. The Algorithm
-- A = 2-D array.
A[i , 0] = A[0 , i] = i * Pgap for any i >= 0
For i = 1 to m
For j = 1 to n
A[i , j ] = min { A[i-1, j-1] + Pxiyi , A[i-1, j] + Pgap , A[i, j-1] + Pgap }
-- Running Time : O(mn)
6. Reconstructing a Solution
-- Trace back through filled-in table A, starting at A[m , n]
-- When you reach subproblem A[i , j ]:
-- If A[i , j ] filled using case (1), match xi & yj and go to A[i - 1 , j - 1]
-- If A[i , j ] filled using case (2), match xi with a gap and go to A[i - 1 , j ]
-- If A[i , j ] filled using case (3), match yj with a gap and go to A[i , j - 1]
[If i = 0 or j = 0, match remaining substring with gaps]
-- Running time is only O(m + n)