In computer science, a subsequence need not be contiguous, whereas a substring is contiguous. In biology the terms differ: a (non-contiguous) subsequence is called a gapped sequence, and a (contiguous) string is simply called a sequence.
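A minimal sketch of the distinction in the computer-science sense, assuming plain Python strings. The function names are illustrative, not taken from the notes:

```python
def is_substring(p: str, s: str) -> bool:
    """True if p occurs in s as one contiguous block of characters."""
    return p in s


def is_subsequence(p: str, s: str) -> bool:
    """True if p can be obtained from s by deleting characters (order kept, gaps allowed)."""
    it = iter(s)
    # `c in it` advances the iterator until c is found, so matches must appear in order
    return all(c in it for c in p)


print(is_substring("ace", "abcde"))    # False: a, c, e are not adjacent in s
print(is_subsequence("ace", "abcde"))  # True: a_c_e, with gaps
```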
Sequence similarity problems arise in search engines, command lines, and genome analysis.
Paradigm one: similar sequences ⇒ similar organisms. For example, how should microorganisms be classified?
Paradigm two: similar sequences ⇒ similar structures ⇒ similar functions, for proteins, DNA, RNA, and so on.
The basic method for sequence alignment is dynamic programming.
Q: Given two sequences U and V, how do we measure their similarity?
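Definition: an alignment of U and V is obtained by inserting spaces (gap symbols, often written "-") into the two sequences so that both have the same length n; the padded sequences are written one above the other and read column by column.
Note: a column that aligns a space with a space is forbidden.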
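These conditions can be checked mechanically. A minimal sketch, assuming the space is written as "-" and an alignment is given as two rows (top, bottom); `is_alignment` is a hypothetical helper name:

```python
GAP = "-"  # stands for the inserted space; '-' is used only for readability


def is_alignment(u: str, v: str, top: str, bottom: str) -> bool:
    """Check that (top, bottom) is a valid alignment of u and v."""
    if len(top) != len(bottom):  # both rows must have the same length n
        return False
    if any(a == GAP and b == GAP for a, b in zip(top, bottom)):
        return False  # a space aligned with a space is forbidden
    # deleting the spaces must give back the original sequences
    return top.replace(GAP, "") == u and bottom.replace(GAP, "") == v


print(is_alignment("cat", "act", "cat", "act"))    # True
print(is_alignment("cat", "act", "-cat", "act-"))  # True
print(is_alignment("cat", "act", "ca-t", "ac-t"))  # False: column (-, -)
```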
Example:
U: cat, V: act
Example one:
Given a scoring function w:
s(caactt)=3−1−1=1 better!
Example two:
s(caactt)=3−1−1=1
Definition (global alignment problem): given U, V, and a scoring function w, find an optimal global alignment, i.e., one with the maximum score.
S(U,V): the score of an optimal alignment of U and V.
s(T): the score of a particular alignment T.
If s(T) = S(U,V), then T is an optimal alignment.
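To make S(U,V) and optimality concrete, a brute-force sketch can enumerate every alignment of two short sequences, take S(U,V) as the maximum s(T), and keep the alignments T that reach it. It assumes the same hypothetical scoring as before (+1 / -1 / -2, space written "-"); the function names are illustrative only. The number of alignments grows exponentially with the sequence lengths, which is why dynamic programming is needed for real inputs.

```python
from typing import Iterator, List, Tuple

GAP = "-"  # the inserted space, written '-' for readability


def w(a: str, b: str) -> int:
    """Hypothetical column score: +1 match, -1 mismatch, -2 if the column has a space."""
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def score(top: str, bottom: str) -> int:
    """s(T): sum of the column scores of the alignment T = (top, bottom)."""
    return sum(w(a, b) for a, b in zip(top, bottom))


def alignments(u: str, v: str) -> Iterator[Tuple[str, str]]:
    """Yield every alignment of u and v exactly once (no space/space column)."""
    if not u and not v:
        yield "", ""
        return
    # the first column is (u[0], v[0]), (u[0], space) or (space, v[0])
    if u and v:
        for t, b in alignments(u[1:], v[1:]):
            yield u[0] + t, v[0] + b
    if u:
        for t, b in alignments(u[1:], v):
            yield u[0] + t, GAP + b
    if v:
        for t, b in alignments(u, v[1:]):
            yield GAP + t, v[0] + b


def best_score(u: str, v: str) -> int:
    """S(U,V): the maximum s(T) over all alignments T."""
    return max(score(t, b) for t, b in alignments(u, v))


def optimal_alignments(u: str, v: str) -> List[Tuple[str, str]]:
    """All T with s(T) == S(U,V), i.e. all optimal alignments."""
    s_max = best_score(u, v)
    return [(t, b) for t, b in alignments(u, v) if score(t, b) == s_max]


print(len(list(alignments("cat", "act"))))  # 63 alignments in total
print(best_score("cat", "act"))             # -1
print(optimal_alignments("cat", "act"))     # [('cat', 'act')]
```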
Key observation: examine the structure of optimal solutions.
(1) Let T be an optimal alignment of act and cat, so that s(T) = S(act, cat).
What do we know about the last column of T?
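One standard way to answer this question: the last column of T must be one of three kinds, (last letter of U, last letter of V), (last letter of U, space), or (space, last letter of V), and, since the score is a sum over columns, removing that column leaves an optimal alignment of the corresponding prefixes. This gives the usual Needleman-Wunsch-style recurrence S(i, j) = max{ S(i-1, j-1) + w(u_i, v_j), S(i-1, j) + w(u_i, space), S(i, j-1) + w(space, v_j) }. The sketch below implements it with the same hypothetical w as before (an assumption, not the scoring from these notes) and traces back one optimal alignment:

```python
from typing import Tuple

GAP = "-"  # the inserted space, written '-' for readability


def w(a: str, b: str) -> int:
    """Hypothetical column score: +1 match, -1 mismatch, -2 if the column has a space."""
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def global_alignment(u: str, v: str) -> Tuple[int, str, str]:
    """Return (S(u, v), top row, bottom row) for one optimal global alignment."""
    m, n = len(u), len(v)
    # S[i][j] = best score for aligning the prefixes u[:i] and v[:j]
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + w(u[i - 1], GAP)
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + w(GAP, v[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + w(u[i - 1], v[j - 1]),  # last column (u_i, v_j)
                S[i - 1][j] + w(u[i - 1], GAP),           # last column (u_i, space)
                S[i][j - 1] + w(GAP, v[j - 1]),           # last column (space, v_j)
            )
    # walk back from (m, n), each step undoing the case that produced S[i][j]
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + w(u[i - 1], v[j - 1]):
            top.append(u[i - 1]); bottom.append(v[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + w(u[i - 1], GAP):
            top.append(u[i - 1]); bottom.append(GAP); i -= 1
        else:
            top.append(GAP); bottom.append(v[j - 1]); j -= 1
    return S[m][n], "".join(reversed(top)), "".join(reversed(bottom))


print(global_alignment("cat", "act"))  # (-1, 'cat', 'act')
```

The table has (m+1)(n+1) entries, each filled in constant time, so this runs in O(mn) time instead of enumerating every alignment.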