

Algorithm Thinking

Peak finding

1-Dimensional array

1. Def: in an array [..., a, b, c, ...], b is a peak iff b ≥ a and b ≥ c (an element at either end only needs to be ≥ its single neighbor).
2. Straightforward algorithm: start from the left and walk across the array until a peak is found.
i. The worst-case complexity is Θ(n): the running time is bounded above and below by a constant times n.
ii. Asymptotic complexity: the asymptotic complexity is linear, because in the worst case the work grows in proportion to n.
To lower the asymptotic complexity ----
3. Binary search: look at position n/2. If a[n/2-1] > a[n/2], look only at the left half; else if a[n/2+1] > a[n/2], look only at the right half.
If neither a[n/2-1] nor a[n/2+1] is greater than a[n/2], then n/2 is a peak.
Worst case: each step leaves a subproblem of size at most n/2.
Because the algorithm is recursive, its running time is described by a recurrence.
T(n) is the work the algorithm does on an input of size n.
T(n) = T(n/2) + Θ(1)
Θ(1) corresponds to the (at most) two comparisons made at each step, against the left and right neighbors. 2 is a constant, so this cost is Θ(1).
The problem size halves log2 n times, each time at Θ(1) cost, so
T(n) = Θ(log2 n)
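
The two 1-D approaches above can be sketched in Python. This is only a minimal sketch, not the lecture's code: the names `find_peak_linear` and `find_peak_binary` are made up for illustration, and the array is assumed to be non-empty.

```python
def find_peak_linear(a):
    """Walk from the left and return the index of the first peak: Θ(n) worst case."""
    n = len(a)
    for i in range(n):
        left_ok = i == 0 or a[i] >= a[i - 1]
        right_ok = i == n - 1 or a[i] >= a[i + 1]
        if left_ok and right_ok:
            return i


def find_peak_binary(a):
    """Halve the search range each step and return the index of some peak: Θ(log n)."""
    lo, hi = 0, len(a) - 1
    while True:
        mid = (lo + hi) // 2
        if mid > lo and a[mid - 1] > a[mid]:
            hi = mid - 1      # left neighbor is greater: a peak exists on the left
        elif mid < hi and a[mid + 1] > a[mid]:
            lo = mid + 1      # right neighbor is greater: a peak exists on the right
        else:
            return mid        # neither neighbor is greater: mid is a peak


print(find_peak_linear([1, 2, 6, 5, 3, 7, 4]))
print(find_peak_binary([1, 2, 6, 5, 3, 7, 4]))
```

The two functions may return different indices, since an array can have several peaks; both answers are valid.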

Summary: the key way to improve the algorithm is divide and conquer; the same idea applies in 2-D.

2D Version

1. Greedy ascent algorithm: pick a starting point and a direction, then keep moving to a larger neighbor until a peak is reached.
Worst-case complexity: Θ(nm); if m = n, this is Θ(n^2).
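
A minimal sketch of greedy ascent, assuming the matrix is a non-empty list of equal-length rows. The name `greedy_ascent`, the default start at the top-left corner, and the order in which neighbors are tried are all choices made here for illustration.

```python
def greedy_ascent(grid, i=0, j=0):
    """Move to a strictly larger neighbor until none exists; return (row, col)."""
    n, m = len(grid), len(grid[0])
    while True:
        moved = False
        # Try the four neighbors: up, down, left, right.
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < n and 0 <= nj < m and grid[ni][nj] > grid[i][j]:
                i, j = ni, nj
                moved = True
                break
        if not moved:
            return i, j      # no neighbor is larger: (i, j) is a 2-D peak


spiral = [[1, 2, 3],
          [8, 9, 4],
          [7, 6, 5]]
print(greedy_ascent(spiral))
```

Every move goes to a strictly larger value, so no cell is visited twice, which is where the Θ(nm) worst case comes from.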

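2. Binary search (first attempt): start from the middle column m/2 and use 1-D binary search to find a peak (i, m/2) within that column;
then keep row i fixed and use 1-D binary search along the row to find a peak at column j. Return (i, j).
!!! Careful: this algorithm is incorrect.
The row peak and the column peak may not agree: (i, j) is a peak of row i, but it need not be a peak of column j.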

3. The improved way:
Assume the matrix is n × m (n rows, m columns). Take the middle column m/2 and find the global maximum of that column over its n rows, at (i, m/2). If the left neighbor (i, m/2 - 1) is greater than (i, m/2), look for the peak in the left half of the columns, and repeat the steps above.
Handle the right side symmetrically. If neither neighbor is greater, (i, m/2) is a 2-D peak.
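
The column-halving idea can be sketched as follows. This is an illustrative sketch under the assumption that the matrix is a non-empty list of equal-length rows; `find_peak_2d` is a made-up name, and the loop form is used here instead of explicit recursion.

```python
def find_peak_2d(grid):
    """Return (row, col) of some 2-D peak using Θ(n log m) work."""
    n = len(grid)
    lo, hi = 0, len(grid[0]) - 1          # range of columns still under consideration
    while True:
        j = (lo + hi) // 2                # middle column of the current range
        # Θ(n) scan: row index of the global maximum in column j.
        i = max(range(n), key=lambda r: grid[r][j])
        if j > lo and grid[i][j - 1] > grid[i][j]:
            hi = j - 1                    # left neighbor is greater: keep the left columns
        elif j < hi and grid[i][j + 1] > grid[i][j]:
            lo = j + 1                    # right neighbor is greater: keep the right columns
        else:
            return i, j                   # column maximum with no greater side neighbor


matrix = [[10, 8, 10, 10],
          [14, 13, 12, 11],
          [15, 9, 11, 21],
          [16, 17, 19, 20]]
print(find_peak_2d(matrix))
```

The global scan of the column is what makes this correct: a 1-D peak of the column would be cheaper to find, but then the value moved to on the left or right is no longer guaranteed to beat the whole middle column.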

T(n,m) = T(n,m/2) + Θ(n)
The work on an n × m matrix splits into the work on an n × (m/2) matrix plus a Θ(n) global search of one column.
Hence T(n,m) = Θ(n log2 m)
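
To see where the factor log2 m comes from, the recurrence can be unrolled. As a simplifying assumption, the Θ(n) term is written as c·n for some constant c, and m is taken to be a power of two:

```latex
\begin{aligned}
T(n,m) &= T(n, m/2) + cn \\
       &= T(n, m/4) + cn + cn \\
       &\;\;\vdots \\
       &= T(n, 1) + \underbrace{cn + \dots + cn}_{\log_2 m \text{ terms}} \\
       &= \Theta(n) + cn \log_2 m = \Theta(n \log_2 m)
\end{aligned}
```

The same unrolling with a constant c in place of c·n gives the 1-D result T(n) = Θ(log2 n).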

I suppose the recurrence to be T(n,m) = T(n,m/2) + Θ(log2 n) + Θ(1)

In lesson 1.1 there are two notions of complexity: Θ and asymptotic.
T describes the work that an algorithm carries out;
while Θ describes how the worst-case amount of searching grows.
But it seems that all the Θ terms are linear in n.


i. What does Θ(n) mean? Essentially, a constant times n.
ii. The sum of log2 n terms of Θ(1) is Θ(log2 n).
iii. Why is the recurrence in 2-D not T(n,m) = T(n,m/2) + Θ(log2 n) + Θ(1)?
iv. Why does log2 m times Θ(n) equal Θ(n log2 m)?
v. What is asymptotic complexity? Is it the average or the trend of Θ(n) with every new order or algorithm?

Models of Computation Document Distance

A model of computation specifies
-what operations an algorithm can do
-the time cost of each operation

Two models of computation:

1. Random access machine (RAM, the same abbreviation as random access memory)

(1) They are closely related but not the same thing. The former is a mathematical analog of the latter, which is what programs actually run on.
(2) In constant Θ(1) time, an algorithm can
read in (load) Θ(1) words
do Θ(1) computation
write out (store) Θ(1) words
There are Θ(1) registers.

word: w bits. w should be at least lg(size of memory).
This is because a word must be able to hold an index into the array (the memory).
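
A quick way to check the word-size bound: with w bits you can name 2^w different cells, so indexing N cells needs w ≥ lg N. The helper `bits_needed` below is hypothetical, written only to illustrate the arithmetic.

```python
def bits_needed(memory_size):
    """Smallest w such that w bits can index cells 0 .. memory_size - 1."""
    # The largest index is memory_size - 1; bit_length() gives the bits to write it.
    return max(1, (memory_size - 1).bit_length())


print(bits_needed(256))      # 256 cells
print(bits_needed(2 ** 32))  # 4 Gi cells
```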

2. Pointer machine

dynamically allocated objects
each object has a Θ(1) number of fields; a field can be either a word or a pointer
a pointer is something that points to another object (or is null)
It is also called a reference
linked list: the list is built on the pointer machine model
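
A linked list is a small example of the pointer machine in Python. This is a sketch with made-up names (`Node`, `to_list`); each node has two fields, one holding a value and one holding a reference to the next node.

```python
class Node:
    """An object with Θ(1) fields: a word (value) and a pointer (next)."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def to_list(head):
    """Follow the pointers from head and collect the values."""
    values = []
    node = head
    while node is not None:
        values.append(node.value)
        node = node.next          # following one pointer costs Θ(1)
    return values


head = Node(1, Node(2, Node(3)))
print(to_list(head))
```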

3. Models in Python

i. list = array
L[j] = L[i] + 5 Θ(1)
ii. object with Θ(1) attributes
iii. L.append(x)
Uses table doubling. Θ(1) time
iv. L = L1 + L2
L = [], for every x in L1, L.append(x) Θ(|L1|)
for every x in L2, L.append(x) Θ(|L2|)
Θ(1 + |L1| + |L2|) in total. Θ(1) is the cost of each append
v. x in L
linear time: it just scans through the entire list
vi. len(L) Θ(1)
vii. L.sort(): O(|L| lg |L|)
it uses a comparison sort algorithm
viii. dict: D[key] = val
Hash table, Θ(1) time
ix. x + y (long integers): O(|x| + |y|)
x * y: O((|x| + |y|)^(lg 3)), where lg 3 = log2 3 ≈ 1.585
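
To make the cost of list concatenation concrete, L1 + L2 can be spelled out as the appends it is modeled by. This is only an illustration of the cost model, with a made-up function name `concat`; it is not how Python implements `+` internally.

```python
def concat(l1, l2):
    """Behaves like l1 + l2, written as individual Θ(1) appends."""
    result = []               # Θ(1)
    for x in l1:              # Θ(|l1|) appends
        result.append(x)
    for x in l2:              # Θ(|l2|) appends
        result.append(x)
    return result             # total Θ(1 + |l1| + |l2|)


print(concat([1, 2], [3]))
```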

Document Distance

Denoted as: d(D1,D2)
The distance describes how similar two documents are
document = a sequence of words
word = a string

Idea: look at shared words and use them to define document distance
Think of the document as a vector
D[w] = number of occurrences of w in D

D1 = “the cat”, D2 = “the dog”
Dot product: d’(D1,D2) = D1·D2 = ∑_w D1[w] D2[w].
Drawback: the score is not scale invariant. Two long documents may score 1000 on the dot product simply because they are long, while two short but nearly identical documents may score only 100. Hence the raw dot product is not a good way to describe document distance.
A better way to describe such similarity is the angle!
d(D1,D2) = angle between D1 and D2 = arccos( D1·D2 / (|D1| |D2|) )
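
As a worked example for the two documents above, take the angle to be the arccos of the dot product divided by the two vector lengths. The ordering of the coordinates (the, cat, dog) is an arbitrary choice made here:

```latex
\begin{aligned}
D_1 &= (1, 1, 0), \qquad D_2 = (1, 0, 1) \\
D_1 \cdot D_2 &= 1\cdot 1 + 1\cdot 0 + 0\cdot 1 = 1 \\
|D_1| &= |D_2| = \sqrt{1^2 + 1^2} = \sqrt{2} \\
d(D_1, D_2) &= \arccos\frac{1}{\sqrt{2}\,\sqrt{2}} = \arccos\frac{1}{2} = \frac{\pi}{3} \quad (60^\circ)
\end{aligned}
```

Identical documents give an angle of 0, and documents with no shared words give π/2.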

Procedure for computing document distance
1. split each document into words
2. compute the frequency of each word
3. compute the dot product (and from it the angle)

Mechanism:
for word in doc: count[word] += 1. Θ(|doc|)
Splitting can be done with a built-in string method.
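
The three steps can be put together in a short sketch. The function names are made up for illustration, words are assumed to be separated by whitespace (so `str.split()` is enough), and both documents are assumed to contain at least one word.

```python
import math
from collections import Counter


def word_counts(doc):
    """Steps 1 and 2: split into words and count each word, Θ(|doc|)."""
    return Counter(doc.split())


def dot(c1, c2):
    """Step 3: sum of c1[w] * c2[w] over the words both documents share."""
    return sum(c1[w] * c2[w] for w in c1 if w in c2)


def distance(doc1, doc2):
    """Angle in radians between the two word-count vectors."""
    c1, c2 = word_counts(doc1), word_counts(doc2)
    cos = dot(c1, c2) / math.sqrt(dot(c1, c1) * dot(c2, c2))
    return math.acos(min(1.0, cos))   # clamp to guard against rounding above 1


print(distance("the cat", "the dog"))
```

Using a `Counter` (a dict) keeps each count update at Θ(1), which is what makes the counting pass linear in the document length.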


1. What is the time cost of each operation in each model, and why?
2. Why should w be at least the log of the size of memory?

Insertion sort Merge Sort

Why sorting
Insertion sorting
Merge sort(Divide & Conquer)
Recurrence solving

Why sorting?

2) Problems become easy once items are sorted: finding the median
array A[0:n] -> sorted -> B[0:n]
Check whether n is odd or even. If odd, the median is the ((n+1)/2)-th smallest element; if even, take the (n/2)-th.
3) Binary search: when you look for a specific item
Scanning through the unsorted A[0:n] costs linear time,
while on the sorted array it takes logarithmic time.
Why?
Mechanism: assume k is the target item. The algorithm first compares k to B[n/2];
if k is smaller, it next compares k to B[n/4], and so on, halving the search range each time. Hence it takes logarithmic time.
Binary search is the simplest and most straightforward illustration of divide and conquer, which turns a linear search into a logarithmic one.
4) Data compression
5) Computer graphics
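
Items 2) and 3) above can be sketched in a few lines. The names `median` and `binary_search` are chosen here for illustration; for even n the sketch returns the (n/2)-th smallest element, following the rule in the notes, and the input is assumed non-empty.

```python
def median(a):
    """Sort, then pick the middle element by position."""
    b = sorted(a)
    n = len(b)
    if n % 2 == 1:
        return b[n // 2]          # the ((n+1)/2)-th smallest, 0-based index n//2
    return b[n // 2 - 1]          # the (n/2)-th smallest


def binary_search(b, k):
    """Return an index of k in sorted list b, or None if k is absent."""
    lo, hi = 0, len(b) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if b[mid] == k:
            return mid
        if k < b[mid]:
            hi = mid - 1          # k can only be in the left half
        else:
            lo = mid + 1          # k can only be in the right half
    return None


print(median([5, 2, 4, 6, 1]))
print(binary_search([1, 2, 3, 4, 5, 6], 5))
```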

Insertion Sort

For i = 1, 2, …, n: insert A[i] into the sorted array A[0:i-1] by pairwise swaps, moving the number initially in A[i] down to its correct position.
5 2 4 6 1 3
⬆️ key
This means we start from the second element, because the first element alone is sorted by definition.
1) pairwise swap
2 5 4 6 1 3
2) now the key is 4; swap in the same way
2 4 5 6 1 3
3) because A[i-1] = 5 is already in the right order with the key 6, the key moves forward to 1.
After 4 swaps the array becomes
2 4 5 6 1 3 -> 1 2 4 5 6 3
4) now the key is 3, and 3 gets swapped 3 times
1 2 3 4 5 6

All in all: Θ(n) steps (key positions),
and for each key position the worst case needs Θ(n) swaps or compares.
Hence it is a Θ(n^2) algorithm, because there are Θ(n) key positions and each one takes Θ(n) swaps in the worst case.
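
The pairwise-swap procedure traced above can be written as a short sketch (0-based indexing, sorting the list in place; the function name is chosen here for illustration):

```python
def insertion_sort(a):
    """Sort list a in place by swapping each key down to its position."""
    for i in range(1, len(a)):            # Θ(n) key positions
        j = i
        # Swap the key with its left neighbor while they are out of order.
        while j > 0 and a[j - 1] > a[j]:  # up to Θ(n) swaps per key
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return a


print(insertion_sort([5, 2, 4, 6, 1, 3]))
```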

  • When compares cost much more than swaps (compares >> swaps), what is the simple fix for the Θ(n^2) comparisons?
    my_ans: use binary search, i.e. first compare A[i] only with the middle of the sorted part;
    if it is smaller, compare A[i] next with the middle of the left half, otherwise with the middle of the right half, until the insert position is found.

    ! Correct. The simplest fix is to replace the pairwise comparisons with binary search, which works because A[0:i-1] is already sorted.
  • A binary search on the already sorted A[0:i-1] takes Θ(lg i) time, giving Θ(n lg n) comparisons in total.

But this does not help when the data is stored in an array: inserting A[i] still requires shifting up to Θ(i) elements, so the moves remain Θ(n^2) in the worst case.
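
A sketch of the binary-search variant, using the standard library's `bisect.bisect_right` to find the insert position. The function name is made up for illustration. The comparisons drop to Θ(n lg n), but the slice assignment that shifts elements is still linear per key.

```python
import bisect


def binary_insertion_sort(a):
    """Insertion sort where the insert position is found by binary search."""
    for i in range(1, len(a)):
        key = a[i]
        # Θ(lg i) comparisons: position of key within the sorted prefix a[0:i].
        pos = bisect.bisect_right(a, key, 0, i)
        # Still Θ(i) moves: shift a[pos:i] one slot to the right.
        a[pos + 1:i + 1] = a[pos:i]
        a[pos] = key
    return a


print(binary_insertion_sort([5, 2, 4, 6, 1, 3]))
```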

Merge Sort

We split the array A into a left half L and a right half R, and keep splitting until, at the bottom, each piece is a single number.

Merge: takes two sorted arrays as input.
20 13 7 2
12 11 9 1
Two-finger algorithm:
Compare 2 and 1. 1 is smaller, so output 1 and cross it out; that finger moves up to 9. Repeat until both arrays are used up.
Merging has complexity Θ(n).
The overall complexity of merge sort is Θ(n lg n).
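
A minimal sketch of merge sort with the two-finger merge. The function names are chosen here for illustration; unlike the picture above, the inputs to `merge` are written smallest first, and the sketch returns a new list instead of sorting in place.

```python
def merge(left, right):
    """Two-finger merge of two sorted lists in Θ(len(left) + len(right))."""
    merged = []
    i = j = 0                             # one finger per list
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])               # at most one of these is non-empty
    merged.extend(right[j:])
    return merged


def merge_sort(a):
    """Split in half, sort each half recursively, then merge."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))


print(merge([2, 7, 13, 20], [1, 9, 11, 12]))
print(merge_sort([5, 2, 4, 6, 1, 3]))
```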

T(n) = c1 + 2T(n/2) + c·n
c1: divide; 2T(n/2): recursion; c·n: merge
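
One way to solve this recurrence is to add up the recursion tree level by level. As simplifying assumptions, n is a power of two and the recursion stops at pieces of size 1, each costing T(1). Level k has 2^k subproblems of size n/2^k:

```latex
\begin{aligned}
T(n) &= \sum_{k=0}^{\log_2 n - 1} 2^k \left( c_1 + c\,\frac{n}{2^k} \right) + n\,T(1) \\
     &= c_1 (n - 1) + c\,n \log_2 n + n\,T(1) \\
     &= \Theta(n \lg n)
\end{aligned}
```

Each level contributes the same c·n of merge work, and there are log2 n such levels, which is where the n lg n term comes from.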
