e.g.
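1.2.3. Big-O notation, O(N): asymptotic upper bound, usually quoted for the worst case
1.2.4. Big-Omega notation: asymptotic lower bound, usually quoted for the best case
1.2.5. Big-Theta notation: tight bound (both upper and lower)
1.2.6. but not the other way around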
e.g.
1.3. Property 1:
Property 2:
1.4. Facts: proof by L’Hôpital’s rule
1.5. Input size: 1 byte = 8 bits =
1.6. Common sense: if the algorithm ends in many rounds, the time complexity is
2.2.1.
2.2.2.
2.2.3.
2.3. Solve problems recursively: cut the problem into sub-problems
3.4. Summary:
Algorithms Worst case Best case Average case
Naïve sort
Bubble sort
Merge sort
Quick sort
Where Ω(N log N) is the lower bound for comparison-based sorting (the lower bound issue).
Stack
4.1. Method: push, pop, top
4.2. Time complexity: N operations are of O(N) for push() and pop(), i.e. O(1) per operation
4.3. Related problems: prefix/infix expression evaluation, call stack.
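The stack methods and the prefix-expression problem mentioned above can be sketched as follows. This is a minimal illustration, assuming a Python list as storage and space-separated tokens with binary operators only; the names `Stack` and `eval_prefix` are hypothetical.

```python
class Stack:
    """List-backed stack; the end of the list is the top."""

    def __init__(self):
        self._items = []

    def push(self, x):
        self._items.append(x)       # amortized O(1)

    def pop(self):
        return self._items.pop()    # O(1); raises IndexError if empty

    def top(self):
        return self._items[-1]      # look at the top without removing it

    def __len__(self):
        return len(self._items)


def eval_prefix(tokens):
    """Evaluate a prefix expression such as ['+', '1', '*', '2', '3']."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    st = Stack()
    # Scan right to left so both operands are on the stack before their operator.
    for tok in reversed(tokens):
        if tok in ops:
            a = st.pop()            # left operand
            b = st.pop()            # right operand
            st.push(ops[tok](a, b))
        else:
            st.push(float(tok))
    return st.pop()
```

For example, `eval_prefix("+ 1 * 2 3".split())` evaluates 1 + (2 * 3).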
Queue
5.1. Method: enqueue, dequeue, front
5.2. Time complexity (plain array): 1 operation is of O(1) for dequeue, and 1 operation is of O(N) for enqueue (moving all elements back by 1 position).
5.3. Cyclic array: reduces enqueue time complexity to O(1): record the head and tail positions, and update them after each enqueue or dequeue.
5.4. Queue implementation with 2 stacks.
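A queue built from 2 stacks, as in 5.4, might look like the sketch below. It assumes Python lists as the stacks; the class name `TwoStackQueue` is hypothetical. Each element is pushed and popped at most twice, so N operations cost O(N) in total.

```python
class TwoStackQueue:
    """FIFO queue built from two LIFO stacks."""

    def __init__(self):
        self._in = []    # receives new elements
        self._out = []   # serves elements in FIFO order

    def enqueue(self, x):
        self._in.append(x)

    def _shift(self):
        # Refill the output stack only when it is empty; this reverses
        # the input stack so the oldest element ends up on top.
        if not self._out:
            while self._in:
                self._out.append(self._in.pop())

    def dequeue(self):
        self._shift()
        return self._out.pop()   # raises IndexError if the queue is empty

    def front(self):
        self._shift()
        return self._out[-1]
```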
Divide and conquer algorithm
6.1. E.g. sorting algorithm above
6.2. E.g. linear time k-th largest number algorithm: split array A (size N) into M groups, choose as pivot the median of the medians of the groups, use the pivot to split A into 2 sub-arrays, keep the sub-array that contains the k-th position, and recurse…
Here, we are sure that numbers of N is less than pivot. Then the function is:
Algorithm improved.
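The selection algorithm in 6.2 can be sketched as below. The group size of 5 is an assumption (the notes only say "M groups"); with it, the usual recurrence is T(N) <= T(N/5) + T(7N/10) + O(N), which solves to O(N). The function names are hypothetical.

```python
def select_smallest(a, k):
    """Return the element of rank k in list a (k = 0 is the minimum)."""
    if len(a) <= 5:
        return sorted(a)[k]
    # Split into groups of 5 (assumed group size) and take each group's median.
    groups = [a[i:i + 5] for i in range(0, len(a), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    # Pivot: the median of the group medians, found recursively.
    pivot = select_smallest(medians, len(medians) // 2)
    lo = [x for x in a if x < pivot]
    eq = [x for x in a if x == pivot]
    hi = [x for x in a if x > pivot]
    # Recurse only into the part that contains rank k.
    if k < len(lo):
        return select_smallest(lo, k)
    if k < len(lo) + len(eq):
        return pivot
    return select_smallest(hi, k - len(lo) - len(eq))


def kth_largest(a, k):
    """Return the k-th largest element of a (k = 1 is the maximum)."""
    return select_smallest(a, len(a) - k)
```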
6.3. E.g. Optimization problem – backpack packing problem
Description: Given weights and values of items, find max value given weight bound.
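One common way to solve the problem described above is a table indexed by remaining capacity. The sketch below is a dynamic-programming version, not necessarily the method intended here, and it assumes non-negative integer weights and that each item is taken at most once (the 0/1 variant). The name `knapsack` is hypothetical.

```python
def knapsack(weights, values, capacity):
    """Maximum total value with total weight <= capacity (0/1 knapsack)."""
    # best[c] = best value achievable with weight bound c using items seen so far
    best = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Go from high to low capacity so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]
```

The running time is O(N * capacity) for N items.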
7. Dynamic programming
7.1. E.g. greedy algorithm, like Huffman encoding problem
2 basic algorithms
Algorithm: divide-and-conquer vs. dynamic programming
Divide-and-conquer: divide the problem into smaller sub-problems, solve the sub-problems, and finally combine the sub-solutions.
Dynamic programming: divide first, solve the smallest sub-problems, then go to larger sub-problems, combining the stored solutions at each step.
Abstract data type (ADT)
9.1. Set
9.2. Cartesian product: all ordered pairs (a, b), with a from the first set and b from the second
9.3. Set operations: union, intersection
9.4. Linked list (insert, delete, find):
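9.4.1. Find the middle element: 2 pointers start at head, one moving twice as fast as the other; when the fast one reaches the end, the slow one is at the middle. To find the k-th element from the end, start one pointer k nodes ahead of the other instead.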
9.4.2. Doubly linked list
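The two-pointer technique from 9.4.1 can be sketched on a singly linked list as follows. The names `Node`, `from_list`, `middle`, and `kth_from_end` are hypothetical.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def from_list(values):
    """Build a singly linked list and return its head (None if empty)."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def middle(head):
    """Slow pointer moves 1 step, fast pointer moves 2 steps per round."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    return slow   # for even length this is the second of the two middle nodes


def kth_from_end(head, k):
    """Return the k-th node from the end (k = 1 is the last), or None if too short."""
    lead = head
    for _ in range(k):          # put the lead pointer k nodes ahead
        if lead is None:
            return None
        lead = lead.next
    trail = head
    while lead is not None:     # move both until the lead runs off the end
        lead = lead.next
        trail = trail.next
    return trail
```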
9.5. Tree
9.5.1. T = (r, V, E), r is root, V is a set of nodes, E is a set of edges between nodes
9.5.2. A tree is a recursive ADT, with ancestors and descendants
9.5.3. Special case – binary tree
9.5.3.1. If a full (perfect) binary tree is at height h, counting the root as height 0, then there are 2^(h+1) - 1 nodes, 2^h leaves, and 2^h - 1 internal nodes
9.5.3.2. Problems: infix/prefix calculation
9.5.4. Binary search tree (BST)
9.5.4.1. Def: for every node, all keys in its left subtree are smaller than its key, and all keys in its right subtree are larger
9.5.4.2. Creating algorithms:
Naïve algorithm (brute force)
9.5.4.3. Problems: Huffman encoding, finding longest path in tree.
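The BST definition and the naïve construction (insert keys one at a time, no rebalancing) can be sketched as below. The names are hypothetical, and duplicates are assumed to be ignored.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the subtree rooted at root; return the subtree root."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root                 # equal keys are ignored (assumption)


def find(root, key):
    """Walk down from the root, going left or right by comparison."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None


def inorder(root):
    """In-order traversal of a BST yields the keys in ascending order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))
```

Inserting already-sorted keys this way produces a chain of height N, which is why the balanced variants in the following items matter.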
9.5.5. Special case – balanced tree (at height O(log N))
9.5.5.1. Time complexity: insert, delete, find are O(log N) in the worst case
9.5.6. Special case – red-black tree, AVL tree
9.5.7. Special case – heap
Priority queue
10.1. It is similar to min-heap or max-heap, where root value is min or max in tree
10.2. Methods
10.2.1. Insertion: put the new number at the last position and bubble it up, by comparing it with the value of its parent, at O(log N)
10.2.2. Pop: pop the head, put the last element at the head, then bubble it down; update time is O(log N)
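The two methods in 10.2 can be sketched as an array-based min-heap. This is a minimal illustration; the class name `MinHeap` is hypothetical, and the children of index i are assumed to sit at 2i + 1 and 2i + 2.

```python
class MinHeap:
    """Array-based binary min-heap: the root (index 0) holds the minimum."""

    def __init__(self):
        self._a = []

    def insert(self, x):
        a = self._a
        a.append(x)                      # put the new number at the last position
        i = len(a) - 1
        while i > 0 and a[(i - 1) // 2] > a[i]:
            p = (i - 1) // 2             # bubble up: swap with the parent
            a[p], a[i] = a[i], a[p]
            i = p

    def pop(self):
        a = self._a
        top = a[0]                       # raises IndexError if the heap is empty
        last = a.pop()
        if a:
            a[0] = last                  # move the last element to the head
            i, n = 0, len(a)
            while True:                  # bubble down: swap with the smaller child
                left, right, m = 2 * i + 1, 2 * i + 2, i
                if left < n and a[left] < a[m]:
                    m = left
                if right < n and a[right] < a[m]:
                    m = right
                if m == i:
                    break
                a[i], a[m] = a[m], a[i]
                i = m
        return top

    def __len__(self):
        return len(self._a)
```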
10.3. Problem – streaming algorithms, keep only top 10 numbers no matter the input
10.3.1. Solution: selection tree (tournament tree), build at O(N), update at O(log N)
10.3.1.1. Winner tree: leaves are the numbers, and each internal node is the larger of its two children
10.3.1.2. Loser tree: each internal node is the smaller of its two children, and the largest value can also be recorded during the comparisons.
10.3.2. Usage: merge M sorted array. Each time, we update winner (loser) and move to the next value.
10.3.3. Advantage: a tournament tree saves time when (1) looking up data is expensive, or (2) looking up leaves is expensive
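The two uses above (keep the top 10 of a stream, merge M sorted arrays) can be illustrated with Python's standard `heapq` module. Note that this swaps in a binary heap for the tournament tree described above; the per-update cost is still O(log M) or O(log k). The function names are hypothetical.

```python
import heapq


def top_k(stream, k=10):
    """Keep only the k largest numbers seen so far, using a min-heap of size k."""
    heap = []
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:                 # beats the smallest of the current top k
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)


def merge_sorted(arrays):
    """Merge M sorted lists; the heap holds one candidate per list."""
    heap = [(arr[0], i, 0) for i, arr in enumerate(arrays) if arr]
    heapq.heapify(heap)
    out = []
    while heap:
        value, i, j = heapq.heappop(heap)  # current winner
        out.append(value)
        if j + 1 < len(arrays[i]):         # move to the next value of that list
            heapq.heappush(heap, (arrays[i][j + 1], i, j + 1))
    return out
```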
Graph
11.1. Definition: G = (V, E), V is the set of vertices (nodes), E is the set of edges, each between two nodes
11.2. Classification:
11.3. Example: a tree is a special (connected) graph without cycles. Weighted graph: G = (V, E, W), where W gives the weight of each edge.
11.4. Concepts:
Degree of node, D(x): the number of edges attached to node x
Out-degree: in directed graph, the number of edges leaving node x
In-degree: in directed graph, the number of edges pointing towards node x
Regular graph: D(x) is the same for every node x in the graph, e.g. an n-sided polygon
Sub-graph: its vertices are a subset U of V, and its edges are a subset of E with both endpoints in U
Induced sub-graph: its vertices are a subset U of V, AND it contains every edge of E that has both endpoints in U
Connected graph: for any pair of nodes (x, y), there must be a path between x and y in the graph. The connected components of an undirected graph form a partition of the vertices such that within each subset, vertices are mutually reachable.
11.5. Representation:
11.5.1. NxN adjacency matrix, 1 represents an existing edge between two vertices (columns and rows are vertices).
11.5.2. Array of linked lists (adjacency list).
11.6. Methods to enumerate graph:
11.6.1. Breadth first search (BFS): explore all neighbors of the current vertex first, marking each reached vertex as unavailable (visited) so that it is not explored again, then move on to the next neighbor vertex
11.6.2. Depth first search (DFS): keep exploring an unvisited neighbor until the current vertex has no unvisited node attached, then go back and continue; the recursive call sits inside a for loop over the neighbors. If nothing is left to go back to, restart DFS on a new unvisited vertex. In a connected undirected graph, a single DFS must visit all vertices.
Time complexity:
O(|V| + |E|), linear in the number of edges, if adjacency linked lists are used, because each position in the adjacency linked lists is visited once and there are |V| vertices and |E| edges.
O(|V|^2), if an adjacency matrix is used, because each position in the matrix is visited once.
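BFS and DFS over the array-of-lists representation can be sketched as below. The graph is assumed to be a dict mapping every vertex to a list of its neighbors; the function names are hypothetical.

```python
from collections import deque


def bfs(adj, start):
    """Visit vertices in order of distance from start; returns the visit order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in visited:      # mark when first reached, not when dequeued
                visited.add(v)
                queue.append(v)
    return order


def dfs(adj):
    """Visit every vertex, restarting on a new vertex when a search is exhausted."""
    visited = set()
    order = []

    def visit(u):
        visited.add(u)
        order.append(u)
        for v in adj[u]:              # recursion inside the for loop
            if v not in visited:
                visit(v)

    for u in adj:                     # restart DFS on each still-unvisited vertex
        if u not in visited:
            visit(u)
    return order
```

Each adjacency list is scanned once, which gives the O(|V| + |E|) bound. The recursive form is limited by Python's recursion depth on very deep graphs.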
11.7. Related problems: shortest paths, INDEPENDENT SETS, COVER SETS, minimum cut, min spanning tree, Euler tour, travelling salesman problem.
11.7.1. Longest path in tree. Start from root to leaf, or leaf to root, both ok.
11.7.2. Spanning tree. Tree that contains all vertices in graph, e.g. DFS tree, BFS tree.
11.7.3. Minimum spanning tree (MST): connect all vertices with minimum weight edges.
11.7.3.1. Solution: greedy algorithm. Keep a set S of visited vertices; always choose the minimum-weight edge between S and V-S, add its new endpoint to S, and continue until all vertices are visited.
If a heap and linked lists are used, it takes O(|E| log |V|).
(1) Choose a minimum weighted edge (a, b) from V-S to S, add a to S
(2) Repeat (1) until S = V
If an adjacency matrix is used, it takes O(|V|^2).
(1) Each time we choose a smallest value from D, add vertex to S
(2) Update value in D
(3) Repeat (1) and (2) until S = V
D table, start at 1 (initial values):
Vertex:     1    2    3    4    5    6    7
Linked to:  -    1    1    1    -    1    -
D:          Inf  2    10   5    Inf  1    Inf
D table, start at 1 (after the smallest value, vertex 6, is added to S):
Vertex:     1    2    3    4    5    6    7
Linked to:  -    1    1    6    -    6    -
D:          Inf  2    10   2    Inf  Inf  Inf
Correctness (claim and proof): prove minimality in each situation
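The heap-based version of the greedy MST algorithm above (commonly known as Prim's algorithm) can be sketched as follows. The graph is assumed to be connected, undirected, and stored as a dict of (neighbor, weight) lists; the name `prim` is hypothetical.

```python
import heapq


def prim(adj, start):
    """Return (total weight, list of (u, v, w) tree edges) of an MST."""
    S = {start}                                   # vertices already in the tree
    heap = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(heap)
    total, edges = 0, []
    while heap and len(S) < len(adj):
        w, u, v = heapq.heappop(heap)             # minimum-weight candidate edge
        if v in S:
            continue                              # both endpoints already in S
        S.add(v)
        total += w
        edges.append((u, v, w))
        for x, wx in adj[v]:                      # new edges leaving S through v
            if x not in S:
                heapq.heappush(heap, (wx, v, x))
    return total, edges
```

A usage sketch on a small hypothetical graph that reuses the weights from the D table (1-2: 2, 1-3: 10, 1-4: 5, 1-6: 1, 6-4: 2):

```python
adj = {
    1: [(2, 2), (3, 10), (4, 5), (6, 1)],
    2: [(1, 2)],
    3: [(1, 10)],
    4: [(1, 5), (6, 2)],
    6: [(1, 1), (4, 2)],
}
total, edges = prim(adj, 1)   # vertex 4 is reached through 6 (weight 2), not 1 (weight 5)
```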
11.7.3.2. Union find set (UF) for MST
Union method: by rank, or by tree size. F is a collection of disjoint sets of values; UF unites two sets together to form a larger set in F, and finds which set a value belongs to.
11.7.3.3. Kruskal’s algorithm for MST
F is a union-find structure over the vertices, T is the set of chosen edges (pairs of vertices), and S is the list of edges sorted by weight in ascending order. Pop the minimum-weight edge and its two vertices from S. If the two vertices are already in the same set in F, do not store the edge in T; otherwise store it in T and union the pair in F. Finally, T is the set of vertex pairs that form an MST, and adding the weights of these edges together gives the MST weight.
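Kruskal's algorithm with a union-find set can be sketched as below, reusing the names F, T, and S from the description. Union by tree size follows the notes; the path-halving step in `find` is an added optimization not mentioned in the notes. The class and function names are hypothetical.

```python
class UnionFind:
    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        while self.parent[x] != x:
            # Path halving (added optimization): point x at its grandparent.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Unite the sets of a and b; return False if they were already united."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:   # union by tree size
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def kruskal(vertices, edges):
    """edges: list of (weight, u, v). Returns (total weight, chosen edges T)."""
    F = UnionFind(vertices)
    T = []
    S = sorted(edges)                       # ascending weight
    for w, u, v in S:
        if F.union(u, v):                   # skip edges that would close a cycle
            T.append((u, v, w))
    return sum(w for _, _, w in T), T
```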
11.7.4. Bipartite graph: an undirected graph G is bipartite if there is a way to split its vertices into 2 sets A, B, s.t. all edges are between A and B.
11.7.5. Matching: find a set of edges such that no two edges in the set share an endpoint.
11.7.6. Shortest path between two nodes in graph – Dijkstra’s algorithm
11.7.6.1. Pseudocode
init
S = {s}, current = s
dist = [inf, inf, inf, inf, inf, …], dist[s] = 0, // best known distance from s at current stage
(*)
For each (current, u) in E, u not in S
if dist[u] > dist[current] + w[current, u], set dist[u] = dist[current] + w[current, u]
Put current in S
Let current = argmin_j { dist[ j ] } over all j not in S
Repeat (*) until t is in S.
11.7.6.2. If we continue until all vertices included, then we can calculate shortest path from the source to all vertices.
11.7.6.3. Correctness:
11.7.6.4. Time complexity: O(|V|^2) when the minimum is found by scanning dist; O((|V| + |E|) log |V|) with a binary heap.
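The pseudocode in 11.7.6.1 can be sketched in Python with a binary heap doing the "pick the closest vertex not in S" step. This is a minimal sketch, assuming non-negative weights and a graph stored as a dict of (neighbor, weight) lists that has every vertex as a key; the name `dijkstra` is hypothetical. It runs until all vertices are included, as in 11.7.6.2.

```python
import heapq


def dijkstra(adj, s):
    """Return a dict of shortest distances from s to every vertex."""
    dist = {u: float("inf") for u in adj}
    dist[s] = 0
    S = set()                       # vertices whose distance is final
    heap = [(0, s)]
    while heap:
        d, current = heapq.heappop(heap)
        if current in S:
            continue                # stale heap entry, already finalized
        S.add(current)
        for u, w in adj[current]:
            if dist[u] > d + w:     # relax edge (current, u)
                dist[u] = d + w
                heapq.heappush(heap, (dist[u], u))
    return dist
```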
11.7.7. Shortest paths with negative weights, and detecting a negative cycle in graph G – Bellman-Ford algorithm. After |V| - 1 rounds of relaxing every edge, if some d[k] is still changing, there is a negative cycle reachable from the source.
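The negative-cycle check can be sketched as below, assuming vertices are numbered 0..n-1 and edges are (u, v, w) triples of a directed graph; the name `bellman_ford` is hypothetical.

```python
def bellman_ford(n, edges, s):
    """Return (d, has_negative_cycle) for shortest distances from s."""
    INF = float("inf")
    d = [INF] * n
    d[s] = 0
    for _ in range(n - 1):              # n - 1 rounds suffice without negative cycles
        changed = False
        for u, v, w in edges:
            if d[u] + w < d[v]:         # relax edge (u, v)
                d[v] = d[u] + w
                changed = True
        if not changed:
            break                       # early exit: distances are stable
    # One more pass: if any d[k] can still improve, a negative cycle is reachable.
    has_negative_cycle = any(d[u] + w < d[v] for u, v, w in edges)
    return d, has_negative_cycle
```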
11.8. Directed graph.
11.8.1. Dijkstra’s shortest path algorithm also works on directed graphs without negative weights.
11.8.2. Classification and concepts:
Acyclic: no cycle in graph
Sinks/minima: out-degree 0 vertices
Sources/maxima: in-degree 0 vertices
11.8.3. Topological sorting, for directed acyclic graphs (DAG). Method is, “find a source, remove the source vertex and its attached edges”, then repeat recursively on the remaining graph.
11.8.3.1. Implementation based on linked list graph.
B is a stack that stores unexplored vertices, and D is the current in-degree of each vertex in the remaining graph.
11.8.3.2. Correctness rests on the claim: if all vertices have non-zero in-degree, then there is a cycle in the graph.
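The implementation in 11.8.3.1 can be sketched as below, keeping the names B and D. The sketch assumes B holds the vertices whose current in-degree is 0 (the sources of the remaining graph) and that the graph is a dict with every vertex as a key; the function name is hypothetical.

```python
def topological_sort(adj):
    """Return the vertices of a DAG so that every edge goes from earlier to later."""
    D = {u: 0 for u in adj}             # current in-degree of each vertex
    for u in adj:
        for v in adj[u]:
            D[v] += 1
    B = [u for u in adj if D[u] == 0]   # stack of current sources
    order = []
    while B:
        u = B.pop()
        order.append(u)
        for v in adj[u]:                # "remove" u and its outgoing edges
            D[v] -= 1
            if D[v] == 0:
                B.append(v)
    if len(order) != len(adj):
        # Every remaining vertex has non-zero in-degree, so there is a cycle.
        raise ValueError("graph has a cycle")
    return order
```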
11.9. Edges
11.9.1. Type: tree edges, backward edges (point to ancestor), forward edges (point to descendant), cross edges (else in directed graph)
11.10. Strongly connected components (SCC): let R relate (a, b) when a can reach b, and b can reach a in directed G; such a relation R is an equivalence relation. An equivalence class of R is called an SCC.
11.10.1. To find all linked SCCs in a graph, topological sorting of the SCCs makes sure the starting vertex is chosen so that the search is not trapped in one SCC and can go on to the other SCCs. SCC(u) < SCC(v) means there is a path from v to u in G, but not the other way around.
11.10.2. Implementation on finding the lowest SCC.
(1) Reverse the graph to get G^R, where all edges of the directed graph G are reversed.
(2) Do DFS on G^R, starting with any vertex, and restart on an unvisited vertex until all vertices are visited. This gives several DFS trees. Finally, find the source: the vertex that finishes last.
(3) From the source, do DFS on G; the vertices reached form the lowest SCC.
Time complexity is O(|V| + |E|).
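Steps (1) to (3) can be repeated to peel off every SCC, lowest first. The sketch below assumes the graph is a dict with every vertex as a key and uses recursive DFS, so it is limited by Python's recursion depth; the function name is hypothetical.

```python
def strongly_connected_components(adj):
    """Return the SCCs of a directed graph as lists of vertices."""
    # (1) Reverse graph G^R: every edge u -> v becomes v -> u.
    rev = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)

    # (2) DFS on G^R with restarts, recording vertices in order of finishing.
    visited, finish = set(), []

    def dfs_rev(u):
        visited.add(u)
        for v in rev[u]:
            if v not in visited:
                dfs_rev(v)
        finish.append(u)

    for u in adj:
        if u not in visited:
            dfs_rev(u)

    # (3) DFS on G, starting from the vertex that finished last; each search
    # stays inside one SCC because lower SCCs are already assigned.
    assigned, components = set(), []

    def dfs(u, comp):
        assigned.add(u)
        comp.append(u)
        for v in adj[u]:
            if v not in assigned:
                dfs(v, comp)

    for u in reversed(finish):
        if u not in assigned:
            comp = []
            dfs(u, comp)
            components.append(comp)
    return components
```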
Elementary P, NP, NP-hardness
12.1. Sample HARD/NP problems.
12.1.1. 3-SAT problem: a formula is a conjunction (AND) of N clauses, and each clause is a disjunction (OR) of three Boolean literals. Determine whether some assignment of the variables makes the whole formula true.
12.1.2. Independent set problem. An independent set S is a subset of V such that for any pair of vertices in S, there is no edge in between.
12.1.3. Vertex cover problem. A cover set S is a subset of V, such that every edge in G has at least one endpoint in S.
Find largest independent set in graph <=> find smallest vertex cover set in graph, because S is an independent set exactly when V - S is a vertex cover.
12.1.4. Clique problem.
12.1.5. Hitting set problem: given a set O of objects and a collection C of subsets of O, decide whether there is a set H of K objects from O such that each c in C contains at least one element of H. E.g. O = {1,2,3,4,5}, C = { {1,2,5},{3,4}}, K=2, then the hitting set can be {1,3}
12.1.6. Set cover problem: given a set S of objects and a collection C of subsets of S, decide whether there is a subset D of C whose size is K and such that the union of the sets in D = S.
12.1.7. Backpack packing problem.
12.2. Classifications.
12.2.1. Easy problems: can be solved by a polynomial time algorithm
12.2.2. Hard problems: no polynomial time algorithm is known that solves them.
12.2.3. P: solvable in polynomial time by a deterministic algorithm
12.2.4. NP: solvable in polynomial time by a non-deterministic algorithm; equivalently, a proposed solution can be verified in polynomial time
12.2.5. Hardness: defined by whether other problems can be reduced to the problem. E.g. a problem is NP hard if every problem in NP can be reduced to it in polynomial time.
12.2.6. NP complete: if the problem is in NP and is NP hard
12.3. Prove a problem is HARD.
12.4. Reduction: if a polynomial time algorithm can transform any input of problem A into an input of problem B with the same answer, then problem A can be reduced in polynomial time to problem B, written A ≤p B. We want to find a final reduced problem x that can be solved in polynomial time.
12.4.1. E.g. clique problem can be reduced to independent set.
An independent set is a group of nodes where for any pair of nodes in the set, there is not an edge between those nodes. A clique is a group of nodes where for any pair of nodes in the set, there is an edge between those nodes. Therefore, an independent set in a graph G is a clique in the complement of G and vice-versa.
Given this, a simple transformation would be: given G and k, produce G' (the complement of G) and k. Then, G has an independent set of size k if and only if G' has a clique of size k.
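The complement transformation can be sketched as below, assuming an undirected graph given as a vertex list plus a list of edge pairs. The function names are hypothetical, and the two checkers only verify a candidate set; they do not search for one.

```python
from itertools import combinations


def complement(vertices, edges):
    """Edges of the complement graph: exactly the pairs that are NOT edges of G."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(vertices, 2)
            if frozenset((u, v)) not in present]


def is_independent_set(edges, S):
    """True if no edge has both endpoints in S."""
    S = set(S)
    return not any(u in S and v in S for u, v in edges)


def is_clique(edges, S):
    """True if every pair of vertices in S is joined by an edge."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(p) in present for p in combinations(S, 2))
```

Building the complement takes O(|V|^2) time, which is polynomial, as a reduction requires.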
12.4.2. E.g. clique problem can be reduced to vertex cover. E.g. vertex cover to hitting set. E.g. vertex cover to set cover.
Dynamic programming
13.1. Problem: longest non-decreasing sequence, longest common subsequence
Solution: define subproblems, write recursion, write pseudocode, prove correctness, find time complexity.
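The two problems in 13.1 can be sketched with the steps listed above (define subproblems, write the recursion, then the code). The function names are hypothetical, and the O(N^2) version of the first problem is chosen for clarity rather than speed.

```python
def longest_non_decreasing(a):
    """Length of the longest non-decreasing subsequence of a, in O(N^2)."""
    # Subproblem: best[i] = length of the longest such subsequence ending at a[i].
    best = [1] * len(a)
    for i in range(len(a)):
        for j in range(i):
            if a[j] <= a[i]:                       # a[i] can extend a run ending at a[j]
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0)


def lcs_length(x, y):
    """Length of the longest common subsequence of x and y, in O(|x| * |y|)."""
    # Subproblem: L[i][j] = LCS length of the prefixes x[:i] and y[:j].
    L = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1      # matching last symbols
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[len(x)][len(y)]
```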