Lecture 2: Models of Computation(Ⅱ)
代码版本V1效率很低,我们试着找到效率瓶颈,在代码中加入以下代码:
if __name__ == "__main__": import profile profile.run("main()")
运行的部分结果:
从结果看get_words_from_line_list,count_frequency,get_words_from_string,insertion_sort运行效率低,我们尝试改进,代码如下:
def get_words_from_line_list(L): """ Parse the given list L of text lines into words. Return list of all words found. """ word_list = [] for line in L: words_in_line = get_words_from_string(line) # Using "extend" is much more efficient than concatenation here: word_list.extend(words_in_line) return word_list
def count_frequency(word_list): """ Return a list giving pairs of form: (word,frequency) """ D = {} for new_word in word_list: if D.has_key(new_word): D[new_word] = D[new_word]+1 else: D[new_word] = 1 return D.items()
# global variables needed for fast parsing # translation table maps upper case to lower case and punctuation to spaces translation_table = string.maketrans(string.punctuation+string.uppercase, " "*len(string.punctuation)+string.lowercase) def get_words_from_string(line): """ Return a list of the words in the given input string, converting each word to lower-case. Input: line (a string) Output: a list of strings (each string is a sequence of alphanumeric characters) """ line = line.translate(translation_table) word_list = line.split() return word_list
将insertion_sort改成merge_sort
def merge_sort(A): """ Sort list A into order, and return result. """ n = len(A) if n==1: return A mid = n//2 # floor division L = merge_sort(A[:mid]) R = merge_sort(A[mid:]) return merge(L,R) def merge(L,R): """ Given two sorted sequences L and R, return their merge. """ i = 0 j = 0 answer = [] while i<len(L) and j<len(R): if L[i]<R[j]: answer.append(L[i]) i += 1 else: answer.append(R[j]) j += 1 if i<len(L): answer.extend(L[i:]) if j<len(R): answer.extend(R[j:]) return answer
改进后的运行结果,速度明显提升: