1. Count how many times each first word appears in a file
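Method 1: First collect every distinct first word into a list, then use the dictionary's fromkeys method to build a dictionary whose keys are those first words and whose values are all 0, and finally loop over the lines a second time to count the occurrences. This method has high complexity: for a file with n lines it is O(n*n) in the worst case, because every "not in" check scans the whole list.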
def count_word():
    count = {}
    keys = []
    with open('test.txt') as f:
        lines = f.readlines()
    for line in lines:
        words = line.split()  # split on any whitespace; this also drops the trailing newline
        if words and words[0] not in keys:
            keys.append(words[0])
    # fromkeys does not modify the original dict; it returns a new dict,
    # so the result must be assigned before the new dict can be used
    count = count.fromkeys(keys, 0)
    print(count)
    for line in lines:
        words = line.split()
        if words:  # skip blank lines
            count[words[0]] += 1
    print(count)
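The comment about fromkeys is easy to check in isolation. The following is a minimal sketch, not part of the original function; the key names 'GET' and 'POST' are made up for illustration.

```python
# Sketch: fromkeys builds and returns a new dict; the dict it is called on is untouched.
count = {}
keys = ['GET', 'POST']  # hypothetical first words, for illustration only

result = count.fromkeys(keys, 0)
print(count)   # {}
print(result)  # {'GET': 0, 'POST': 0}

# fromkeys is a class method, so calling it on dict itself gives the same result
same = dict.fromkeys(keys, 0)
print(same == result)  # True
```

Because the receiver is never modified, writing dict.fromkeys(keys, 0) is arguably clearer than calling it on an existing empty dict.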
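Method 2: After extracting the first word of each line, check whether the dictionary already contains that key. If it does not, add an entry with the word as the key and 1 as the value; if it does, add 1 to the existing value. The check uses the in operator, because the has_key method exists only in Python 2 and was removed in Python 3. A dictionary lookup takes constant time on average, so this algorithm is O(n).
def count_word():
    count = {}
    with open('test.txt') as f:
        lines = f.readlines()
    for line in lines:
        words = line.split()  # split on any whitespace; the trailing newline is dropped
        if not words:
            continue  # skip blank lines
        first_word = words[0]
        if first_word in count:  # replaces count.has_key(first_word) from Python 2
            count[first_word] += 1
        else:
            count[first_word] = 1
    print(count)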
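The same single-pass idea can also be written with collections.Counter from the standard library, which treats a missing key as 0 and so removes the if/else branch. This is only a sketch of an alternative, not the method from the notes above: the function name count_first_words and the sample lines are invented for illustration, and the function takes an iterable of lines instead of opening test.txt so it can be tried without a file.

```python
from collections import Counter

def count_first_words(lines):
    """Count the first word of each non-blank line in an iterable of strings."""
    counter = Counter()
    for line in lines:
        words = line.split()
        if words:  # skip blank lines
            counter[words[0]] += 1  # a missing key counts as 0
    return counter

# Hypothetical sample data standing in for the contents of a log file
sample = ["error disk full\n", "info started\n", "\n", "error timeout\n"]
print(count_first_words(sample))  # Counter({'error': 2, 'info': 1})
```

To run it against a real file, pass the open file object directly, for example count_first_words(f) inside a with open('test.txt') as f: block, since a file object iterates over its lines.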
2. Count how many times a given word appears in an article
Two nested for loops are enough: the outer loop iterates over the lines (for line in lines:) and the inner loop iterates over the words of each line.
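def count_word():
    mark_word = 'error'
    count = 0
    with open('test.txt') as f:
        lines = f.readlines()
    for line in lines:
        # split() with no argument splits on any whitespace, so the last word
        # of a line does not keep its trailing '\n' and can still match
        for word in line.split():
            if word == mark_word:
                count += 1
    print(count)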
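Splitting on whitespace only matches the word when it stands alone, so "error," or "Error:" would not be counted. If those should count too, one possible sketch is to extract words with a regular expression and compare in lower case. Ignoring case and punctuation is an assumption added here, not something the notes above require, and the function name count_occurrences and the sample text are invented for illustration.

```python
import re

def count_occurrences(text, target):
    """Count whole-word, case-insensitive matches of target in text."""
    # Treat runs of letters, digits and apostrophes as words; everything else separates them
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    wanted = target.lower()
    return sum(1 for w in words if w == wanted)

# Hypothetical sample text standing in for the article
sample = "Error: disk full.\nRetrying after error, then another ERROR\n"
print(count_occurrences(sample, "error"))  # 3
```

Because whole words are compared, "errors" is not counted as "error". For a file, the text can be obtained with f.read().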