Partial code for Problem C of the 2020 MCM/ICM (also simple Python practice code)

Converting review text into a score
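# The counts of each word and the relationships among equivalent words were tallied from the reviews in an earlier step.
# Here we pick positive and negative feature words and compute a score for the review (i.e. the satisfaction level).
# Count how often the feature words occur
charts = ['star', 'five', 'love', 'great', 'good']  # feature words that add to the score
badcharts = ['bad', 'but', 'not', 'out']  # feature words that subtract from the score

def row_count(filename):  # the review content is passed in as a text file
    try:
        with open(filename) as f_obj:
            content = f_obj.read()
    except FileNotFoundError:
        msg = "The file " + filename + " does not exist."
        print(msg)
    else:
        content = content.replace(',', ' ')
        content = content.replace('.', ' ')
        content = content.replace('-', ' ')
        content = content.strip().lower()
        words = content.split()
        num = 0
        for chart in charts:
            # for each positive feature word, add the number of times it appears in the text
            num = num + words.count(chart)
        for badchart in badcharts:
            # for each negative feature word, subtract the number of times it appears in the text
            num = num - words.count(badchart)
        return num  # the resulting score (None is returned if the file is missing)

if __name__ == '__main__':
    filename = 'Heart.txt'
    goal = row_count(filename)
    print(goal)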

'Heart.txt' is a text file containing the sentences of a review.
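The same scoring idea can be tried on a string directly, without a file on disk. The following is only a minimal sketch: `score_text` is a hypothetical helper that is not part of the original code, and it assumes that tokenizing with a regular expression (which drops all punctuation, not just commas, periods, and hyphens) is acceptable.

```python
import re
from collections import Counter

GOOD_WORDS = ['star', 'five', 'love', 'great', 'good']  # same lists as above
BAD_WORDS = ['bad', 'but', 'not', 'out']

def score_text(text, good_words=GOOD_WORDS, bad_words=BAD_WORDS):
    """Hypothetical helper: positive-word count minus negative-word count."""
    # Assumption: a word is a run of lowercase letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)  # missing words count as 0
    good = sum(counts[w] for w in good_words)
    bad = sum(counts[w] for w in bad_words)
    return good - bad

print(score_text("Five star, I love it. Not bad but great!"))  # prints 1
```

In the sample sentence, four positive words (five, star, love, great) and three negative words (not, bad, but) are found, so the score is 1. The example also shows a weakness of plain word counting: "not bad" is penalized twice even though it reads as mild praise.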

Code 2: counting the occurrences of each word:
def row_count(filename):
    try:
        with open(filename) as f_obj:
            content = f_obj.read()
    except FileNotFoundError:
        msg = "The file " + filename + " does not exist."
        print(msg)
    else:
        for i in '~!@#$%^&*()_+-={}|:"<>?[];,./—':
            content = content.replace(i, ' ')  # replace each punctuation mark with a space
        content = content.strip().lower()
        words = content.split()

        # count how many times each word occurs
        counts = {}  # empty dict mapping each word to its count
        for i in words:
            counts[i] = counts.get(i, 0) + 1

        # sort the (word, count) pairs by count, highest first
        listhills = list(counts.items())
        listhills.sort(key=lambda x: x[1], reverse=True)

        # print the 30 most frequent words (fewer if the text is short)
        for word, count in listhills[:30]:
            print('{0:<10}{1:>5}'.format(word, count))

if __name__ == '__main__':
    row_count('1.txt')
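For comparison, the same frequency count can be written more compactly with the standard library. This is a sketch rather than the code used in the contest: `top_words` is a hypothetical name, and it assumes that stripping every character in `string.punctuation` (plus the long dash) matches the intent of the punctuation loop above.

```python
import string
from collections import Counter

def top_words(text, n=30):
    """Hypothetical helper: return the n most frequent words as (word, count) pairs."""
    # Assumption: all ASCII punctuation plus the long dash act as separators.
    punct = string.punctuation + "\u2014"
    table = str.maketrans(punct, " " * len(punct))
    words = text.translate(table).lower().split()
    return Counter(words).most_common(n)

for word, count in top_words("Great phone, great price! Battery is not great.", 3):
    print('{0:<10}{1:>5}'.format(word, count))
```

`Counter.most_common` replaces the manual dictionary, the sort, and the slice, and `str.translate` removes all punctuation in a single pass instead of one `replace` call per character.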
