Counting Financial Consumer Complaints per US State per Year in Python
Problem statement:
Load the Financial Services Consumer Complaint Database into your program.
Dataset link:
http://catalog.data.gov/dataset/consumer-complaint-database#topic=consumer_navigation
Allow the user to choose/input a state name.
Your program will output the total number of complaints of each year for that state accordingly.
The task is simple: parse the dataset file in Python. The site offers the data in XML, JSON, and CSV formats.
The following code reads the CSV file:
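import csv

# errors='ignore' drops any bytes that cannot be decoded as gb18030
with open('Consumer_Complaints.csv', 'r', encoding='gb18030', errors='ignore', newline='') as csvFile:
    reader = csv.reader(csvFile)
    # index 8 holds the state abbreviation; skip rows too short to have it
    column = [row[8] for row in reader if len(row) > 8]
dict = {}  # maps state abbreviation -> running complaint count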
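The full dataset contains more than a million records and some of the information is missing or malformed, so errors='ignore' is passed to open() to skip anything that cannot be decoded.
row[8] is used because index 8 (the ninth column, counting from zero) of the CSV file holds the state each complaint belongs to. Reading only that column saves a matching step.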
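A positional index like row[8] breaks silently if the column order ever changes. As a minimal sketch of an alternative, the snippet below reads the same column by header name with csv.DictReader. It assumes the header of that column is "State"; the helper name read_state_column and the three-row in-memory sample are made up for illustration and stand in for the real file.

```python
import csv
import io


def read_state_column(csv_file):
    """Return the State value of every record, looked up by header name."""
    reader = csv.DictReader(csv_file)
    return [row["State"] for row in reader]


# Tiny in-memory sample standing in for Consumer_Complaints.csv (hypothetical data).
sample = io.StringIO(
    "Date received,Product,State\n"
    "03/12/2019,Mortgage,CA\n"
    "07/01/2018,Credit card,NY\n"
    "11/30/2018,Mortgage,CA\n"
)

states = read_state_column(sample)
print(states)  # ['CA', 'NY', 'CA']
```

With the real file, the same function could be called on the object returned by open(), as in the snippet above.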
The following code computes the number of complaints for each state in each year:
# StateName and ShortName (e.g. "California" and "CA") are set from the user's input.
# The row ranges are specific to this download and assume rows are sorted newest first.
# Each slice starts where the previous one ends, so no row is skipped or counted twice.
for key in column[0:192027]:
    dict[key] = dict.get(key, 0) + 1
print(StateName + "'s consumer complaint number in 2019 is: ")
print(dict.get(ShortName, 0))
a = dict.get(ShortName, 0)
for key in column[192027:449389]:
    dict[key] = dict.get(key, 0) + 1
print(StateName + "'s consumer complaint number in 2018 is: ")
print(dict.get(ShortName, 0) - a)
b = dict.get(ShortName, 0)
for key in column[449389:692359]:
    dict[key] = dict.get(key, 0) + 1
print(StateName + "'s consumer complaint number in 2017 is: ")
print(dict.get(ShortName, 0) - b)
c = dict.get(ShortName, 0)
for key in column[692359:883830]:
    dict[key] = dict.get(key, 0) + 1
print(StateName + "'s consumer complaint number in 2016 is: ")
print(dict.get(ShortName, 0) - c)
d = dict.get(ShortName, 0)
for key in column[883830:1048576]:
    dict[key] = dict.get(key, 0) + 1
print(StateName + "'s consumer complaint number in 2015 is: ")
print(dict.get(ShortName, 0) - d)
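The ranges are split up to make the yearly counts easy to compute: dict keeps a running total, so subtracting the total recorded after the previous range from the current total gives the count for the current year.
In essence, the code just keeps counting how many times each key appears in dict.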
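The hard-coded row ranges only fit one particular download of the file. A rough sketch of a variant that does not depend on row positions is to read the year from each record's date and count (state, year) pairs with collections.Counter. This is a swapped-in technique, not the approach used above. It assumes the columns are named "State" and "Date received" and that dates look like MM/DD/YYYY or YYYY-MM-DD; the helper names and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def year_of(date_text):
    """Extract the four-digit year from 'MM/DD/YYYY' or 'YYYY-MM-DD' (assumed formats)."""
    if "/" in date_text:
        return date_text.rsplit("/", 1)[-1]
    return date_text[:4]


def count_by_state_and_year(csv_file):
    """Count complaints per (state, year) pair in a single pass over the file."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[(row["State"], year_of(row["Date received"]))] += 1
    return counts


# Tiny in-memory sample standing in for the real file (hypothetical data).
sample = io.StringIO(
    "Date received,Product,State\n"
    "03/12/2019,Mortgage,CA\n"
    "07/01/2018,Credit card,NY\n"
    "11/30/2018,Mortgage,CA\n"
    "2018-02-05,Student loan,CA\n"
)

counts = count_by_state_and_year(sample)
print(counts[("CA", "2018")])  # 2
print(counts[("CA", "2019")])  # 1
```

Because a Counter returns 0 for missing keys, a state with no complaints in a given year needs no special handling.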
After that, an if/elif chain looks up the result that matches the user's input.
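A long if/elif chain over all the states can also be written as a dictionary lookup. The sketch below shows the idea with a deliberately incomplete table; STATE_ABBREVIATIONS and to_short_name are hypothetical names, and a real version would need an entry for every state and territory in the dataset.

```python
# Illustrative subset only; a full table would list every state and territory.
STATE_ABBREVIATIONS = {
    "california": "CA",
    "new york": "NY",
    "texas": "TX",
}


def to_short_name(user_input):
    """Map a state name or abbreviation typed by the user to its two-letter code.

    Returns None when the input is not in the table.
    """
    text = user_input.strip()
    if text.upper() in STATE_ABBREVIATIONS.values():
        return text.upper()  # the user already typed an abbreviation
    return STATE_ABBREVIATIONS.get(text.lower())


print(to_short_name(" California "))  # CA
print(to_short_name("ny"))            # NY
print(to_short_name("Atlantis"))      # None
```

The returned code can then serve as the key for looking up the counts, and a None result is a natural place to ask the user to try again.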