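There are many ways to get data. An API is best when one exists; fall back to a scraper only when it doesn't.
The code below fetches data through the U.S. Bureau of Labor Statistics (BLS) API. The sample code is actually the official one,
but the official version does not run as published (indentation errors among other problems), so here is my debugged version.
1. Getting the data in a structured format is what matters most.
2. The code in json_pretty() handles only part of the data; write your own version to fit your needs.
3. All three data_dict definitions are valid; they are three different ways to query.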
# -*- coding:utf-8 -*-
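# This is the BLS API sample. In the original code the line `if 'M01' <= period <= 'M12':` was wrapped in quotes, which caused an error.
# Source: https://www.bls.gov/developers/api_python.htm#python2
# How to find the seriesid for the data you need: https://www.bls.gov/help/hlpforma.htm
''' By changing the seriesid we can get data on the following topics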
Employment & Unemployment
Inflation & Prices
Spending & Time Use
Pay & Benefits
Productivity
Workplace Injuries
Occupational Requirements
International
'''
import requests
import json
import prettytable
headers = {'Content-type': 'application/json'}
# data_dict = {"seriesid": ['CUUR0000SA0','SUUR0000SA0'],"startyear":"2011", "endyear":"2014"}
# data_dict ={"seriesid":["LAUCN040010000000005", "LAUCN040010000000006"],\
# "startyear":"2010", "endyear":"2012", "catalog":False,\
# "calculations":True, "annualaverage":True}
data_dict = {"seriesid":['LAUCN040010000000005']}
# The registrationkey in the official example is a key that no longer exists; simply leave it out.
data = json.dumps(data_dict)
p = requests.post('https://api.bls.gov/publicAPI/v2/timeseries/data/', data=data, headers=headers)
json_data = json.loads(p.text)
def json_pretty(json_data):
    # Renders the JSON data as a table. The real point is getting the JSON you want, not this function.
    for series in json_data['Results']['series']:
        x = prettytable.PrettyTable(["series id", "year", "period", "value", "footnotes"])
        seriesId = series['seriesID']
        for item in series['data']:
            year = item['year']
            period = item['period']
            value = item['value']
            footnotes = ""
            for footnote in item['footnotes']:
                if footnote:
                    footnotes = footnotes + footnote['text'] + ','
            # Keep monthly periods only (M01 through M12).
            if 'M01' <= period <= 'M12':
                x.add_row([seriesId, year, period, value, footnotes[0:-1]])
        # One output file per series; the official sample used seriesId + '.txt'.
        with open(seriesId + '_ceshi.txt', 'w') as output:
            output.write(x.get_string())

# The function must be defined before it is called.
json_pretty(json_data)
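
As point 2 says, json_pretty() covers only part of the data, so you will probably want your own version. Below is a rough sketch of one possible alternative: it flattens the response into plain rows and renders them as CSV instead of a prettytable. The names series_to_rows and rows_to_csv are my own, not part of the BLS sample, and the sample dict is hand-written with made-up values just to show the shape the code above reads (Results, series, seriesID, data, footnotes).

```python
import csv
import io


def series_to_rows(json_data, monthly_only=True):
    """Flatten a BLS-style response dict into a list of row dicts.

    Hypothetical helper, not part of the official sample.
    """
    rows = []
    for series in json_data.get("Results", {}).get("series", []):
        for item in series.get("data", []):
            period = item["period"]
            # Same filter as json_pretty(): keep M01..M12 only, unless disabled.
            if monthly_only and not ("M01" <= period <= "M12"):
                continue
            # Footnote entries can be empty dicts, so skip those.
            notes = [fn["text"] for fn in item.get("footnotes", []) if fn and fn.get("text")]
            rows.append({
                "series_id": series["seriesID"],
                "year": item["year"],
                "period": period,
                "value": item["value"],
                "footnotes": ",".join(notes),
            })
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text (hypothetical helper)."""
    fields = ["series_id", "year", "period", "value", "footnotes"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hand-written sample with invented values, only to illustrate the structure.
sample = {
    "Results": {"series": [{
        "seriesID": "LAUCN040010000000005",
        "data": [
            {"year": "2012", "period": "M12", "value": "100",
             "footnotes": [{"code": "P", "text": "Preliminary."}, {}]},
            {"year": "2012", "period": "M13", "value": "99", "footnotes": [{}]},
        ],
    }]},
}

rows = series_to_rows(sample)
print(rows_to_csv(rows))
```

With real data you would pass in the json_data from the request above instead of sample. Returning plain rows keeps the parsing separate from the output, so the same rows can go to CSV, a table, or a database.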