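Project goal: fetch the past year or so of 双色球 (Shuangseqiu, the "double color ball" lottery) draw records, as a learning exercise for web-scraping enthusiasts. The code requests the latest 100 draws.
Data source: http://www.cwl.gov.cn/ygkj/wqkjgg/ssq/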
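Viewing the page source shows that the data we want is not stored in the page's HTML; it is rendered afterwards from a separate request.
Open the browser's developer tools:
Look at the request headers and the parameters being sent:
The Request URL is http://www.cwl.gov.cn/cwl_admin/front/cwlkj/search/kjxx/findDrawNotice?name=ssq&issueCount=100&issueStart=&issueEnd=&dayStart=&dayEnd=
The part after the question mark holds the required query parameters, and the request method is GET.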
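To make the split between the endpoint and its parameters concrete, here is a minimal sketch that takes the Request URL apart with the standard library's urllib.parse. The variable names are only for illustration, and keep_blank_values=True is passed so that the parameters sent empty are not dropped.

```python
from urllib.parse import urlsplit, parse_qsl

request_url = (
    "http://www.cwl.gov.cn/cwl_admin/front/cwlkj/search/kjxx/findDrawNotice"
    "?name=ssq&issueCount=100&issueStart=&issueEnd=&dayStart=&dayEnd="
)

parts = urlsplit(request_url)
# Everything before the "?" is the endpoint.
endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
# Everything after the "?" is the query string; keep_blank_values=True
# preserves parameters such as issueStart that are sent empty.
query = dict(parse_qsl(parts.query, keep_blank_values=True))

print(endpoint)
print(query)
```

Only name and issueCount carry values here; the four range parameters are present but empty.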
First, import the libraries (example):
import requests
from urllib.parse import urlencode  # builds a URL-encoded query string from a dict
import pandas as pd
Then request the data (example):
url = 'http://www.cwl.gov.cn/cwl_admin/front/cwlkj/search/kjxx/findDrawNotice?'
params = {
    'name': 'ssq',
    'issueCount': '100',
    'issueStart': '',
    'issueEnd': '',
    'dayStart': '',
    'dayEnd': '',
}
url = url + urlencode(params)  # encode params into a query string and append it to the URL
resp = requests.get(url=url)
print(resp.json())  # json is a method, so the parentheses are required
For more on how to use urlencode, see: https://blog.csdn.net/lly1122334/article/details/108402949
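As a quick illustration of what urlencode produces, the sketch below encodes the same parameter dict and then a second, made-up parameter (the name q and its value are hypothetical, used only to show how special characters are escaped).

```python
from urllib.parse import urlencode

params = {
    "name": "ssq",
    "issueCount": "100",
    "issueStart": "",
    "issueEnd": "",
    "dayStart": "",
    "dayEnd": "",
}
# Key/value pairs are joined with "&"; empty values are kept as "key=".
query_string = urlencode(params)
print(query_string)

# Hypothetical parameter, only to show escaping: the space becomes "+"
# and the literal "&" becomes "%26" so it cannot be mistaken for a separator.
escaped = urlencode({"q": "a b&c"})
print(escaped)
```

The first string matches the part after the question mark in the Request URL. requests can also build the query string itself if the dict is passed as requests.get(url, params=params), which would make the manual concatenation unnecessary.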
codes, dates, numbers, sales, first_types, second_types, third_types = [[] for i in range(7)]  # create empty lists to hold the data
for ssq in resp.json()['result']:
    code = ssq['code']
    codes.append(code)
    date = ssq['date']
    dates.append(date)
    number = ssq['red'] + ',' + ssq['blue']  # red balls followed by the blue ball
    numbers.append(number)
    sale = ssq['sales']
    sales.append(sale)
    # For each of the first three prize grades, keep (type, typenum, typemoney)
    first_type = ssq['prizegrades'][0]['type'], ssq['prizegrades'][0]['typenum'], ssq['prizegrades'][0]['typemoney']
    first_types.append(first_type)
    second_type = ssq['prizegrades'][1]['type'], ssq['prizegrades'][1]['typenum'], ssq['prizegrades'][1]['typemoney']
    second_types.append(second_type)
    third_type = ssq['prizegrades'][2]['type'], ssq['prizegrades'][2]['typenum'], ssq['prizegrades'][2]['typemoney']
    third_types.append(third_type)
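The three prize-grade extractions above differ only in the index, so they can be folded into a small helper. The following is a sketch, not the article's method: grade_summary is a hypothetical helper name, and the sample record uses the same keys as the code above with made-up placeholder values so that it runs without a network request.

```python
def grade_summary(draw, index):
    """Return (type, typenum, typemoney) for one prize grade of a draw."""
    grade = draw["prizegrades"][index]
    return grade["type"], grade["typenum"], grade["typemoney"]


# Placeholder record: keys follow the article's code, values are made up.
sample_result = [
    {
        "code": "0000001",
        "date": "1970-01-01",
        "red": "01,02,03,04,05,06",
        "blue": "07",
        "sales": "0",
        "prizegrades": [
            {"type": 1, "typenum": "0", "typemoney": "0"},
            {"type": 2, "typenum": "0", "typemoney": "0"},
            {"type": 3, "typenum": "0", "typemoney": "0"},
        ],
    }
]

codes, numbers, first_types, second_types, third_types = [[] for _ in range(5)]
for ssq in sample_result:
    codes.append(ssq["code"])
    numbers.append(ssq["red"] + "," + ssq["blue"])
    # One loop replaces the three copy-pasted extraction lines.
    for index, bucket in enumerate((first_types, second_types, third_types)):
        bucket.append(grade_summary(ssq, index))

print(numbers)
print(first_types)
```

With the real response, sample_result would be replaced by resp.json()['result'].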
Complete code:
import requests
from urllib.parse import urlencode
import pandas as pd

url = 'http://www.cwl.gov.cn/cwl_admin/front/cwlkj/search/kjxx/findDrawNotice?'
params = {
    'name': 'ssq',
    'issueCount': '100',
    'issueStart': '',
    'issueEnd': '',
    'dayStart': '',
    'dayEnd': '',
}
url = url + urlencode(params)
resp = requests.get(url=url)

codes, dates, numbers, sales, first_types, second_types, third_types = [[] for i in range(7)]
for ssq in resp.json()['result']:
    code = ssq['code']
    codes.append(code)
    date = ssq['date']
    dates.append(date)
    number = ssq['red'] + ',' + ssq['blue']
    numbers.append(number)
    sale = ssq['sales']
    sales.append(sale)
    first_type = ssq['prizegrades'][0]['type'], ssq['prizegrades'][0]['typenum'], ssq['prizegrades'][0]['typemoney']
    first_types.append(first_type)
    second_type = ssq['prizegrades'][1]['type'], ssq['prizegrades'][1]['typenum'], ssq['prizegrades'][1]['typemoney']
    second_types.append(second_type)
    third_type = ssq['prizegrades'][2]['type'], ssq['prizegrades'][2]['typenum'], ssq['prizegrades'][2]['typemoney']
    third_types.append(third_type)
    # print(code, date, number, sale, first_type, second_type, third_type)

# Build a dict of columns in preparation for the table.
# Note: the 'sales' column must be the list sales, not the last scalar sale.
dic = {'code': codes, 'date': dates, 'number': numbers, 'sales': sales, 'first_type': first_types, 'second_type': second_types, 'third_type': third_types}
frame = pd.DataFrame(dic)
# frame.to_csv('./data/ssq/ssq.csv')  # optionally save the data to a CSV file
print(frame)
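The commented-out to_csv call is one way to save the results. As an alternative sketch that needs only the standard library, each draw can be kept as one dict and written out with csv.DictWriter. This is a swapped-in technique, not the article's approach: write_rows is a hypothetical helper, and the rows below are made-up placeholders that reuse the article's column names.

```python
import csv
import io

# Placeholder rows: column names follow the article's table, values are made up.
rows = [
    {"code": "0000002", "date": "1970-01-02", "number": "01,02,03,04,05,06,07", "sales": "0"},
    {"code": "0000001", "date": "1970-01-01", "number": "08,09,10,11,12,13,14", "sales": "0"},
]


def write_rows(rows, handle):
    """Write a list of dicts as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer keeps the example self-contained; to write a real file,
# pass open("ssq.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the number field itself contains commas, the csv module wraps it in double quotes, so the file still has exactly four columns per row.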
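That is all for today: scraping the latest 100 双色球 draw records. One handy trick used along the way is creating several empty lists in a single statement:
codes, dates, numbers, sales, first_types, second_types, third_types = [[] for i in range(7)]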
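The list comprehension matters here. A shorter-looking spelling, [[]] * 7, would not work for this purpose, as the small sketch below shows (the variable names are only for illustration).

```python
# The comprehension evaluates [] once per iteration, giving 7 distinct lists.
independent = [[] for _ in range(7)]
independent[0].append("x")
print(independent[1])  # still empty

# Multiplying repeats a reference to ONE list, so every slot shares it.
shared = [[]] * 7
shared[0].append("x")
print(shared[1])  # also contains "x"
```

With the multiplied version, appending a draw number to codes would also append it to dates, numbers, and every other list.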