Target URL: https://wannianrili.bmcx.com/
Goal: get the heavenly stems and earthly branches (天干地支) for every day from 1970 to 2021
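Watching the Network tab of the browser's developer tools on the perpetual calendar site, we can see new requests appear whenever the month or year is switched.
Opening one of them shows exactly the data we need, and the URL is easy to modify: changing the year and month in it returns the data for that month.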
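The URL pattern can be sketched on its own before any request is sent. The template below is the one used by the crawler in this post (including its `v` parameter); the helper name `month_urls` is made up for illustration.

```python
from itertools import product

# URL template taken from the crawler in this post; the v value is kept as-is.
URL_TEMPLATE = 'https://wannianrili.bmcx.com/ajax/?q={}-{}&v=20031912'


def month_urls(start_year=1970, end_year=2021):
    """Return one AJAX URL per (year, month) pair, both years inclusive.

    month_urls is a hypothetical helper name, not part of any library.
    """
    years = range(start_year, end_year + 1)
    months = range(1, 13)
    return [URL_TEMPLATE.format(year, month) for year, month in product(years, months)]


urls = month_urls()
print(len(urls))  # 52 years * 12 months = 624 requests in total
print(urls[0])    # first month: q=1970-1
print(urls[-1])   # last month: q=2021-12
```

Building the list first makes it easy to see how many requests the whole run will need before deciding how long to pause between them.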
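Because we are fetching every day from 1970 to 2021, the amount of data is fairly large, so the crawler should pause between requests to keep the crawl rate under control. Here is the code: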
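import time  # pause between requests

import openpyxl  # write the results to an .xlsx workbook
import requests  # send the HTTP requests
from bs4 import BeautifulSoup  # parse the returned HTML

wb = openpyxl.Workbook()
sheet = wb.active
sheet.title = '数据'  # custom sheet name

# The headers are identical for every request, so build them once.
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36',
    'Referer': 'https://wannianrili.bmcx.com/'}

a = 1  # flag: set to 0 as soon as a request fails
for i in range(1970, 2022):  # years 1970 to 2021
    if a == 1:
        for j in range(1, 13):  # months 1 to 12
            url = 'https://wannianrili.bmcx.com/ajax/?q={}-{}&v=20031912'.format(i, j)  # build the target URL
            res = requests.get(url, headers=headers, timeout=10)  # timeout so a stalled request cannot hang the run
            if res.status_code == 200:
                soup = BeautifulSoup(res.text, 'html.parser')
                datas = soup.find_all(class_="wnrl_k_you")
                for data in datas:
                    # Strip spaces and newlines from each field.
                    date = data.find(class_="wnrl_k_you_id_biaoti").text.replace(' ', '').replace('\n', '')
                    tian = data.find(class_="wnrl_k_you_id_wnrl_riqi").text.replace(' ', '').replace('\n', '')
                    text = data.find(class_="wnrl_k_you_id_wnrl_nongli_ganzhi").text.replace(' ', '').replace('\n', '')
                    date1 = date[:8] + tian + date[8:]  # insert the day number after the first 8 characters
                    year = text[:7]    # year part
                    month = text[7:10]  # month part
                    day = text[10:]    # day part
                    sheet.append([date1, year, month, day])
                time.sleep(5)  # wait 5 seconds before the next request
                print(i, j)  # progress: year and month just fetched
            else:
                print('Error')
                a = 0  # stop the outer loop on its next iteration
                break
    else:
        print('错误')
        break
wb.save('万年历.xlsx')  # rows collected before any failure are still saved
wb.close()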
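The fixed slice positions in the loop only work if the cleaned strings have a fixed layout. Here is a minimal sketch of that slicing in isolation, assuming the ganzhi text looks like a 7-character year part, a 3-character month part, and then the day part, and that the title starts with an 8-character "YYYY年MM月" prefix. The function names and the sample strings are made up for illustration and were not captured from the site.

```python
def split_ganzhi(text):
    """Split the cleaned ganzhi string into (year, month, day) parts.

    Assumed layout (inferred from the slice indices, not confirmed):
    7 characters for the year, 3 for the month, the rest for the day.
    """
    return text[:7], text[7:10], text[10:]


def merge_date(title, day_number):
    """Insert the day number after the 8-character 'YYYY年MM月' prefix."""
    return title[:8] + day_number + title[8:]


# Made-up sample values that only illustrate the assumed layout.
sample_text = '辛丑年【牛年】庚寅月壬午日'
sample_title = '2021年02月星期五'

print(split_ganzhi(sample_text))
print(merge_date(sample_title, '12'))
```

If the site ever changes the layout of these strings, the slices will silently cut in the wrong place, so it is worth checking a few rows of the saved workbook by eye.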