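The weather isn't great today. I was just checking the forecast when I noticed that this site has pretty good data, so today we're going to scrape all of it! Link: http://www.pm25.com/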
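The data sits right in the page source of the response, so we take the response for the home page, parse it with lxml to turn the source into an etree tree, and use XPath to extract the city links. We then request each link and parse the detail-page response, for example the Beijing detail page: http://www.pm25.com/beijing.html. That's the general idea; at the end we save the data to a CSV file. When pulling data with XPath, some fields come back empty and indexing into the empty result raises an error, so every lookup is wrapped in try. The code is as follows: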
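Before the script, a quick note on the empty-result case. lxml's `xpath()` returns a list, and indexing `[0]` on an empty list raises `IndexError`. Here is a minimal sketch of one way to handle that in a single place; `first_or_default` is a hypothetical helper name (not part of lxml or of the script), and the plain lists below are stand-ins for what `html.xpath(...)` would return.

```python
def first_or_default(results, default="", index=0):
    """Return results[index] if it exists, otherwise the default value.

    `results` is meant to be the list returned by an lxml xpath() call;
    here it is just any Python list.
    """
    return results[index] if len(results) > index else default


# Stand-ins for xpath() results: one match, and no match at all.
found = ["Beijing"]
missing = []

print(first_or_default(found))             # the first match
print(first_or_default(missing, "N/A"))    # falls back to the default
print(first_or_default(found, "", 1))      # index 1 is missing, so the default
```

With a helper like this, each field gets a value even when the page lacks it, so a missing field cannot leave a variable undefined or carry over the previous city's value.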
# -*- coding: utf-8 -*-
# @Time : 2021/1/23 3:27
# @FileName: Pm2.5.py
# @Software: PyCharm
import csv

import requests
from lxml import etree


class Weather:
    # Initialization
    def __init__(self):
        # Start URL; parse_data() overwrites it with each city's detail-page URL
        self.url = 'http://www.pm25.com/'
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/87.0.4280.88 Safari/537.36 "
        }

    # Send a request to the current self.url
    def get_data(self):
        response = requests.get(url=self.url, headers=self.headers)
        return response

    # Parse the home page, then every city's detail page
    def parse_data(self, response):
        html = etree.HTML(response.content)
        link_list = html.xpath('//*[@id="scrollbar1"]/div[3]/div/div[3]/div/dl/dd/a/@href')
        for link in link_list:
            link = 'http://www.pm25.com' + link
            # Parse the detail-page data
            self.url = link
            response = self.get_data()
            html = etree.HTML(response.content)
            # xpath() returns an empty list when nothing matches, so [0] raises IndexError.
            # Fall back to '' so a missing field is neither undefined nor left over
            # from the previous city.
            try:
                city_name = html.xpath("/html/body/div[6]/div/div[1]/h2/text()")[0]
            except IndexError:
                city_name = ''
            try:
                qua = html.xpath("/html/body/div[6]/div/div[3]/div[1]/p/span[1]/text()")[0]
            except IndexError:
                qua = ''
            try:
                aqi_num = html.xpath("/html/body/div[6]/div/div[3]/div[1]/a/text()")[0]
            except IndexError:
                aqi_num = ''
            try:
                pm = html.xpath("/html/body/div[6]/div/div[3]/div[2]/p[1]/span/text()")[0] + ' μg/m³'
            except IndexError:
                pm = ''
            try:
                wea = html.xpath("/html/body/div[6]/div/div[4]/div/p/span/text()")[0]
                temp = html.xpath("/html/body/div[6]/div/div[4]/div/p/text()")[1]
                add_weather = wea + temp
            except IndexError:
                add_weather = ''
            data = "City: " + city_name + ", " + "Air quality: " + qua + ", " + "AQI: " + aqi_num + ", " + "PM2.5 concentration: " + pm + ', ' + "Weather: " + add_weather
            print(data)
            # Write the row right here instead of returning it to a separate save function.
            # csv_writer is the module-level writer created in the __main__ block below.
            csv_writer.writerow([city_name, qua, aqi_num, pm, add_weather])

    # Entry point
    def run(self):
        response = self.get_data()
        self.parse_data(response)


if __name__ == '__main__':
    # Make sure this runs only once; otherwise it will ...
    with open('info.csv', 'a', newline='') as f:
        csv_writer = csv.writer(f)
        csv_writer.writerow(["City", "Air quality", "AQI", "PM2.5", "Weather"])
        # The scraper has to run inside the with block so the file is still open
        weather = Weather()
        weather.run()
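One thing to watch in the script above: it opens info.csv in append mode ('a') and writes the header row on every run, so a second run puts another header row in the middle of the file. Below is a minimal sketch of one way around that, writing the header only when the file is new or empty. `open_csv_for_append`, the column names, and the explicit UTF-8 encoding are all assumptions made for illustration; they are not part of the original script.

```python
import csv
import os
import tempfile


def open_csv_for_append(path, header):
    """Open `path` for appending; write `header` only if the file is new or empty.

    Returns (file_object, csv_writer). The caller must close the file.
    """
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    f = open(path, "a", newline="", encoding="utf-8")
    writer = csv.writer(f)
    if needs_header:
        writer.writerow(header)
    return f, writer


# Demo in a temporary directory: two separate "runs" append to the same file.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "info.csv")
    for row in (["Beijing", "55"], ["Shanghai", "40"]):  # made-up sample rows
        f, writer = open_csv_for_append(csv_path, ["city", "aqi"])
        writer.writerow(row)
        f.close()
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for line in csv.reader(fh):
            print(line)
```

The header shows up once at the top, followed by one row per run. Truncating the file with mode 'w' on each run would also avoid duplicate headers, at the cost of losing earlier results.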