Approach
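1. Use the json module to parse the city codes.
2. Use the pymysql library to work with the MySQL database.
3. Each loop iteration does the following:
1) Get the city code.
2) If the code is empty, the entry is a province; use continue to move on to the next iteration.
3) Send a request for that city's weather.
4) Parse the data returned by the weather API with the json module.
5) Insert the city's weather data into the database.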
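The per-city steps above can be sketched as a small loop. This is a minimal sketch, not the author's code: crawl, fetch and save are hypothetical names, with fetch assumed to return the parsed weather dict (or None on failure) and save assumed to store one record, so the control flow can be tried without network or database access.

```python
from typing import Callable, Iterable, Optional


def crawl(cities: Iterable[dict],
          fetch: Callable[[str], Optional[dict]],
          save: Callable[[dict], None]) -> int:
    """Run steps 1) to 5) for every city; return how many cities were saved."""
    saved = 0
    for city in cities:
        code = city.get('city_code', '')  # 1) get the city code
        if code == '':                     # 2) empty code: a province, skip it
            continue
        weather = fetch(code)              # 3) + 4) request and parse the weather JSON
        if weather is None:                # request failed or no data: next city
            continue
        save(weather)                      # 5) store the record
        saved += 1
    return saved
```

Passing stubs such as fetch=lambda code: {'cityId': code} and save=rows.append makes it easy to check that provinces are skipped before any request is sent.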
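Problems encountered
Crawling failed for many cities, and requesting the weather API directly in a browser did not return weather data either.
Cause
Requesting the weather API too frequently got the IP address blocked.
Solution
Wait 1 second in each loop iteration, so the crawler sends at most one request per second: roughly 60 requests per minute, well below the threshold of 300.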
sleep(1)
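A fixed sleep(1) adds a pause on top of each request's own duration, so the real rate is about 60 / (1 + t) requests per minute, where t is the average request time in seconds (t = 0.5 gives 40 per minute). One alternative, sketched below as an assumption rather than the author's method, is to space the start of consecutive requests at least one second apart, which keeps the rate near 60 per minute without extra idle time. Throttle is a hypothetical helper, not part of any library; calling throttle.wait() right before each get(url) would take the place of the plain sleep(1).

```python
import time


class Throttle:
    """Ensure at least `interval` seconds between consecutive calls to wait()."""

    def __init__(self, interval: float = 1.0):
        self.interval = interval
        self._last = None  # monotonic timestamp of the previous call

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            # Only sleep for whatever part of the interval has not yet elapsed
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

A monotonic clock is used so that system clock adjustments cannot shorten or stretch the spacing.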
Data analysis
City with the highest current temperature
select * from weatherinfo
where convert(wendu,decimal) = (select max(convert(wendu,decimal)) from weatherinfo);
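The convert(...) calls matter because wendu is stored as text: compared as strings, '9' sorts above '35'. The sketch below demonstrates this, along with the tie behavior of the subquery (every city sharing the maximum is returned). It uses an in-memory SQLite table with hypothetical rows as a stand-in for MySQL; SQLite has no CONVERT, so CAST(wendu AS REAL) plays its role. In MySQL a bare DECIMAL means DECIMAL(10,0), so fractional readings would be rounded; convert(wendu, decimal(5,1)) would keep one decimal place if the API ever returned them.

```python
import sqlite3

# In-memory stand-in for the MySQL weatherinfo table (hypothetical rows)
conn = sqlite3.connect(':memory:')
conn.execute('create table weatherinfo (city text, wendu text)')
conn.executemany('insert into weatherinfo values (?, ?)',
                 [('A', '31'), ('B', '35'), ('C', '35'), ('D', '9')])

# Comparing the raw text picks '9', because '9' > '35' as strings
string_max = conn.execute('select max(wendu) from weatherinfo').fetchone()

# Same shape as the MySQL query, with CAST(... AS REAL) in place of CONVERT(wendu, decimal)
rows = conn.execute(
    'select city, wendu from weatherinfo '
    'where cast(wendu as real) = (select max(cast(wendu as real)) from weatherinfo) '
    'order by city').fetchall()
print(string_max)  # ('9',)
print(rows)        # [('B', '35'), ('C', '35')]
```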
City with the lowest current temperature
select * from weatherinfo
where convert(wendu,decimal) = (select min(convert(wendu,decimal)) from weatherinfo);
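One caveat for the minimum query, assuming an empty string ever ends up in wendu (for instance if a missing field were stored with a default value): non-numeric text converts to 0 (MySQL's CONVERT does so with a warning, SQLite's CAST silently), so such a row would be reported as the coldest city whenever all real temperatures are above 0. Filtering out empty values in both the outer query and the subquery avoids this. The sketch again uses in-memory SQLite with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table weatherinfo (city text, wendu text)')
# Hypothetical rows; 'C' has an empty temperature string
conn.executemany('insert into weatherinfo values (?, ?)',
                 [('A', '12'), ('B', '5'), ('C', '')])

# Unfiltered: cast('' as real) is 0.0, so 'C' is reported as the coldest city
naive = conn.execute(
    'select city from weatherinfo '
    'where cast(wendu as real) = (select min(cast(wendu as real)) from weatherinfo)'
).fetchall()

# Filtered: ignore empty strings in both the outer query and the subquery
filtered = conn.execute(
    "select city from weatherinfo where wendu <> '' and cast(wendu as real) = "
    "(select min(cast(wendu as real)) from weatherinfo where wendu <> '')"
).fetchall()
print(naive)     # [('C',)]
print(filtered)  # [('B',)]
```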
Top 20 hottest cities
select * from weatherinfo order by convert(wendu,decimal) desc limit 0,20;
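If the rows have already been fetched into Python (for example with cursor.fetchall()), the same top-20 ranking can be computed with heapq.nlargest. This sketch assumes each row is a (city, wendu) tuple with wendu as numeric text; hottest is a hypothetical helper name.

```python
import heapq


def hottest(rows, n=20):
    """Return the n rows with the highest numeric wendu, hottest first.

    `rows` is assumed to be (city, wendu) tuples with wendu stored as text,
    mirroring the weatherinfo table."""
    # Convert the text temperature to a number for ranking, like convert(wendu, decimal)
    return heapq.nlargest(n, rows, key=lambda r: float(r[1]))
```

For example, hottest([('A', '31'), ('B', '9'), ('C', '35')], n=2) returns [('C', '35'), ('A', '31')].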
Top 10 coldest cities
select * from weatherinfo order by convert(wendu,decimal) limit 0,10;
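In MySQL, limit 0,10 means "skip 0 rows, return at most 10", so it is equivalent to limit 10, and limit 10,10 would return the next ten coldest cities. SQLite accepts the same comma form with the same meaning, so the sketch below uses an in-memory table with hypothetical rows to show two pages of three.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table weatherinfo (city text, wendu text)')
# Hypothetical cities c0..c5 with temperatures stored as text
conn.executemany('insert into weatherinfo values (?, ?)',
                 [('c%d' % i, str(t)) for i, t in enumerate([7, -3, 15, 0, 22, -8])])

# "limit offset, count": skip `offset` rows, then return at most `count`
first_page = conn.execute('select city, wendu from weatherinfo '
                          'order by cast(wendu as real) limit 0,3').fetchall()
second_page = conn.execute('select city, wendu from weatherinfo '
                           'order by cast(wendu as real) limit 3,3').fetchall()
print(first_page)   # [('c5', '-8'), ('c1', '-3'), ('c3', '0')]
print(second_page)  # [('c0', '7'), ('c2', '15'), ('c4', '22')]
```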
An invalid city code makes the API return a 403 result code.
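The crawler reacts to three kinds of weather response: usable data, a rate-limit message (stop the loop), and anything else, such as the 403 result for an invalid city code (skip the city). The sketch below folds those checks into one hypothetical classify function. Whether the 403 arrives as the HTTP status or as a field in the JSON body is not shown here, so the function relies only on whether data and cityInfo are present, as the full code that follows does; the 'too many requests' string is taken from that code, and the error-body shape used in the usage note is hypothetical.

```python
import json


def classify(body: str) -> str:
    """Classify one weather API response body.

    Returns 'ok' (weather data present), 'blocked' (rate-limited: stop crawling)
    or 'skip' (e.g. an invalid city code: move on to the next city)."""
    try:
        obj = json.loads(body)
    except ValueError:  # body is not JSON at all
        obj = None
    if isinstance(obj, dict) and 'data' in obj and 'cityInfo' in obj:
        return 'ok'
    if 'too many requests' in body:
        return 'blocked'
    return 'skip'
```

For example, classify('{"data": {}, "cityInfo": {}}') returns 'ok', while a body without those keys, such as '{"status": 403}', returns 'skip'.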
Full code
The complete script is as follows:
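# -*- coding: utf-8 -*-
import traceback
from json import loads
from time import sleep

from pymysql import connect
from requests import get


def f():
    # Request the city list API
    res = get('http://cdn.sojson.com/_city.json', timeout=10)
    # Debug output: response headers and body
    # print(res.headers)
    # print(res.text)
    cities = loads(res.text)
    # Connect to the database (keyword arguments are safe across PyMySQL versions)
    conn = connect(host='192.168.0.124', user='root', password='123456',
                   database='weather', charset='utf8')
    # Get a cursor
    cursor = conn.cursor()
    try:
        for city in cities:
            cid = city['city_code']
            # An empty code means the entry is a province: skip it
            if cid == '':
                continue
            url = 'http://t.weather.itboy.net/api/weather/city/{0}'.format(cid)
            # print(url)
            try:
                # Wait 1 second before every request, including after failures
                sleep(1)
                # Request the weather API
                res = get(url, timeout=10)
                # Convert the response body into a dict
                obj = loads(res.text)
                data = obj['data']
                cityinfo = obj['cityInfo']
            except KeyError:
                # No weather data: invalid city code or rate limiting
                print(url)
                print(res.text)
                if 'too many requests' in res.text:
                    print('The IP may have been blocked')
                    break
                continue
            except Exception:
                print(url)
                print('Network or parsing error')
                traceback.print_exc()
                continue
            # Report missing fields; without a temperature there is nothing useful to store
            fields = ('quality', 'pm25', 'pm10', 'shidu', 'wendu', 'ganmao')
            missing = [k for k in fields if k not in data]
            if missing:
                print(url)
                print('Incomplete weather data, missing: {0}'.format(missing))
                if 'wendu' in missing:
                    continue
            # Parameterized statement: PyMySQL escapes every value,
            # which avoids broken quoting and SQL injection
            sql = 'insert into weatherinfo values(%s, %s, %s, %s, %s, %s, %s, %s, %s)'
            params = (cityinfo['cityId'], cityinfo['city'], cityinfo['parent'],
                      data.get('shidu', ''), data.get('pm25', -1), data.get('pm10', -1),
                      data.get('quality', ''), data['wendu'], data.get('ganmao', ''))
            # print(sql, params)
            try:
                # Execute the statement
                cursor.execute(sql, params)
                conn.commit()
            except Exception:
                print(sql, params)
                print('Failed to insert into the database')
                traceback.print_exc()
                # Roll back on error
                conn.rollback()
    finally:
        # Close the cursor and the database connection
        cursor.close()
        conn.close()


if __name__ == '__main__':
    f()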