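Weather forecasts are one of the data types we encounter and use most often in daily life, and both the national and local governments have set up dedicated agencies responsible for parsing, processing, and publishing meteorological data. This post collects, analyzes, and processes the data that the National Meteorological Center (中央气象台) website updates and publishes in real time.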
Taking Hangzhou as an example, we open its forecast page.
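The data shown had just been updated at 8 o'clock. The site updates its data at a 3-hour granularity, that is, the data is refreshed once every 3 hours.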
Scrolling further down the page shows more detailed data.
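These content blocks are the data we need to collect. First, selenium is used to obtain the codes of all provinces and of the districts and counties under each of them; Zhejiang Province again serves as the example here.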
We analyze the link of the Hangzhou page as an example:
http://www.nmc.cn/publish/forecast/AZJ/hangzhou.html
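From the above we can see that http://www.nmc.cn/publish/forecast/ is the common prefix of every city detail page. The AZJ that follows is the province code for Zhejiang, hangzhou is the city code for Hangzhou, and appending ".html" completes the URL of a city's detail page.
With this URL structure understood, we can automatically build the URLs of the cities to crawl and then hand them to the crawler to fetch the data.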
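The URL structure described above can be sketched as a small helper. This is only an illustration: the names `BASE_URL` and `build_city_url` are made up for this example and are not taken from the original code.

```python
# Shared prefix of every city detail page, as described in the article.
BASE_URL = "http://www.nmc.cn/publish/forecast/"


def build_city_url(province_code: str, city_code: str) -> str:
    """Join prefix, province code and city code into a detail-page URL.

    Hypothetical helper name; the pattern is prefix + province + "/" + city + ".html".
    """
    return f"{BASE_URL}{province_code}/{city_code}.html"


# Rebuild the Hangzhou link analyzed above.
print(build_city_url("AZJ", "hangzhou"))
```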
For convenience, before crawling we use the dict data type to map Chinese province and city names to their corresponding codes.
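One possible shape for such a mapping is sketched here. The nested layout and the names `PROVINCE_CITY_CODES` and `lookup_codes` are assumptions made for illustration. Only the Zhejiang/Hangzhou entry comes from the article; the remaining entries would be filled in from the codes collected with selenium.

```python
# Hypothetical layout: province name -> {"code": ..., "cities": {city name -> city code}}.
# Only this one entry is known from the article; the rest is left out on purpose.
PROVINCE_CITY_CODES = {
    "浙江省": {
        "code": "AZJ",
        "cities": {"杭州": "hangzhou"},
    },
}


def lookup_codes(province, city, table=PROVINCE_CITY_CODES):
    """Return (province_code, city_code); raises KeyError for unknown names."""
    entry = table[province]
    return entry["code"], entry["cities"][city]


print(lookup_codes("浙江省", "杭州"))
```

Keeping the city codes nested under their province means a single lookup gives both parts needed for the URL.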
After that, the data crawler can be written.
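As a rough idea of what the fetching part of such a crawler might look like, here is a minimal sketch using only the standard library. It is not the original implementation: `fetch_page` and `crawl` are hypothetical names, the User-Agent value and the one-second pause are arbitrary choices, and extracting the forecast fields from the returned HTML is deliberately left out because it depends on the page markup.

```python
import time
import urllib.request


def fetch_page(url, opener=urllib.request.urlopen, timeout=10):
    """Download one page and return its text (assumes UTF-8 content)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with opener(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(urls, fetch=fetch_page, delay_seconds=1.0):
    """Fetch each URL in turn, pausing between requests to stay polite."""
    pages = {}
    for url in urls:
        pages[url] = fetch(url)
        time.sleep(delay_seconds)
    return pages


# Example call (performs a real network request, so it is left commented out):
# html = fetch_page("http://www.nmc.cn/publish/forecast/AZJ/hangzhou.html")
```

Passing the opener and the fetch function as parameters makes it easy to swap in selenium or a stub when the page needs a real browser or when testing offline.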
The same approach can be extended when data for all regions nationwide is needed.
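One way to cover every region is to walk a province/city code table and generate one URL per city. The sketch below assumes a hypothetical table layout (`SAMPLE_TABLE`) and a hypothetical generator name (`iter_city_urls`). Only the Zhejiang/Hangzhou entry is taken from the article.

```python
BASE_URL = "http://www.nmc.cn/publish/forecast/"

# Hypothetical table layout; a real table would hold every province and city.
SAMPLE_TABLE = {
    "浙江省": {"code": "AZJ", "cities": {"杭州": "hangzhou"}},
}


def iter_city_urls(table, base=BASE_URL):
    """Yield (province, city, url) for every city in the table."""
    for province, entry in table.items():
        for city, city_code in entry["cities"].items():
            yield province, city, f"{base}{entry['code']}/{city_code}.html"


for province, city, url in iter_city_urls(SAMPLE_TABLE):
    print(province, city, url)
```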
The result of a single crawl looks like this:
"12-09-22-15": {
"temperate": {
"day": {
"today_temperate": "1\u2103",
"now_temperate": "\u6c14\u6e29"
},
"three_hour": {
"12-10-11:00": "5.7\u2103",
"12-10-17:00": "5.7\u2103",
"12-10-05:00": "2.2\u2103",
"12-10-20:00": "5.5\u2103",
"12-09-23:00": "1.2\u2103",
"12-10-08:00": "3.4\u2103",
"12-10-02:00": "1.8\u2103",
"12-10-14:00": "5.8\u2103"
}
},
"wind_speed": {
"day": {
"today_winds": "3~4\u7ea7",
"now_winds": "\u98ce\u5411\u98ce\u901f"
},
"three_hour": {
"12-10-11:00": "1.3\u7c73/\u79d2",
"12-10-17:00": "1\u7c73/\u79d2",
"12-10-05:00": "1\u7c73/\u79d2",
"12-10-20:00": "2.1\u7c73/\u79d2",
"12-09-23:00": "0.7\u7c73/\u79d2",
"12-10-08:00": "0.8\u7c73/\u79d2",
"12-10-02:00": "0.3\u7c73/\u79d2",
"12-10-14:00": "1.4\u7c73/\u79d2"
}
},
"wind_direction": {
"day": {
"now_windd": "\u98ce\u5411\u98ce\u901f",
"today_windd": "\u65e0\u6301\u7eed\u98ce\u5411"
},
"three_hour": {
"12-10-11:00": "\u5317\u98ce",
"12-10-17:00": "\u897f\u5317\u98ce",
"12-10-05:00": "\u5317\u98ce",
"12-10-20:00": "\u897f\u5317\u98ce",
"12-09-23:00": "\u5317\u98ce",
"12-10-08:00": "\u5317\u98ce",
"12-10-02:00": "\u5317\u98ce",
"12-10-14:00": "\u897f\u5317\u98ce"
}
},
"humidity": {
"day": {
"today_humidity": "null",
"now_humidity": "\u76f8\u5bf9\u6e7f\u5ea6"
},
"three_hour": {
"12-10-11:00": "98%",
"12-10-17:00": "99.4%",
"12-10-05:00": "99.7%",
"12-10-20:00": "92.1%",
"12-09-23:00": "98.8%",
"12-10-08:00": "99.6%",
"12-10-02:00": "99.3%",
"12-10-14:00": "97.9%"
}
},
"water": {
"day": {
"now_water": 0,
"today_water": "null"
},
"three_hour": {
"12-10-11:00": "2.3",
"12-10-17:00": "2.5",
"12-10-05:00": "0.6",
"12-10-20:00": "2.8",
"12-09-23:00": "1.6",
"12-10-08:00": "0.5",
"12-10-02:00": "0.9",
"12-10-14:00": "2.5"
}
},
"pressure": {
"day": {
"today_pressure": "null",
"now_pressure": "null"
},
"three_hour": {
"12-10-11:00": "1016.6hPa",
"12-10-17:00": "1013.7hPa",
"12-10-05:00": "1016.5hPa",
"12-10-20:00": "1013.7hPa",
"12-09-23:00": "1018.9hPa",
"12-10-08:00": "1016.7hPa",
"12-10-02:00": "1017.3hPa",
"12-10-14:00": "1014.1hPa"
}
},
"weather": {
"day": {
"now_weather": "null",
"weather_png_link": "http://image.nmc.cn/static2/site/nmc/themes/basic/weather/white/night/7.png",
"today_weather": "\u5c0f\u96e8"
},
"three_hour": {
"12-10-11:00": "http://image.nmc.cn/static2/site/nmc/themes/basic/weather/white/day/7.png",
"12-10-17:00": "http://image.nmc.cn/static2/site/nmc/themes/basic/weather/white/day/7.png",
"12-10-05:00": "http://image.nmc.cn/static2/site/nmc/themes/basic/weather/white/day/7.png",
"12-10-20:00": "http://image.nmc.cn/static2/site/nmc/themes/basic/weather/white/day/7.png",
"12-09-23:00": "http://image.nmc.cn/static2/site/nmc/themes/basic/weather/white/day/6.png",
"12-10-08:00": "http://image.nmc.cn/static2/site/nmc/themes/basic/weather/white/day/7.png",
"12-10-02:00": "http://image.nmc.cn/static2/site/nmc/themes/basic/weather/white/day/6.png",
"12-10-14:00": "http://image.nmc.cn/static2/site/nmc/themes/basic/weather/white/day/7.png"
}
},
"cloud": {
"day": {
"now_cloud": "null",
"today_cloud": "null"
},
"three_hour": {
"12-10-11:00": "100%",
"12-10-17:00": "99.2%",
"12-10-05:00": "100%",
"12-10-20:00": "97.8%",
"12-09-23:00": "100%",
"12-10-08:00": "97.9%",
"12-10-02:00": "100%",
"12-10-14:00": "100%"
}
}
}
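In the result above, the `three_hour` entries are not in chronological order and the values carry their units as text. A small post-processing step, sketched below, sorts the time slots and pulls out the leading number. The helper names `leading_number` and `sorted_series` are made up for this example. Sorting the keys as plain strings only works while all slots fall in the same year, since the keys carry no year.

```python
import json
import re

# Three temperature entries copied from the result above (raw string keeps the \u escapes).
sample = json.loads(
    r'{"three_hour": {"12-10-02:00": "1.8\u2103", '
    r'"12-09-23:00": "1.2\u2103", "12-10-05:00": "2.2\u2103"}}'
)


def leading_number(text):
    """Return the number at the start of a value such as '5.7℃', or None."""
    match = re.match(r"-?\d+(?:\.\d+)?", str(text))
    return float(match.group()) if match else None


def sorted_series(three_hour):
    """Sort 'MM-DD-HH:MM' slots as strings and convert the values to floats."""
    return [(slot, leading_number(value)) for slot, value in sorted(three_hour.items())]


for slot, value in sorted_series(sample["three_hour"]):
    print(slot, value)
```

Values such as "null" contain no leading number, so `leading_number` returns None for them instead of raising an error.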
That wraps up this study note!