My first blog post with code, uploaded on the last workday before Chinese New Year: fetching historical weather data


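I've recently been getting ready to try deep learning (DL), and found that the test datasets are all ones other people have already worked through, which is a bit boring. So I searched around online and wrote some Python code that downloads weather data for any location and any time period. The site is a foreign one and its data is fairly complete, unlike the rubbish domestic sites, which don't publish their data at all. (Update on city codes: historical data is only available for major cities, although real-time data covers many more cities.) You don't have to use Python; for anyone writing it in another language, I'll just give the URL format, climate/MM-YYYY/ws-<city code>.html, for example: http://en.tutiempo.net/climate/01-1980/ws-594930.html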
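A minimal sketch of the URL scheme described above, assuming the `climate/MM-YYYY/ws-<code>.html` pattern from the example link stays stable (the site can change it at any time). `climate_url` and `month_range_desc` are hypothetical helper names, not part of any library; the second one produces the same newest-first month order that the script below walks through.

```python
# URL pattern taken from the example link; assumed stable, not a documented API.
BASE = "http://en.tutiempo.net/climate/{month:02d}-{year}/ws-{code}.html"


def climate_url(code, year, month):
    """Build the monthly climate-summary URL for one station, year and month."""
    return BASE.format(month=month, year=year, code=code)


def month_range_desc(start_year, end_year):
    """Yield (year, month) pairs from December of end_year back to January of start_year."""
    for year in range(end_year, start_year - 1, -1):
        for month in range(12, 0, -1):
            yield year, month


if __name__ == "__main__":
    # Print the twelve URLs for 2016 for the zhaoqing/gaoyao station, newest month first.
    for year, month in month_range_desc(2016, 2016):
        print(climate_url("592780", year, month))
```

Zero-padding the month with `{month:02d}` replaces the manual `'0' + str(month)` branch.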

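I'm also sharing the city code lookup table (the city code is the number after "ws-" in the URL). It only covers major cities, because that is all the site provides. This table is the 2013 version; the 2016 update costs several hundred US dollars: http://download.csdn.net/detail/kevinsun1717/9746374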
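Because the city code is just the digits after `ws-`, it can be pulled out of any station URL with a regular expression. This is a small sketch; `station_code_from_url` is a hypothetical helper, and `STATION_CODES` only repeats the four codes noted in the script's comments, not the downloadable table.

```python
import re

# Codes copied from the script's comments; not an official or complete list.
STATION_CODES = {
    "guangzhou": "592870",
    "shenzhen": "594930",
    "zhaoqing/gaoyao": "592780",
    "qingyuan": "542590",
}

# Matches the trailing "/ws-<digits>.html" part of a tutiempo climate URL.
_WS_RE = re.compile(r"/ws-(\d+)\.html$")


def station_code_from_url(url):
    """Return the digits after 'ws-' in a climate URL, or None if there are none."""
    match = _WS_RE.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(station_code_from_url("http://en.tutiempo.net/climate/01-1980/ws-594930.html"))
```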

# Python 3 version: urllib2 only exists on Python 2, so urlopen comes from urllib.request.
import csv
from urllib.request import urlopen

# get weather, 2017-01-23
from bs4 import BeautifulSoup


year = 2016  # newest year to fetch; the loop walks backwards to 1980
# Station codes: guangzhou 592870, shenzhen 594930, zhaoqing/gaoyao 592780, qingyuan 542590
cityCode = '592780'  # zhaoqing/gaoyao

strFile = cityCode + 'gzWeather' + '.csv'
with open(strFile, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['year', 'month', 'avg_temp', 'max_temp', 'min_temp', 'humidity',
                     'rainfall', 'avg_visibility', 'wind_speed',
                     'rain_days', 'storm_days', 'fog_days'])
    try:
        while year >= 1980:
            for month in range(12, 0, -1):  # December back to January
                strMonth = '%02d' % month  # zero-padded, e.g. '01'
                strYear = str(year)
                url = ('http://en.tutiempo.net/climate/' + strMonth + '-' + strYear
                       + '/ws-' + cityCode + '.html')
                page = urlopen(url)
                soup = BeautifulSoup(page, 'html.parser')

                # Monthly summary table ("medias mensuales" = monthly averages)
                weatherSet = soup.find(attrs={'class': 'medias mensuales'})

                # Skip the month if the table is absent or has no average-temperature cell.
                if weatherSet is None or weatherSet.find(attrs={'class': 'tc2'}) is None:
                    print('\nfail to get the page or web list error: ' + url, end='')
                    if month == 1:
                        # January is the last month visited for a year; if it is missing,
                        # assume earlier years are missing too and stop the whole run.
                        year = 0
                        break
                    continue
                print('\nGetting data for ' + strYear + '-' + strMonth, end='')

                # tc2 avg temperature, tc3 max, tc4 min, tc6 humidity, tc7 rainfall,
                # tc8 avg visibility, tc9 wind speed (tc5, sea-level pressure, is empty)
                values = [weatherSet.find(attrs={'class': cls}).text.strip()
                          for cls in ('tc2', 'tc3', 'tc4', 'tc6', 'tc7', 'tc8', 'tc9')]

                tds = weatherSet.find_all('td')
                rainday = tds[-4].text.strip()   # RA: total days with rain or drizzle
                stormday = tds[-2].text.strip()  # total days with storms
                fogday = tds[-1].text.strip()    # FG: total days with fog
                writer.writerow([strYear, strMonth] + values + [rainday, stormday, fogday])
                print('...done', end='')

            year -= 1
    except Exception as err:
        # Any network or parsing error (e.g. an HTTP 404 from urlopen) ends the run.
        print(err)

print('\nover')
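Since the point was to get fresh data for DL experiments, here is a sketch of loading the resulting CSV back in using only the standard library. It assumes the twelve-column order the script writes (year, month, average/max/min temperature, humidity, rainfall, visibility, wind speed, rain/storm/fog days), treats blank cells and `-` as missing values (an assumption about how the site marks gaps), and skips a header row if one is present. The column names, `load_rows` and `annual_mean` are illustrative, not the author's code.

```python
import csv
import io

# Column order written by the scraper; names are illustrative.
COLUMNS = ["year", "month", "avg_temp", "max_temp", "min_temp", "humidity",
           "rainfall", "visibility", "wind_speed", "rain_days", "storm_days", "fog_days"]


def to_float(text):
    """Convert a scraped cell to float, mapping blanks and '-' (assumed missing marker) to None."""
    text = text.strip()
    if text in ("", "-"):
        return None
    try:
        return float(text)
    except ValueError:
        return None


def load_rows(fileobj):
    """Yield one dict per month, with numeric fields converted to float or None."""
    for raw in csv.reader(fileobj):
        if not raw or raw[0] == "year":  # skip blank lines and an optional header row
            continue
        row = dict(zip(COLUMNS, raw))
        record = {"year": int(row["year"]), "month": int(row["month"])}
        for name in COLUMNS[2:]:
            record[name] = to_float(row.get(name, ""))
        yield record


def annual_mean(records, field):
    """Average one field per year, ignoring months where it is missing."""
    sums, counts = {}, {}
    for r in records:
        value = r[field]
        if value is None:
            continue
        sums[r["year"]] = sums.get(r["year"], 0.0) + value
        counts[r["year"]] = counts.get(r["year"], 0) + 1
    return {year: sums[year] / counts[year] for year in sums}


if __name__ == "__main__":
    sample = io.StringIO(
        "2016,12,15.0,20.0,11.0,70,30.5,10.0,8.0,5,0,2\n"
        "2016,11,20.0,25.0,16.0,75,-,10.0,7.5,6,1,1\n"
    )
    print(annual_mean(load_rows(sample), "avg_temp"))  # {2016: 17.5}
```

To use it on real output, open the file the scraper produced (for example `open('592780gzWeather.csv', newline='')`) and pass it to `load_rows`.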

