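The crawler code is shown below. To try it yourself, you can replace the Google homepage with Baidu and adjust the crawler commands to match (the search-box locator and the XPath of the result). Note that you must first define cityname, the list of city names you want to crawl.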
import time
from selenium import webdriver  # import the required modules; the crawler uses selenium
import pandas as pd
import numpy as np
# load the source data
city = pd.read_csv(r'D:\whole_development_of_the_stack_study\RS_Algorithm_Course\Practical\machine_learning\08_09_10_11SVM_01\Cityclimate.csv')
city  # first column: city name; second column: climate
City Climate
Adelaide Warm temperate
Albany Mild temperate
Albury Hot dry summer, cool winter
Wodonga Hot dry summer, cool winter
AliceSprings Hot dry summer, warm winter
Amata Hot dry summer, cool winter
Ballarat Cool temperate
Bathurst Cool temperate
Birdsville Hot dry summer, warm winter
Borroloola High humidity summer, warm winter
Bourke Hot dry summer, cool winter
Brisbane Warm humid summer, mild winter
BrokenHill Hot dry summer, cool winter
Broome High humidity summer, warm winter
Bunbury Warm temperate
Burketown High humidity summer, warm winter
Burra Mild temperate
Cairns High humidity summer, warm winter
Canberra Cool temperate
Carnarvon Hot dry summer, warm winter
Ceduna Warm temperate
Charleville Hot dry summer, warm winter
CooberPedy Hot dry summer, cool winter
Cooktown High humidity summer, warm winter
CoffsHarbour Warm humid summer, mild winter
Dampier High humidity summer, warm winter
Darwin High humidity summer, warm winter
Derby High humidity summer, warm winter
Devonport Cool temperate
Dubbo Hot dry summer, cool winter
Elliot Hot dry summer, warm winter
Esperance Warm temperate
Eucla Warm temperate
Exmouth High humidity summer, warm winter
GascoyneJunction Hot dry summer, warm winter
Geraldton Warm temperate
Goondiwindi Hot dry summer, warm winter
Griffith Hot dry summer, cool winter
HallsCreek Hot dry summer, warm winter
Hobart Cool temperate
Horsham Mild temperate
Innamincka Hot dry summer, cool winter
Ivanhoe Hot dry summer, cool winter
Kalgoorlie–Boulder Hot dry summer, cool winter
Katherine High humidity summer, warm winter
Kingscote Mild temperate
KingstonSE Mild temperate
Kulgera Hot dry summer, warm winter
LakesEntrance Mild temperate
Launceston Cool temperate
LeighCreek Warm temperate
Longreach Hot dry summer, warm winter
LordHoweIsland Warm temperate
Mackay Warm humid summer, mild winter
MargaretRiver Warm temperate
Maryborough Warm humid summer, mild winter
Melbourne Mild temperate
MelbourneAirport Mild temperate
Merredin Hot dry summer, cool winter
Mildura Hot dry summer, cool winter
MountGambier Mild temperate
MountIsa Hot dry summer, warm winter
Newcastle Warm temperate
Newdegate Hot dry summer, cool winter
Newman Hot dry summer, warm winter
Nhulunbuy High humidity summer, warm winter
Norseman Hot dry summer, cool winter
Nullarbor Hot dry summer, cool winter
Oenpelli High humidity summer, warm winter
Oodnadatta Hot dry summer, cool winter
Perth Warm temperate
PerthAirport Warm temperate
PortHedland High humidity summer, warm winter
PortLincoln Warm temperate
PortMacquarie Warm temperate
Renmark Warm temperate
Rockhampton Warm humid summer, mild winter
Shepparton Hot dry summer, cool winter
Southport Cool temperate
Strahan Cool temperate
Swansea Cool temperate
Sydney Warm temperate
SydneyAirport Warm temperate
Tamworth Hot dry summer, cool winter
Taroom Hot dry summer, warm winter
Telfer Hot dry summer, warm winter
TennantCreek Hot dry summer, warm winter
Thargomindah Hot dry summer, warm winter
Tibooburra Hot dry summer, cool winter
TimberCreek High humidity summer, warm winter
Townsville High humidity summer, warm winter
Warburton Hot dry summer, cool winter
Weipa High humidity summer, warm winter
Whyalla Hot dry summer, cool winter
Wiluna Hot dry summer, cool winter
Wollongong Warm temperate
Wyndham High humidity summer, warm winter
Yalgoo Hot dry summer, cool winter
Yulara Hot dry summer, warm winter
Uluru Hot dry summer, warm winter
# extract the city names
cityname = city.iloc[:,0]
cityname
# start crawling
df = pd.DataFrame(index=cityname.index)  # new DataFrame to hold the crawled data
driver = webdriver.Chrome()  # launch Chrome
time0 = time.time()  # start the timer
# loop over the cities
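for num, city in enumerate(cityname):  # iterate over the city names
    # open the Google search page
    driver.get('https://www.google.co.uk/webhp?hl=en&sa=X&ved=0ahUKEwimtcX24cTfAhUJE7wKHVkWB5AQPAgH')
    # pause for 1 second so we can see what is happening
    time.sleep(1)
    # locate Google's search box, whose name attribute is "q"
    # (find_element("name", ...) works in Selenium 3 and 4; find_element_by_name was removed in Selenium 4.3)
    search_box = driver.find_element("name", "q")
    # type "<city> Australia Latitude and longitude" into the box
    search_box.send_keys("%s Australia Latitude and longitude" % (city))
    # submit (same as pressing Enter) to start the search
    search_box.submit()
    # read the latitude and longitude from the answer box
    # (Z0LcW is an auto-generated Google class name and may have changed since)
    result = driver.find_element("xpath", '//div[@class="Z0LcW"]').text
    # split the result on spaces
    resultsplit = result.split(" ")
    # write the crawled data into the pre-built df; first column: city name
    df.loc[num, "City"] = city
    # second column: latitude
    df.loc[num, "Latitude"] = resultsplit[0]
    # third column: longitude
    df.loc[num, "Longitude"] = resultsplit[2]
    # fourth column: latitude direction
    df.loc[num, "Latitudedir"] = resultsplit[1]
    # fifth column: longitude direction
    df.loc[num, "Longitudedir"] = resultsplit[3]
    # report each successful crawl
    print("%i webcrawler successful for city %s" % (num, city))
time.sleep(1)  # pause for 1 second once every city has been crawled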
driver.quit()  # close the browser
print(time.time() - time0)  # print the elapsed time
df  # latitude and longitude of every city
# save
df.to_csv(r"D:\whole_development_of_the_stack_study\RS_Algorithm_Course\Practical\machine_learning\08_09_10_11SVM_01\csv\cityll.csv")
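The coordinates saved in df are still raw text. Assuming Google's answer box returns a string such as "34.9285° S, 138.6007° E" (the layout implied by the split indices in the loop: value, direction, value, direction), splitting on spaces leaves the degree sign on each number and a trailing comma on the latitude direction ("S,"). The following is a minimal sketch of turning such a string into signed decimal degrees, with south and west negative. parse_lat_lon is a hypothetical helper, not part of the original code, and the sample string is only illustrative.

```python
import re

# Assumed answer-box format: "<lat>° <N|S>, <lon>° <E|W>".
_COORD = re.compile(r"([\d.]+)°\s*([NS]),\s*([\d.]+)°\s*([EW])")


def parse_lat_lon(text):
    """Hypothetical helper: return (latitude, longitude) as signed floats.

    South latitudes and west longitudes become negative numbers.
    Raises ValueError if the text does not match the assumed format.
    """
    match = _COORD.search(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate string: {text!r}")
    lat, lat_dir, lon, lon_dir = match.groups()
    latitude = float(lat) * (-1 if lat_dir == "S" else 1)
    longitude = float(lon) * (-1 if lon_dir == "W" else 1)
    return latitude, longitude


print(parse_lat_lon("34.9285° S, 138.6007° E"))  # (-34.9285, 138.6007)
```

Under the same assumption, rows that were already saved can be parsed by joining the four text columns with single spaces, since "S," keeps its comma and the joined text reproduces the original answer string.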
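Source: the data-crawling module of "Machine Learning - Sklearn - 11 (Support Vector Machine SVM: SVC real-data case, predicting whether it will rain tomorrow)": https://blog.csdn.net/m0_37755995/article/details/123944354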
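As written, the crawl loop stops at the first city for which find_element raises (Selenium raises NoSuchElementException when the answer box is missing, for example if Google shows a different layout for that query). The remaining cities are then skipped and driver.quit() is never reached. The following is a minimal sketch of a fault-tolerant loop, assuming the per-city Selenium steps are wrapped in a single function. crawl_cities, lookup and fake_lookup are hypothetical names; fake_lookup stands in for the browser so the pattern runs without Selenium.

```python
import time


def crawl_cities(names, lookup, pause=1.0):
    """Hypothetical helper: call lookup(name) for every city.

    lookup stands in for the Selenium steps (open page, type query,
    read the answer box). An exception for one city is recorded in
    `failures` instead of stopping the whole crawl.
    """
    results = {}
    failures = {}
    for num, name in enumerate(names):
        try:
            results[name] = lookup(name)
        except Exception as exc:  # e.g. NoSuchElementException from Selenium
            failures[name] = repr(exc)
            print("%i webcrawler failed for city %s: %r" % (num, name, exc))
        else:
            print("%i webcrawler successful for city %s" % (num, name))
        time.sleep(pause)  # be gentle with the search engine between queries
    return results, failures


def fake_lookup(name):
    """Stand-in for the browser: fails for one city, returns a made-up string otherwise."""
    if name == "Uluru":
        raise LookupError("answer box not found")
    return "12.5° S, 130.8° E"


ok, failed = crawl_cities(["Darwin", "Uluru"], fake_lookup, pause=0)
```

The failures dictionary lists the cities to retry or fill in by hand, so a single missing answer box no longer ends the run early.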