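I recently needed to convert addresses into coordinates, so I wrote a Python script that reads addresses from a file and geocodes them.
First, decide which coordinate system you need. I needed the 2000 coordinate system (CGCS2000), so I used Tianditu (天地图). Other map services such as Amap (高德地图) and Baidu Maps (百度地图) work in a similar way. The corresponding geocoding API is documented on the Tianditu developer site.
Once you have found the API, you can start coding. The script does four things: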
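1. Read the data. The addresses are stored in a CSV file; read it and split the contents on line breaks into a list:

import io

# Open the CSV as text; this file happens to be GBK-encoded
with io.open('address_1.csv', 'r', encoding='gbk') as f:
    data = f.read()  # read the whole file into one string

# Strip double quotes so they do not end up inside the addresses
# (adjust this to your own data)
data = data.replace('"', '')
# Split into lines; splitlines() also handles Windows "\r\n" line endings
datas = data.splitlines()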
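Stripping every double quote works for simple files, but it breaks addresses that themselves contain a comma inside a quoted field. As a hedged alternative, the standard library's csv module can do the parsing. The sketch below is not part of the original script: parse_address_rows is a hypothetical helper, and the sample text is made up for illustration.

```python
import csv
import io


def parse_address_rows(text):
    """Parse CSV text into (index, address) tuples.

    Rows with fewer than two fields (for example blank lines) are skipped.
    """
    reader = csv.reader(io.StringIO(text))
    rows = []
    for row in reader:
        if len(row) >= 2:
            rows.append((row[0], row[1]))
    return rows


# Made-up sample: a header row, a quoted address containing a comma,
# a blank line, and a plain address.
sample = 'id,address\n1,"北京市, 海淀区"\n\n2,上海市\n'
rows = parse_address_rows(sample)
```

With a real file, the same reader can be pointed at the open file object instead of an io.StringIO wrapper.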
2. Split each line into its fields and build the request URL for each address:

urlList = []
for line in datas:
    items = line.split(',')
    if len(items) < 2:
        continue  # skip blank or malformed lines
    index = items[0]    # first column: row id
    address = items[1]  # second column: address text
    # Remove any leftover double quotes before putting the address in the URL
    fixedAddress = address.replace('"', '')
    urlList.append({
        'index': index,
        'address': address,
        # tk is the Tianditu developer key; replace it with your own
        'url': 'http://api.tianditu.gov.cn/geocoder?ds={"keyWord":"' + fixedAddress + '"}&tk=6e31ea77aa6ab473089fb0c561ad3727'
    })
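Concatenating the URL by hand relies on the HTTP library to escape the JSON and the Chinese characters. A minimal sketch of an explicit alternative, assuming the same endpoint and the same ds and tk parameters used above: build_geocode_url is a hypothetical helper, and 'YOUR_TK' is a placeholder rather than a real key.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

GEOCODER_URL = 'http://api.tianditu.gov.cn/geocoder'


def build_geocode_url(address, token):
    """Return a geocoder URL with the query string percent-encoded."""
    # Serialize the ds parameter as compact JSON, keeping non-ASCII text as is
    ds = json.dumps({'keyWord': address}, ensure_ascii=False,
                    separators=(',', ':'))
    # urlencode percent-encodes both values as UTF-8
    return GEOCODER_URL + '?' + urlencode({'ds': ds, 'tk': token})


url = build_geocode_url('北京市海淀区', 'YOUR_TK')
# Decode the query string again to check that nothing was lost
decoded = parse_qs(urlsplit(url).query)
```

Using json.dumps also means a double quote inside an address is escaped instead of having to be removed.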
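3. Wrap the request in a function:

import requests
from requests.exceptions import ConnectionError, ReadTimeout

def scrape(item):
    # Default to empty coordinates so every result has both keys,
    # even when the request fails or the address is not found
    item['lng_wgs'] = ''
    item['lat_wgs'] = ''
    try:
        r = requests.get(item['url'], timeout=5)
        if r.status_code == 200:
            print(item['index'])  # progress output
            result = r.json()
            if result.get('msg') == 'ok':
                item['lng_wgs'] = result['location']['lon']
                item['lat_wgs'] = result['location']['lat']
    except (ConnectionError, ReadTimeout):
        print('Error occurred', item)
    return item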
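The response handling can be pulled out into a small pure function, which makes it easy to check without calling the service. This is a sketch that assumes only the response fields used above (msg, location.lon, location.lat); extract_location is a hypothetical name and the sample dictionaries are made up.

```python
def extract_location(result):
    """Return (lon, lat) from a decoded geocoder response, or None."""
    # Anything other than a dict with msg == 'ok' counts as "not found"
    if not isinstance(result, dict) or result.get('msg') != 'ok':
        return None
    location = result.get('location') or {}
    lon = location.get('lon')
    lat = location.get('lat')
    if lon is None or lat is None:
        return None
    return lon, lat


# Made-up responses for illustration
found = extract_location({'msg': 'ok', 'location': {'lon': 116.3, 'lat': 39.9}})
missing = extract_location({'msg': 'ok'})
failed = extract_location({'msg': 'error'})
```

Inside the request function, the return value of r.json() would be passed to this helper, and a None result would leave the coordinates empty.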
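4. Run the requests in parallel. Note that multiprocessing.Pool starts worker processes rather than threads; here the pool has 20 workers:

import csv
import time
from multiprocessing import Pool

if __name__ == '__main__':
    start = time.time()
    pool = Pool(20)  # 20 worker processes
    results = pool.map(scrape, urlList)  # results keep the input order
    pool.close()
    pool.join()
    print('over')
    print(len(results))
    # newline='' avoids blank rows on Windows; utf-8 keeps the Chinese text intact
    with open('temp.csv', 'w', newline='', encoding='utf-8') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        spamwriter.writerow(['', 'count_address', 'lng_wgs', 'lat_wgs'])
        # results[0] comes from the header row of the input file, so skip it
        for line in results[1:]:
            if line:
                spamwriter.writerow([line['index'], line['address'], line.get('lng_wgs', ''), line.get('lat_wgs', '')])
            else:
                print('empty')
    print(time.time() - start)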
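Because the work here is mostly waiting on HTTP responses, a thread pool is a lighter alternative to a process pool. The following is a minimal sketch using the standard library's concurrent.futures.ThreadPoolExecutor, not the approach taken in the script above. run_in_threads and fake_scrape are hypothetical names, and fake_scrape stands in for the real request function so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_threads(func, items, workers=20):
    """Apply func to every item using a pool of threads.

    executor.map yields results in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(func, items))


def fake_scrape(item):
    # Stand-in for the real request function: copy the item and mark it done
    item = dict(item)
    item['done'] = True
    return item


results = run_in_threads(fake_scrape, [{'index': str(i)} for i in range(5)])
```

In the real script, the request function from step 3 would be passed in place of fake_scrape, and the list of URL dictionaries in place of the generated items.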