Notes on Collecting Amap (高德) POI Data

I. Business Requirement

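The goal is to collect POI data of every type for a given city, to support traffic optimization. This took a long time to work out and a lot of reading, so I am writing down the approach here.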

II. Collection Approach

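The data comes from the POI search service of the Amap open platform. I use polygon search: you pass a polygon made of longitude/latitude points, and the API returns the POIs inside it.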

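Experiments showed that the polygon cannot be too large, because the number of results a single query returns is capped. To work around this, I used the following process: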

1. First, get the boundaries of the city's administrative districts

https://restapi.amap.com/v3/config/district?parameters

Download the city code table:

https://lbs.amap.com/api/webservice/download

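Using the city code table, request the URL above once per administrative district. I use the URL below; fill in your own key and set keywords to the district name to get that district's boundary: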

https://restapi.amap.com/v3/config/district?platform=JS&key=&subdistrict=0&extensions=all&level=district&s=rsv3&output=json&keywords=天津市
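A minimal sketch of step 1 using only the standard library. It keeps only the key, keywords, subdistrict, extensions and output parameters from the URL above. It assumes the boundary comes back in districts[0]['polyline'] as "lng,lat;lng,lat;...", with separate pieces (for example islands) joined by "|". The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

DISTRICT_URL = "https://restapi.amap.com/v3/config/district"


def fetch_district_polyline(key: str, name: str) -> str:
    """Request one district's boundary and return the raw 'polyline' string."""
    params = {
        "key": key,
        "keywords": name,
        "subdistrict": 0,
        "extensions": "all",  # the boundary is only included with extensions=all
        "output": "json",
    }
    url = DISTRICT_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    # Assumption: the first match is the district that was asked for
    return data["districts"][0]["polyline"]


def parse_polyline(polyline: str) -> list[list[tuple[float, float]]]:
    """Split 'lng,lat;lng,lat|lng,lat;...' into rings of (lng, lat) points."""
    rings = []
    for ring in polyline.split("|"):
        points = []
        for pair in ring.split(";"):
            if pair:  # tolerate a trailing ';'
                lng, lat = pair.split(",")
                points.append((float(lng), float(lat)))
        rings.append(points)
    return rings
```

Parsing is kept separate from the HTTP call so it can be checked offline on a saved response.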

2. For each district boundary, take the minimum and maximum longitude and latitude to get the district's bounding rectangle
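A sketch of step 2. It takes a boundary as a list of rings, each ring a list of (lng, lat) tuples. It returns the north-west and south-east corners as "lng,lat" strings, the format getAllRec below expects. The function name is hypothetical.

```python
def bounding_box(rings):
    """Return (leftUp, rightDown) for all points of a district boundary.

    leftUp is the north-west corner (min lng, max lat) and rightDown the
    south-east corner (max lng, min lat).
    """
    points = [p for ring in rings for p in ring]
    lngs = [lng for lng, _ in points]
    lats = [lat for _, lat in points]
    left_up = f"{min(lngs)},{max(lats)}"
    right_down = f"{max(lngs)},{min(lats)}"
    return left_up, right_down
```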

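3. Divide each bounding rectangle into a grid of small cells, about 100*100 in distance. In the code below, a cell stops being split once its width or height is at most 0.008° (step_lng / step_lat).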

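My implementation is verbose and fairly naive. It just works through the rectangle in both directions, halving it until the cells are small enough.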

def getBound(p1, p2):
    """Turn two corner points "lng,lat" into a closed rectangle polygon string
    "lng1,lat1;lng2,lat1;lng2,lat2;lng1,lat2;lng1,lat1;" (first point repeated)."""
    lng1, lat1 = p1.split(',')
    lng2, lat2 = p2.split(',')
    corners = [(lng1, lat1), (lng2, lat1), (lng2, lat2), (lng1, lat2), (lng1, lat1)]
    return "".join(f"{lng},{lat};" for lng, lat in corners)

import math


def calc_distance(p1, p2):
    """Hypothetical stand-in for the author's utils.gpsutils.calcDistance (not shown):
    haversine distance in metres between two [lat, lng] points."""
    lat1, lng1 = map(math.radians, p1)
    lat2, lng2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))


def getAllRec(leftUp, rightDown, step_lng=0.008, step_lat=0.008):
    """Split the rectangle leftUp (north-west corner, "lng,lat") .. rightDown
    (south-east corner) into quadrants until a cell's width is at most step_lng
    or its height at most step_lat (degrees). Returns [lng1, lat1, lng2, lat2] lists."""
    leftlng, leftlat = map(float, leftUp.split(','))
    rightlng, rightlat = map(float, rightDown.split(','))
    recList = [[leftlng, leftlat, rightlng, rightlat]]  # stack of rectangles still to check
    result_list = []
    while recList:
        leftlng, leftlat, rightlng, rightlat = recList.pop()
        if rightlng - leftlng > step_lng and leftlat - rightlat > step_lat:
            # Still too large: split into four quadrants at the midpoint
            midLng = round((leftlng + rightlng) / 2, 6)
            midLat = round((leftlat + rightlat) / 2, 6)
            recList.append([leftlng, leftlat, midLng, midLat])
            recList.append([leftlng, midLat, midLng, rightlat])
            recList.append([midLng, leftlat, rightlng, midLat])
            recList.append([midLng, midLat, rightlng, rightlat])
        else:
            # Small enough: print the cell diagonal in metres (debug output) and keep it
            print(calc_distance([leftlat, leftlng], [rightlat, rightlng]))
            result_list.append([leftlng, leftlat, rightlng, rightlat])
    return result_list


# Example: grid a rectangle in Xiamen and print every cell as a closed polygon
rects = getAllRec("118.09338700,24.52395700", "118.17166400,24.44209500")
for item in rects:
    print(getBound(f"{item[0]},{item[1]}", f"{item[2]},{item[3]}"))
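getAllRec reaches small cells by repeated halving. If you want a literal fixed-step grid instead, the following sketch steps across the rectangle in both directions and clips the edge cells. The function name and default step are assumptions. For choosing a step, note that 1° of latitude is roughly 111 km, so 0.008° is about 0.89 km north to south. A degree of longitude shrinks by cos(latitude).

```python
import math


def uniform_grid(left_up, right_down, step=0.008):
    """Cut the rectangle into fixed-size cells of `step` degrees.

    Returns [lng1, lat1, lng2, lat2] lists (north-west then south-east corner),
    the same layout getAllRec returns. Edge cells are clipped to the rectangle.
    """
    left_lng, top_lat = map(float, left_up.split(","))
    right_lng, bottom_lat = map(float, right_down.split(","))
    # round() guards against float noise such as 3.0000000000001 becoming 4 cells
    n_lng = math.ceil(round((right_lng - left_lng) / step, 9))
    n_lat = math.ceil(round((top_lat - bottom_lat) / step, 9))
    cells = []
    for i in range(n_lng):
        lng1 = left_lng + i * step
        lng2 = min(lng1 + step, right_lng)
        for j in range(n_lat):
            lat1 = top_lat - j * step
            lat2 = max(lat1 - step, bottom_lat)
            cells.append([round(lng1, 6), round(lat1, 6), round(lng2, 6), round(lat2, 6)])
    return cells
```

Unlike halving, this approach always produces the same cell size. The trade-off is that it does not adapt to where POIs are dense.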

4. Download the POI type code table

https://lbs.amap.com/api/webservice/download

5. Iterate over the major categories in the type code table
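A small sketch for step 5. It assumes Amap type codes are 6-digit strings whose first two digits identify the major category (for example "050101" belongs to "050000"). The codes can come from the downloaded Excel table, for instance via pandas.read_excel(..., dtype=str). The column name depends on the file, so it is not assumed here. The function name is hypothetical.

```python
def major_categories(type_codes):
    """Collapse 6-digit type codes (e.g. "050101") to major-category codes
    (e.g. "050000"), de-duplicated and sorted."""
    majors = set()
    for code in type_codes:
        code = str(code).strip().zfill(6)  # Excel may have dropped a leading zero
        majors.add(code[:2] + "0000")
    return sorted(majors)
```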

6. Iterate over every grid cell and run a polygon search on each cell for the current major category

https://restapi.amap.com/v3/place/polygon?parameters
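A sketch of building one polygon-search request for a grid cell. As in the code of step 7, a rectangle is passed as just its top-left and bottom-right corners, "lng,lat|lng,lat". In the crawl this would be called for every (major category, cell) pair. The function name and defaults are hypothetical.

```python
from urllib.parse import urlencode

POLYGON_URL = "https://restapi.amap.com/v3/place/polygon"


def polygon_search_url(key, poi_type, cell, page=1, offset=25):
    """Build the request URL for one page of POIs of `poi_type` inside `cell`.

    `cell` is [lng1, lat1, lng2, lat2]: north-west corner, then south-east corner.
    """
    lng1, lat1, lng2, lat2 = cell
    params = {
        "key": key,
        "types": poi_type,
        "polygon": f"{lng1},{lat1}|{lng2},{lat2}",
        "offset": offset,  # results per page
        "page": page,
        "output": "json",
    }
    return POLYGON_URL + "?" + urlencode(params)
```

Using urlencode percent-encodes the "," and "|" separators and any Chinese keywords, instead of relying on string concatenation.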

7. Parse the results and extract each POI's attributes

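The keywords parameter is not actually used here and can be dropped. poi_type is the POI type, and leftUp and rightDown are the corner points of the rectangle. The implementation checks how many POIs a cell returns. If there are more than 200, the cell is split further and requested again.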

import random
import time

import pandas as pd
import requests

# Names used below but not defined in this snippet:
#   getPOIApiCountByPolygon(...)  the author's helper (not shown) returning the POI count for a polygon
#   poi_id_list_have              POI ids already collected in earlier runs
#   getBound(...)                 defined in step 3


def requestPOIApiByPolygon(keyWords, key, poi_type, leftUp, rightDown):
    page_size = 25     # results per page (offset); also the maximum number of pages per cell
    poi_list = []      # one row of attributes per POI
    seen_ids = set()   # POI ids already added in this call
    leftlng, leftlat = map(float, leftUp.split(','))
    rightlng, rightlat = map(float, rightDown.split(','))
    recList = [[leftlng, leftlat, rightlng, rightlat]]  # stack of rectangles
    while recList:
        leftlng, leftlat, rightlng, rightlat = recList.pop()
        # Rectangle given by its top-left and bottom-right corners
        polygon = f"{leftlng},{leftlat}|{rightlng},{rightlat}"
        if getPOIApiCountByPolygon(keyWords, key, poi_type, polygon) > 200:
            # Too many results for one cell: split into four quadrants and retry
            midLng = round((leftlng + rightlng) / 2, 6)
            midLat = round((leftlat + rightlat) / 2, 6)
            recList.append([leftlng, leftlat, midLng, midLat])
            recList.append([leftlng, midLat, midLng, rightlat])
            recList.append([midLng, leftlat, rightlng, midLat])
            recList.append([midLng, midLat, rightlng, rightlat])
            continue
        for pageNum in range(1, page_size + 1):
            params = {
                "keywords": keyWords,
                "city": "xiamen",
                "output": "json",
                "types": str(poi_type),
                "key": key,
                "polygon": polygon,
                "offset": page_size,
                "page": pageNum,
            }
            res = requests.get("https://restapi.amap.com/v3/place/polygon", params=params).json()
            if res.get('info') == 'USER_DAILY_QUERY_OVER_LIMIT':
                print("Daily request quota exceeded")
                return pd.DataFrame(poi_list)  # keep what has been collected so far
            count = int(res.get('count', 0))
            if count == 0 or not res.get('pois'):
                break  # no (more) results in this cell
            if pageNum == 1:
                print("area#%s#count=%s" % (
                    getBound(f"{leftlng},{leftlat}", f"{rightlng},{rightlat}"), count))
            for r in res['pois']:
                # Skip POIs seen before; a set lookup, since `in` on a pandas Series tests the index
                if r['id'] in poi_id_list_have or r['id'] in seen_ids:
                    continue
                if r['cityname'] != '厦门市':  # keep only POIs in Xiamen
                    continue
                seen_ids.add(r['id'])
                # Columns: name, id, location, type, pname, cityname, adname, address, rec
                poi_list.append([r['name'], r['id'], r['location'] + ";", r['type'], r['pname'],
                                 r['cityname'], r['adname'], r['address'], polygon])
            time.sleep(random.randint(30, 60))  # throttle between pages
    return pd.DataFrame(poi_list)
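The ">200 means split" rule depends on live API counts, so it can help to test the splitting logic offline. In this sketch the count query is passed in as a function (the hypothetical `count_fn`). It also adds a minimum cell size. Without one, a spot with more than 200 POIs at almost the same coordinates would keep being split indefinitely.

```python
from typing import Callable, List

Rect = List[float]  # [lng1, lat1, lng2, lat2]: north-west corner, then south-east corner


def split_until_small(rect: Rect, count_fn: Callable[[Rect], int],
                      limit: int = 200, min_size: float = 1e-4) -> List[Rect]:
    """Quadtree-split `rect` until count_fn(cell) <= limit for every cell.

    count_fn stands in for the per-cell API count query. Cells whose width or
    height falls to min_size degrees are kept even if still over the limit.
    """
    stack, done = [rect], []
    while stack:
        lng1, lat1, lng2, lat2 = stack.pop()
        cell = [lng1, lat1, lng2, lat2]
        if count_fn(cell) > limit and min(lng2 - lng1, lat1 - lat2) > min_size:
            mid_lng = round((lng1 + lng2) / 2, 6)
            mid_lat = round((lat1 + lat2) / 2, 6)
            stack += [[lng1, lat1, mid_lng, mid_lat], [lng1, mid_lat, mid_lng, lat2],
                      [mid_lng, lat1, lng2, mid_lat], [mid_lng, mid_lat, lng2, lat2]]
        else:
            done.append(cell)
    return done
```

With a fake count function built on random points, you can check that every final cell is under the limit and that no point is lost or counted twice.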

8. Boundary points of the POIs found through Amap search

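The Amap search API does not return POI boundary points, although some POIs do show a boundary on the Amap website. Analysis showed that the site fetches the boundary by POI ID from: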

https://www.amap.com/detail/get/detail?id={}

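However, Amap blocks direct requests to this URL, so the boundaries cannot be fetched in bulk. After some research, I found that Baidu Maps' boundary data can be used to get them instead.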

Steps:

  • Iterate over the Amap POI names, request the Baidu URL below with each name, and parse the response

    https://map.baidu.com/?newmap=1&reqflag=pcmap&biz=1&from=webmap&da_par=direct&pcevaname=pc4.1&qt=s&da_src=shareurl&wd=西雅图
    
  • Baidu returns coordinates in Baidu Mercator, and they must be converted; for the method, see
    "Baidu Mercator coordinate conversion" (百度墨卡托坐标转化)
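The first helper below builds the Baidu search URL for a POI name, using the query parameters from the example URL above. The referenced article covers converting Baidu Mercator to Baidu longitude/latitude, so that step is not repeated here. Baidu longitude/latitude is BD-09, while Amap uses GCJ-02, so boundaries taken from Baidu still need one more shift before they line up with Amap POIs. The conversion functions follow an approximate BD-09 ↔ GCJ-02 formula widely used in open-source code; it is not an official API of either provider. All function names are hypothetical.

```python
import math
from urllib.parse import urlencode

BAIDU_URL = "https://map.baidu.com/"
X_PI = math.pi * 3000.0 / 180.0


def baidu_search_url(name):
    """Baidu Maps search URL for a POI name (parameters copied from the example URL)."""
    params = {
        "newmap": 1, "reqflag": "pcmap", "biz": 1, "from": "webmap",
        "da_par": "direct", "pcevaname": "pc4.1", "qt": "s",
        "da_src": "shareurl", "wd": name,  # urlencode percent-encodes Chinese names
    }
    return BAIDU_URL + "?" + urlencode(params)


def bd09_to_gcj02(lng, lat):
    """Approximate BD-09 (Baidu) -> GCJ-02 (Amap) conversion."""
    x, y = lng - 0.0065, lat - 0.006
    z = math.sqrt(x * x + y * y) - 0.00002 * math.sin(y * X_PI)
    theta = math.atan2(y, x) - 0.000003 * math.cos(x * X_PI)
    return z * math.cos(theta), z * math.sin(theta)


def gcj02_to_bd09(lng, lat):
    """Inverse of the above; useful for checking round trips."""
    z = math.sqrt(lng * lng + lat * lat) + 0.00002 * math.sin(lat * X_PI)
    theta = math.atan2(lat, lng) + 0.000003 * math.cos(lng * X_PI)
    return z * math.cos(theta) + 0.0065, z * math.sin(theta) + 0.006
```

A round trip GCJ-02 → BD-09 → GCJ-02 should return to the starting point to within a few millionths of a degree, which is a quick way to sanity-check the conversion.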
