- First, an online conversion tool: https://blog.csdn.net/pigs_dream/article/details/119118903
- For what JSON is and what the data format looks like, see this article: https://www.jb51.net/article/77660.htm
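In fact, JSON and XML serve the same purpose: both store information, just in different ways. JSON data also comes in many different structures, and there are several ways to parse it in Python. A web search mostly turns up json.loads() and json.dump(). The json module can handle nested structures, including arrays of objects, but it is very strict about syntax, so a file that departs from the specification will fail to parse. Before parsing, it is worth checking the file on the bejson website to confirm that it is valid. For example, a string value must not be broken across lines, and both keys and string values must be enclosed in double quotes.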
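The same kind of check an online validator performs can be sketched locally with the standard `json` module. The helper name `check_json` and the sample strings below are made up for illustration; this is a minimal sketch, not a full validator.

```python
import json

def check_json(text):
    """Return (True, None) if text is valid JSON, else (False, error message)."""
    try:
        json.loads(text)
        return True, None
    except json.JSONDecodeError as err:
        # err.lineno and err.colno point at the offending position
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"

# An array of objects is valid JSON
ok_array = check_json('[{"name": "a"}, {"name": "b"}]')
# Single quotes are not allowed around keys or string values
bad_quotes = check_json("{'name': 'a'}")
# A raw line break inside a string value is rejected as well
bad_newline = check_json('{"location": "115.6,\n33.1"}')

print(ok_array, bad_quotes, bad_newline, sep="\n")
```

The reported line and column make it easier to find the offending spot in a large file.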
- The JSON discussed above is general-purpose JSON. GIS has its own specific JSON formats as well; see this article: https://blog.csdn.net/gislaozhang/article/details/113616526
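These are Esri JSON and GeoJSON. Their advantage is that they can be converted directly into vector data: a file in standard Esri JSON can be imported in ArcGIS with the JSON To Features tool, and GeoJSON can be converted to vector data with GDAL. If your JSON follows neither format, however, it cannot be converted directly.
- So how do you turn it into vector data? First check whether the JSON file contains longitude and latitude. If it does, convert the JSON file to CSV, read the location field, and generate points from the coordinates; that gives you vector data. In my case the JSON was an array of objects, and the coordinates in location were split across two lines (Figure 1), which is not valid JSON. The data looks like this: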
[{
"page_url":"http://restapi.amap.com/v3/place/text?key=9ad8a68e24924e15dd48ef37003f5cf2&types=060300&city=341222&citylimit=true&children=1&offset=25&page=7&extensions=all&output=JSON",
"page_save_time":"2017-09-25 18:55:03",
"pcode":"340000",
"type":"购物服务;家电电子卖场;家电电子卖场",
"photos":[
],
"page_county":"341222",
"poiweight":"",
"typecode":"060300",
"page_fetch_time":"2017-08-13 11:59:44",
"adname":"太和县",
"citycode":"1558",
"children":[
],
"doc_class":2005.0,
"tel":"",
"id":"7#20170813#59ae5c7c39f9a67d0f544dd8f095f498",
"tag":"",
"entr_location":"",
"doc_item":20050101,
"page_size":19857.0,
"site_domain":"restapi.amap.com",
"adcode":"341222",
"pname":"安徽省",
"biz_type":"",
"cityname":"阜阳市",
"postcode":"",
"business_area":"",
"site_name":"高德地图",
"site_ip":"106.11.208.130",
"name":"海尔统帅电器体验中心",
"shopid":"",
"navi_poiid":"",
"page_city":"341200",
"task_name":"高德地图poi采集",
"distance":"",
"page_publish_time":"2017-08-13 11:59:44",
"doc_subclass":200501.0,
"biz_ext":{
"cost":"",
"rating":""
},
"importance":"",
"page_province":"340000",
"recommend":"0",
"task_id":"lbs.amap.com.poi",
"doc_type":20.0,
"discount_num":"0",
"gridcode":"4915641922",
"shopinfo":"0",
"task_group":"2017-08-10",
"alias":"",
"spider_ip":"192.168.21.83",
"event":"",
"indoor_map":"0",
"email":"",
"timestamp":"",
"website":"",
"address":"人民北路与光明路交叉口北150米",
"match":"0",
"indoor_data":{
"cmsid":"",
"truefloor":"",
"cpid":"",
"floor":""
},
"exit_location":"",
"location":"115.624885,
33.181075",
"groupbuy_num":"0"
}]
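I tried json.load and json.loads, and both failed to parse the file. Then I tried the JSON-to-CSV conversion tool (the first link above), and to my surprise it worked, so I figured the problem was in my code. I searched again and found: https://stackoverflow.com/questions/1871524/how-can-i-convert-json-to-csv
Finally, pandas' read_json did the job! The code is as follows: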
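One likely reason the `json` functions fail on the sample above is the raw line break inside the `location` string, which strict JSON forbids. As a sketch, the standard library accepts such control characters when `strict=False` is passed. The `sample` text below is a shortened, made-up stand-in for the real file.

```python
import json

# Shortened stand-in for one record; note the raw line break inside "location"
sample = '[{"name": "demo", "location": "115.624885,\n33.181075"}]'

try:
    json.loads(sample)
    strict_ok = True
except json.JSONDecodeError:
    # The default strict mode rejects control characters inside strings
    strict_ok = False

# strict=False allows control characters (such as \n and \t) inside strings
records = json.loads(sample, strict=False)

# Strip all whitespace from location, then split into longitude and latitude
lon, lat = "".join(records[0]["location"].split()).split(",")
print(strict_ok, lon, lat)
```

For a file on disk, `json.load(fp, strict=False)` takes the same keyword argument.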
import pandas as pd
# File path
path = r"D:\05CBD\validation\dataverse_files\POIMap\POI2017.json"
# Open the file and keep only the columns of interest.
# dtype=False keeps codes such as '010' as strings; the default type
# inference may otherwise turn them into numbers and drop the leading zero.
with open(path, encoding='utf-8') as fp:
    df = pd.read_json(fp, dtype=False)
df = df[['typecode', 'citycode', 'cityname', 'location']]
# Remove stray line breaks, tabs, and spaces from location
df['location'] = df['location'].apply(lambda x: x.replace('\n', '').replace('\t', '').replace(' ', ''))
#select_city = ['110000','430100','500000','210200','350100','440100','320100','310000','440300','120000','410100']
# Keep only rows for the selected cities. The .copy() matters: it returns an independent copy rather than a view
select_city = ['010','0731','023','0411','0591','020','025','021','0755','022','0371']
dftest = df.query('citycode in @select_city').copy()
# Split location into two columns (longitude, latitude)
df1 = dftest['location'].str.split(',', expand=True)
# Add the new columns and drop the original location column
dftest['lon'] = df1[0]
dftest['lat'] = df1[1]
dftest = dftest.drop(columns='location')
print('ok')
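To go from longitude and latitude columns to actual vector data, one option is to write the points out as GeoJSON, which GDAL can then convert. Below is a minimal sketch using only the standard library. The function name `rows_to_geojson` and the sample row are illustrative assumptions; with pandas, the rows could come from `dftest.to_dict('records')`.

```python
import json

def rows_to_geojson(rows):
    """Build a GeoJSON FeatureCollection of points from dicts with 'lon' and 'lat'."""
    features = []
    for row in rows:
        # Everything except the coordinates becomes a feature property
        props = {k: v for k, v in row.items() if k not in ("lon", "lat")}
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude]
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical row shaped like the output of the pandas code
rows = [{"typecode": "060300", "citycode": "1558",
         "lon": "115.624885", "lat": "33.181075"}]
collection = rows_to_geojson(rows)
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

Saving `geojson_text` to a `.geojson` file gives a point layer that GIS software can open directly.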