JSON Files and How to Process Them

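  • First, an online conversion tool: https://blog.csdn.net/pigs_dream/article/details/119118903
  • For what JSON is and what the data format looks like, see this article: https://www.jb51.net/article/77660.htm
    JSON and XML serve much the same purpose: both store information, they just structure it differently. JSON comes in many shapes (objects, arrays, and nested combinations of the two), and there are several ways to parse it in Python. A web search mostly turns up json.loads(), json.dump() and their relatives. These functions handle arrays and nested structures, but they are strict about the format, so a file that is not quite valid JSON, like the array-style file discussed further down, will fail to parse. Before parsing, it is worth checking on the bejson site whether the file is valid. For example, a value must not be broken across lines arbitrarily, and field names and string values must be enclosed in double quotes.
    Figure 1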
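  • The JSON described above is general-purpose JSON. GIS has its own specific JSON formats; see this article: https://blog.csdn.net/gislaozhang/article/details/113616526
    These are Esri JSON and GeoJSON. Their advantage is that they can be converted directly to vector data: standard Esri JSON can be imported in ArcGIS with the JSON To Features tool, and GeoJSON can be converted to vector data with GDAL. If your JSON follows neither the Esri JSON nor the GeoJSON format, it cannot be converted to vector data directly.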
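  • So how do you get vector data out of it? First, check whether the JSON file contains longitude and latitude. If it does, convert the JSON file to CSV, read the location field, and generate points from the coordinates; that gives you vector data. When I did the JSON-to-CSV conversion, my JSON was an array of objects and the coordinate value was split across two lines (Figure 1), which strict JSON does not allow. The data looks like this: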
[{
    "page_url":"http://restapi.amap.com/v3/place/text?key=9ad8a68e24924e15dd48ef37003f5cf2&types=060300&city=341222&citylimit=true&children=1&offset=25&page=7&extensions=all&output=JSON",
    "page_save_time":"2017-09-25 18:55:03",
    "pcode":"340000",
    "type":"购物服务;家电电子卖场;家电电子卖场",
    "photos":[
        
    ],
    "page_county":"341222",
    "poiweight":"",
    "typecode":"060300",
    "page_fetch_time":"2017-08-13 11:59:44",
    "adname":"太和县",
    "citycode":"1558",
    "children":[
        
    ],
    "doc_class":2005.0,
    "tel":"",
    "id":"7#20170813#59ae5c7c39f9a67d0f544dd8f095f498",
    "tag":"",
    "entr_location":"",
    "doc_item":20050101,
    "page_size":19857.0,
    "site_domain":"restapi.amap.com",
    "adcode":"341222",
    "pname":"安徽省",
    "biz_type":"",
    "cityname":"阜阳市",
    "postcode":"",
    "business_area":"",
    "site_name":"高德地图",
    "site_ip":"106.11.208.130",
    "name":"海尔统帅电器体验中心",
    "shopid":"",
    "navi_poiid":"",
    "page_city":"341200",
    "task_name":"高德地图poi采集",
    "distance":"",
    "page_publish_time":"2017-08-13 11:59:44",
    "doc_subclass":200501.0,
    "biz_ext":{
        "cost":"",
        "rating":""
    },
    "importance":"",
    "page_province":"340000",
    "recommend":"0",
    "task_id":"lbs.amap.com.poi",
    "doc_type":20.0,
    "discount_num":"0",
    "gridcode":"4915641922",
    "shopinfo":"0",
    "task_group":"2017-08-10",
    "alias":"",
    "spider_ip":"192.168.21.83",
    "event":"",
    "indoor_map":"0",
    "email":"",
    "timestamp":"",
    "website":"",
    "address":"人民北路与光明路交叉口北150米",
    "match":"0",
    "indoor_data":{
        "cmsid":"",
        "truefloor":"",
        "cpid":"",
        "floor":""
    },
    "exit_location":"",
    "location":"115.624885,
        33.181075",
    "groupbuy_num":"0"
}]
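
A file like the one above is a plain array of records, so it matches neither GIS format. As a rough illustration of the distinction, the sketch below guesses which of the three cases an already-parsed JSON value falls into by looking at its top-level keys. The function name guess_json_flavor and the three small sample documents are made up for this example, and the check is only a heuristic, not a validator for either format.

```python
def guess_json_flavor(obj):
    """Roughly classify parsed JSON as 'geojson', 'esri' or 'plain'."""
    if isinstance(obj, dict):
        # GeoJSON objects carry a top-level "type" member
        if obj.get("type") in {"FeatureCollection", "Feature"}:
            return "geojson"
        # Esri feature sets list features that hold "attributes"
        # (and usually a "geometry" with x/y or rings/paths)
        features = obj.get("features")
        if (isinstance(features, list) and features
                and isinstance(features[0], dict)
                and "attributes" in features[0]):
            return "esri"
    return "plain"


# Three tiny hand-written documents, one per case
geojson_doc = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [115.624885, 33.181075]},
        "properties": {"name": "demo"},
    }],
}
esri_doc = {
    "geometryType": "esriGeometryPoint",
    "spatialReference": {"wkid": 4326},
    "features": [{
        "attributes": {"name": "demo"},
        "geometry": {"x": 115.624885, "y": 33.181075},
    }],
}
plain_doc = [{"name": "demo", "location": "115.624885,33.181075"}]

for doc in (geojson_doc, esri_doc, plain_doc):
    print(guess_json_flavor(doc))
```

Only the first two cases can go straight into a GIS tool; the third needs the coordinates pulled out first.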

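I tried both json.load and json.loads, and both failed to parse the file. To my surprise, the online JSON-to-CSV converter (the first link) handled it, so I figured the problem was in my code. I searched again: https://stackoverflow.com/questions/1871524/how-can-i-convert-json-to-csv
In the end, pandas' read_json worked! The code is as follows: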
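
A likely cause of the failure (an assumption, since the error message is not shown here) is the raw line break inside the "location" string rather than the array layout: Python's json module parses arrays without trouble, but in its default strict mode it rejects control characters inside strings. The sketch below reproduces this on a cut-down record and shows the decoder's strict=False option, which accepts them. The record is a shortened stand-in for the real data.

```python
import json

# A cut-down record in the same shape as the sample: the "location"
# string contains a raw line break, which strict JSON does not allow.
raw = '[{"name": "demo", "location": "115.624885,\n        33.181075"}]'

try:
    json.loads(raw)
except json.JSONDecodeError as err:
    # lineno/colno point at the offending character
    print(f"strict parse failed: {err.msg} (line {err.lineno}, column {err.colno})")

# strict=False lets the decoder accept control characters inside strings
records = json.loads(raw, strict=False)

# Remove all whitespace from the value, then split into longitude and latitude
cleaned = "".join(records[0]["location"].split())
lon, lat = (float(part) for part in cleaned.split(","))
print(lon, lat)
```

Reading the position from the exception is also a quick local alternative to pasting the file into an online validator.
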

import pandas as pd

# Path to the source file
path = r"D:\05CBD\validation\dataverse_files\POIMap\POI2017.json"

# Read the file. The code columns are requested as strings so that
# leading zeros (e.g. '010') are not lost to numeric type inference.
with open(path, encoding='utf-8') as fp:
    df = pd.read_json(fp, dtype={'typecode': str, 'citycode': str})

# Keep only the columns of interest
df = df[['typecode', 'citycode', 'cityname', 'location']]

# Strip the stray line breaks, tabs and spaces from location
df['location'] = df['location'].str.replace(r'\s+', '', regex=True)
#select_city = ['110000','430100','500000','210200','350100','440100','320100','310000','440300','120000','410100']

# Keep only the rows for the selected cities. The .copy() matters: it
# returns an independent copy instead of a view, so the column
# assignments below are safe.
select_city = ['010','0731','023','0411','0591','020','025','021','0755','022','0371']
dftest = df[df['citycode'].isin(select_city)].copy()

# Split location into two columns (longitude, latitude)
df1 = dftest['location'].str.split(',', expand=True)

# Add the new columns and drop the original one
dftest['lon'] = df1[0]
dftest['lat'] = df1[1]
dftest = dftest.drop(columns='location')

print('ok')
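
The script above stops once the longitude and latitude columns exist. As a minimal sketch of the remaining step, the example below writes such rows to CSV and also builds a GeoJSON point layer from them, using only the standard library. The single row reuses values from the sample record; the function name rows_to_geojson and the in-memory buffer are choices made for this illustration. With the DataFrame from the script, dftest.to_csv(...) would play the role of the CSV part.

```python
import csv
import io
import json

# One row in the shape produced by the script above (lon/lat still text)
rows = [
    {"typecode": "060300", "citycode": "1558", "cityname": "阜阳市",
     "lon": "115.624885", "lat": "33.181075"},
]


def rows_to_geojson(rows):
    """Build a GeoJSON FeatureCollection of points from rows with lon/lat."""
    features = []
    for row in rows:
        props = {k: v for k, v in row.items() if k not in ("lon", "lat")}
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude]
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


# Write the CSV to an in-memory buffer; for a real file, use
# open("poi.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

geojson_text = json.dumps(rows_to_geojson(rows), ensure_ascii=False, indent=2)
print(csv_text)
print(geojson_text)
```

The CSV can be loaded as XY point data in a GIS package, while the GeoJSON output is in one of the two formats described earlier as directly convertible to vector data.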
