In transbigdata, the grid parameters are as follows:
params=(lonStart,latStart,deltaLon,deltaLat,theta)
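As a rough illustration of what these five values control, the sketch below maps a point to a cell index on a rectangular grid. It assumes that (lonStart, latStart) is the center of cell (0, 0) and that theta is a counter-clockwise rotation in degrees applied directly in longitude/latitude space. Both are simplifying assumptions, and `point_to_cell` is a hypothetical helper, not part of transbigdata. In practice the library's own `tbd.GPS_to_grid` should be used, and its conventions may differ.

```python
import math

def point_to_cell(lon, lat, params):
    """Return (loncol, latcol) for a point under rectangular grid params.

    Assumption: (lonStart, latStart) is the center of cell (0, 0) and theta
    is a counter-clockwise rotation of the grid, in degrees, about that point.
    """
    lon_start, lat_start, delta_lon, delta_lat, theta = params
    # Rotate the offset by -theta so the grid axes line up with x and y.
    t = math.radians(theta)
    dx, dy = lon - lon_start, lat - lat_start
    x = dx * math.cos(t) + dy * math.sin(t)
    y = -dx * math.sin(t) + dy * math.cos(t)
    # Shift by half a cell because the origin is a cell center, not a corner.
    loncol = math.floor(x / delta_lon + 0.5)
    latcol = math.floor(y / delta_lat + 0.5)
    return loncol, latcol

# Illustrative values only, not taken from the dataset used in this tutorial.
toy_params = (113.75, 22.4, 0.005, 0.0045, 0)
print(point_to_cell(113.75, 22.4, toy_params))
print(point_to_cell(113.761, 22.4, toy_params))
```

Shifting lonStart or latStart moves cell boundaries relative to the data, and changing theta rotates them, which is why the same records can land in different cells under different parameters.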
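Choosing appropriate grid parameters matters, because it can strongly affect the final analysis results.
The right choice depends on both the data and the purpose of the analysis. transbigdata provides three methods for optimizing the parameters.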
import pandas as pd
import geopandas as gpd
import transbigdata as tbd
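# Load the sample taxi GPS records; column names are supplied explicitly
data=pd.read_csv('Downloads/TaxiData-Sample.csv',names= ['VehicleNum', 'Time', 'Lng', 'Lat', 'OpenStatus', 'Speed'])
data
# Load the study area boundary
area = gpd.read_file('Downloads/szarea1.json')
area
area.plot()
# Keep only the records that fall inside the study area
data=tbd.clean_outofshape(data,area)
data
# Build an initial grid over the study area and get its gridding parameters
grid,initialparams=tbd.area_to_grid(area)
initialparams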
'''
{'slon': 113.87256817484639,
'slat': 22.55155183165019,
'deltalon': 0.004869410314514816,
'deltalat': 0.004496605206422906,
'theta': 0,
'method': 'rect',
'gridsize': 500}
'''
grid.plot()
params_op=tbd.grid_params_optimize(data,
                                   initialparams,
                                   col=['VehicleNum','Lng','Lat'],
                                   optmethod='centerdist',
                                   sample=0, # no sampling
                                   printlog=True)
'''
Optimized index centerdist: 167.56608905526596
Optimized gridding params: {'slon': 113.87374968010685, 'slat': 22.553664777307173, 'deltalon': 0.004869410314514816, 'deltalat': 0.004496605206422906, 'theta': 44.131419745260644, 'method': 'rect'}
'''
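The option name `centerdist` and the metre-scale value in the log suggest that this criterion scores a grid by how far the records sit from the centers of their cells. The sketch below computes such a score for a toy case. It assumes theta is 0, that (slon, slat) is the center of cell (0, 0), and that an equirectangular approximation is adequate at city scale. `mean_center_distance` is a hypothetical helper and is not claimed to match the library's internal calculation.

```python
import math

EARTH_RADIUS_M = 6371000

def mean_center_distance(points, params):
    """Mean distance in metres from each (lon, lat) point to its cell center.

    Assumptions: theta is 0 and (slon, slat) is the center of cell (0, 0).
    """
    slon, slat = params["slon"], params["slat"]
    dlon, dlat = params["deltalon"], params["deltalat"]
    total = 0.0
    for lon, lat in points:
        # Center of the cell that contains the point.
        clon = slon + math.floor((lon - slon) / dlon + 0.5) * dlon
        clat = slat + math.floor((lat - slat) / dlat + 0.5) * dlat
        # Convert degree offsets to metres (equirectangular approximation).
        dx = math.radians(lon - clon) * EARTH_RADIUS_M * math.cos(math.radians(lat))
        dy = math.radians(lat - clat) * EARTH_RADIUS_M
        total += math.hypot(dx, dy)
    return total / len(points)

# Illustrative values only, not taken from the dataset used in this tutorial.
toy_params = {"slon": 114.0, "slat": 22.5, "deltalon": 0.005, "deltalat": 0.0045}
on_center = mean_center_distance([(114.0, 22.5), (114.005, 22.5045)], toy_params)
off_center = mean_center_distance([(114.002, 22.502)], toy_params)
print(on_center, off_center)
```

Points that coincide with cell centers score close to zero, while points near cell corners score highest, so an optimizer that minimizes this value would shift and rotate the grid until dense clusters of records fall near cell centers.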
A higher Gini index means the data are more concentrated within the given grid.
params_op=tbd.grid_params_optimize(data,
                                   initialparams,
                                   col=['VehicleNum','Lng','Lat'],
                                   optmethod='gini',
                                   sample=0, # no sampling
                                   printlog=True)
'''
Optimized index gini: -0.07232170907948476
Optimized gridding params: {'slon': 113.87460338641485, 'slat': 22.554558793623986, 'deltalon': 0.004869410314514816, 'deltalat': 0.004496605206422906, 'theta': 45.108548092477754, 'method': 'rect'}
'''
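To make the Gini criterion used above more concrete, the sketch below computes the standard Gini coefficient of per-cell record counts. A value of 0 means every cell holds the same number of records, and values approaching 1 mean the records are packed into a few cells. The `gini` helper is hypothetical and only illustrates the statistic. How transbigdata counts records per cell, and how it signs or scales the value it prints, is not shown here.

```python
import numpy as np

def gini(counts):
    """Gini coefficient of non-negative counts (0 = even, near 1 = concentrated)."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    # Rank-weighted form of the Gini coefficient for sorted values.
    ranks = np.arange(1, n + 1)
    return float(2 * (ranks * x).sum() / (n * x.sum()) - (n + 1) / n)

# Toy per-cell counts, not taken from the dataset used in this tutorial.
even = gini([10, 10, 10, 10])     # records spread evenly over four cells
skewed = gini([0, 0, 0, 40])      # all records in a single cell
print(even, skewed)
```

Under this criterion, a grid placement that gathers the same records into fewer, fuller cells scores higher than one that spreads them thinly.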
In the plot, the diagonal strip at the upper left contains 4 cells here, compared with 3 in the earlier centerdist result.
Each individual should appear in as few grid cells as possible.
params_op=tbd.grid_params_optimize(data,
                                   initialparams,
                                   col=['VehicleNum','Lng','Lat'],
                                   optmethod='gridscount',
                                   sample=0, # no sampling
                                   printlog=True)
'''
Optimized index gridscount: 9.0
Optimized gridding params: {'slon': 113.87506228430335, 'slat': 22.55319001399235, 'deltalon': 0.004869410314514816, 'deltalat': 0.004496605206422906, 'theta': 40.90195581089501, 'method': 'rect'}
'''
'''
Optimized index gridscount: 13.0
Optimized gridding params: {'slon': 113.87524021241177, 'slat': 22.55350036557914, 'deltalon': 0.004869410314514816, 'deltalat': 0.004496605206422906, 'theta': 23.722083498424578, 'method': 'tri'}
'''
initialparams['method']='hexa'
params_op=tbd.grid_params_optimize(data,
                                   initialparams,
                                   col=['VehicleNum','Lng','Lat'],
                                   optmethod='gridscount',
                                   sample=0, # no sampling
                                   printlog=True)
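The gridscount criterion used in the runs above can be illustrated with a small pandas sketch that counts how many distinct cells each individual occupies. It assumes the records already carry cell indices in columns named `LONCOL` and `LATCOL`, and it aggregates with a mean. Both the helper `mean_grids_per_individual` and the choice of mean are assumptions for illustration, since the library may aggregate differently.

```python
import pandas as pd

def mean_grids_per_individual(df, id_col, cell_cols):
    """Average number of distinct grid cells each individual appears in."""
    # One row per (individual, cell) pair, then count cells per individual.
    distinct = df[[id_col, *cell_cols]].drop_duplicates()
    return float(distinct.groupby(id_col).size().mean())

# Toy records, not taken from the dataset used in this tutorial.
toy = pd.DataFrame({
    "VehicleNum": [1, 1, 1, 2, 2],
    "LONCOL": [0, 0, 1, 5, 5],
    "LATCOL": [0, 0, 0, 2, 2],
})
score = mean_grids_per_individual(toy, "VehicleNum", ["LONCOL", "LATCOL"])
print(score)
```

A lower score means that each individual's records are split across fewer cells, which is the property this criterion asks the optimizer to favor.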