偶然间了解到之前的课题组老板提出了一个我认为既无聊又充满挑战的需求,无聊是因为我认为做出来的数据意义不大,充满挑战是因为我认为我现阶段是时候入门一下arcpy了。有某地区若干气象站30年每天的温度表格数据(最高温度、最低温度、平均温度),现在需要将每一天的数据进行插值(所以,得到的数据需要怎么分析?)。原始的数据很乱,我把原始的站点气象数据进行了整理,加上了经纬度以方便后续工作的开展,最终整理成下面的表格,总共47万多行,数据说大不大,说小不小:
对于我之前有用过GP工具切了解arcgis那一套数据结构的人来说可能比较容易的找到方法,首先我肯定是需要建立从1978年1月1日到2019年12月31日(算下来总共有一万四千多天)每天的空FeatureClass,然后遍历表找到当天站点的气象数据写入FeatureClass。为了方便计算,我在Excel中将日期转换成了整型数(python中肯定是可以直接用时间算的,但是不知道怎么做,原谅Python实习生一窍不通),最后进行插值计算(因为我的arcgis没有空间分析许可,这一步就没做,但方法都是相通的)。好,方法确定了,开撸!!!
首先,我需要指定工作(workspace),指定创建要素的空间参考(SpatialReference),这里我新建了一个gdb作为工作空间,用CSCS2000坐标系(4490)作为要素类的空间参考:
arcpy.env.workspace="E:\\TibetClimate\\TibetTemperature.gdb"
sr=arcpy.SpatialReference(4490)
4490是CGCS2000的WKID代号,空间参考的代号可以在这里查询:
接着,我需要调用arcgis创建FeatureClass和添加字段的接口来创建我想要的空FeatureClass,在gdb中不允许要素名数字开头,所以在前面加了一个下划线字符:
arcpy.CreateFeatureclass_management("E:\\TibetClimate\\TibetTemperature.gdb","_"+str(day),"POINT")
inFeature="_"+str(day)
arcpy.DefineProjection_management(inFeature,sr)
arcpy.AddField_management(inFeature,"date","TEXT","","",20,"日期","","","")
arcpy.AddField_management(inFeature,"site","TEXT","","",28,"站点名称","","","")
arcpy.AddField_management(inFeature,"siteID","TEXT","","",28,"站点编号","","","")
arcpy.AddField_management(inFeature,"lat","DOUBLE",5,10,"","经度","","","")
arcpy.AddField_management(inFeature,"lon","DOUBLE",5,10,"","纬度","","","")
arcpy.AddField_management(inFeature,"mean_temp","DOUBLE",2,10,"","平均温度","","","")
arcpy.AddField_management(inFeature,"max_temp","DOUBLE",2,10,"","最高温度","","","")
arcpy.AddField_management(inFeature,"min_temp","DOUBLE",2,10,"","最低温度","","","")
那么我怎么知道应该调用哪个接口呢?很简答,在arcgis的工具箱中把这个工具打开,点“help”就出来了,上面有python示例代码(这比在.net中调用gp工具方便太多,可以直接复制)。我可怜的thinkpad连续工作24小时后最后生成出来一万多个FeatureClass(这么多数据难道不觉得麻烦吗)。
写入数据那么我首先肯定要读取表格数据:
excel=xlrd.open_workbook("E:\\TibetClimate\\Tables\\joinTable.xlsx")
table=excel.sheets()[0]
然后我需要获取刚才创建的每一个FeatureClass,通过一次次的遍历找出每一天每一个站点的气象数据,写入属性表中:
rowCount=table.nrows
featureClassList= arcpy.ListFeatureClasses()
for fc in featureClassList:
with arcpy.da.InsertCursor(fc,('SHAPE@XY','date','site','siteID','lat','lon','mean_temp','max_temp','min_temp')) as cur:
for row in range(1,rowCount,1):
if int(table.cell(row,12).value)== int(fc.replace('_','')):
log=float(table._cell_values[row][10])
lat=float(table._cell_values[row][9])
date=str(int(table._cell_values[row][1]))+'/'+str(int(table._cell_values[row][2]))+'/'+str(int(table._cell_values[row][3]))
siteName=table._cell_values[row][8]
siteID=table._cell_values[row][7]
if table._cell_values[row][4]!='NULL':
minTemp=float(table._cell_values[row][4])/10
maxTemp=float(table._cell_values[row][5])/10
meanTemp=float(table._cell_values[row][6])/10
rowValue=[(log,lat),date,siteName,siteID,log,lat,-meanTemp,-maxTemp,minTemp]
cur.insertRow(rowValue)
因为原始数据的问题,有些数据是缺失的,所以需要加不为空的判断,“‘SHAPE@XY’”是必须有的,表示空间位置,感兴趣的可以试试,如果没有这个参数最后添加的数据只有属性表中一行,而不会在arcmap中显示。
最后,贴上实现的所有代码,因为只是为了得到一个结果,所以代码健壮性和可扩展性方面就没考虑太多,可以再加一些异常捕获的代码和输入参数的代码,这样就是一个完整的工具了。
# -*- coding:utf-8 -*-
import arcpy
import arcpy.da
import xlrd
excel=xlrd.open_workbook("E:\\TibetClimate\\Tables\\joinTable.xlsx")
table=excel.sheets()[0]
sr=arcpy.SpatialReference(4490)
#1、新建要素类,写入属性信息
arcpy.env.workspace="E:\\TibetClimate\\TibetTemperature.gdb"
#最大的时间范围
for day in range(42266,43830,1):
#1、新建要素类
arcpy.CreateFeatureclass_management("E:\\TibetClimate\\TibetTemperature.gdb","_"+str(day),"POINT")
inFeature="_"+str(day)
arcpy.DefineProjection_management(inFeature,sr)
arcpy.AddField_management(inFeature,"date","TEXT","","",20,"日期","","","")
arcpy.AddField_management(inFeature,"site","TEXT","","",28,"站点名称","","","")
arcpy.AddField_management(inFeature,"siteID","TEXT","","",28,"站点编号","","","")
arcpy.AddField_management(inFeature,"lat","DOUBLE",5,10,"","经度","","","")
arcpy.AddField_management(inFeature,"lon","DOUBLE",5,10,"","纬度","","","")
arcpy.AddField_management(inFeature,"mean_temp","DOUBLE",2,10,"","平均温度","","","")
arcpy.AddField_management(inFeature,"max_temp","DOUBLE",2,10,"","最高温度","","","")
arcpy.AddField_management(inFeature,"min_temp","DOUBLE",2,10,"","最低温度","","","")
#2、写入属性
rowCount=table.nrows
featureClassList= arcpy.ListFeatureClasses()
for fc in featureClassList:
with arcpy.da.InsertCursor(fc,('SHAPE@XY','date','site','siteID','lat','lon','mean_temp','max_temp','min_temp')) as cur:
for row in range(1,rowCount,1):
if int(table.cell(row,12).value)== int(fc.replace('_','')):
log=float(table._cell_values[row][10])
lat=float(table._cell_values[row][9])
date=str(int(table._cell_values[row][1]))+'/'+str(int(table._cell_values[row][2]))+'/'+str(int(table._cell_values[row][3]))
siteName=table._cell_values[row][8]
siteID=table._cell_values[row][7]
if table._cell_values[row][4]!='NULL':
minTemp=float(table._cell_values[row][4])/10
maxTemp=float(table._cell_values[row][5])/10
meanTemp=float(table._cell_values[row][6])/10
rowValue=[(log,lat),date,siteName,siteID,log,lat,-meanTemp,-maxTemp,minTemp]
cur.insertRow(rowValue)
第一次实践arcpy代码,有什么不正确的地方欢迎再评论区指出!