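Converting vector data to raster has long been an important problem in GIS. For distributed computing, raster data is easier to optimize than vector data, and queries and analysis run faster on it. So we considered rasterizing the whole national land-cover dataset and running the analysis on the result.
Let's start with the simplest kind of rasterization: produce a single-band raster whose cell values are the classification codes from the land-cover data.
I first browsed the GeoTrellis article series and found that there really is a post on rasterizing vectors: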
https://www.cnblogs.com/shoufengwei/p/5619419.html
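So I started by following the approach in that post.
To make testing easy, I drew four arbitrary polygons in ArcGIS, in the EPSG:4326 coordinate system.
1. First, read the vector data
Reading vector data requires some GeoTools packages, so add a few dependencies to build.sbt: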
resolvers := Seq(
"Typesafe Releases" at "http://repo.typesafe.com/typesafe/maven-releases/",
"Unidata Repository" at "https://artifacts.unidata.ucar.edu/content/repositories/unidata-releases",
MavenRepository("geotools","http://download.osgeo.org/webdav/geotools"),
"nscala-time" at "http://mvnrepository.com/artifact/com.github.nscala-time/nscala-time_2.10"
)
libraryDependencies += "com.vividsolutions" % "jts" % "1.13"
libraryDependencies += "org.geotools" % "gt-main" % "14.1"
libraryDependencies += "org.geotools" % "gt-shapefile" % "14.1" // kept on the same GeoTools version as gt-main
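The resolvers block above tells sbt which Maven repositories it can download the GeoTools packages from.
Once all the packages have been updated, the vector file can be read~ Here are two methods (really the same method →。→)
Method 1: this is the approach from the blog post. It lifts GeoTrellis's shapefile-reading code directly and ends up with a ListBuffer[Geometry] (which is a Seq[Geometry]):
The code: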
import java.io.File
import java.nio.charset.Charset
import scala.collection.mutable
import org.geotools.data.shapefile.ShapefileDataStore
import geotrellis.vector.Geometry
import geotrellis.vector.io.wkt.WKT

// Reads every geometry in a shapefile by parsing the WKT of the attribute named attrName.
def getFeatures(path: String, attrName: String = "the_geom", charset: String = "UTF-8"): mutable.ListBuffer[Geometry] = {
  val features = mutable.ListBuffer[Geometry]()
  val shpDataStore = new ShapefileDataStore(new File(path).toURI().toURL())
  shpDataStore.setCharset(Charset.forName(charset))
  val typeName = shpDataStore.getTypeNames()(0)
  val featureSource = shpDataStore.getFeatureSource(typeName)
  val result = featureSource.getFeatures()
  val iterator = result.features()
  while (iterator.hasNext()) {
    val feature = iterator.next()
    val it = feature.getProperties().iterator()
    while (it.hasNext()) {
      val pro = it.next()
      if (pro.getName.getLocalPart.equals(attrName)) {
        features += WKT.read(pro.getValue.toString) // the geometry's toString is its WKT
      }
    }
  }
  iterator.close()
  shpDataStore.dispose()
  features
}
Method 2: call the readSimpleFeatures method of ShapeFileReader directly, which returns a list of features (SimpleFeature objects) rather than bare geometries:
The code:
val shpPath = "D:\\IdeaProjects\\ScalaDemo\\data\\shapefile\\shp2raster2.shp";
val features = readSimpleFeatures(shpPath)
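The author of the original post kept getting errors at this point, but I did not.
2. With the vector data read, rasterization can begin~
I first tried the method from the post: build a RasterExtent, then call rasterizeWithValue directly.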
val re = RasterExtent(extent, 1200, 600)
val tile = Rasterizer.rasterizeWithValue(features, re, 100)
// renderPng produces PNG bytes, so the output file gets a .png extension
tile.renderPng(colorMap1).write("D:\\IdeaProjects\\ScalaDemo\\data\\shp2raster2\\result4.png")
The rasterized result looks like this: (um... a bit like an eye??)
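To make the role of RasterExtent(extent, 1200, 600) a little more concrete, here is a minimal sketch of the arithmetic it stands for: an extent divided into a fixed number of columns and rows gives a cell size, and any map coordinate falls into exactly one cell. GridSketch and its methods are hypothetical names for illustration, not the GeoTrellis class, and the extent is an assumption borrowed from the Spark example later in this post.

```scala
// Hypothetical stand-in for a raster extent; this is not the GeoTrellis RasterExtent.
final case class GridSketch(xmin: Double, ymin: Double, xmax: Double, ymax: Double, cols: Int, rows: Int) {
  val cellWidth: Double  = (xmax - xmin) / cols
  val cellHeight: Double = (ymax - ymin) / rows

  // Column index of a map x coordinate (column 0 starts at the western edge).
  def mapXToGrid(x: Double): Int = math.floor((x - xmin) * cols / (xmax - xmin)).toInt

  // Row index of a map y coordinate (row 0 starts at the northern edge).
  def mapYToGrid(y: Double): Int = math.floor((ymax - y) * rows / (ymax - ymin)).toInt
}

object GridSketchDemo {
  // Assumed extent (80, 15, 140, 40) split into 1200 x 600 cells.
  val grid = GridSketch(80, 15, 140, 40, 1200, 600)
}
```

With these numbers each cell is about 0.05 degrees wide and about 0.0417 degrees tall, which is the resolution the output image ends up with.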
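Um... is that it? Clearly not quite. What if I need to process a large volume of vector data? How do I rasterize with Spark?
One option is to build an RDD of Geometry, rasterize each piece, and merge the results at the end. That felt like a lot of trouble! So I googled it and found that, according to the official documentation, GeoTrellis 1.2 already supports rasterizing an RDD of Geometry. That is exactly what I wanted~!!
Let's go~~
First I had to upgrade my dependencies. I did not know exactly which package needed upgrading, so I bumped all the packages in sbt to 1.2, but the geoms.rasterize line still failed to compile.
Then I imported geotrellis.spark._ and it worked.
Note also that, because the interface in the source takes an RDD[Geometry], it is best to read the vector data with Method 1 above.
After that I simply followed the official demo step by step.
The code: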
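import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import geotrellis.raster._
import geotrellis.spark._
import geotrellis.spark.tiling.LayoutDefinition
import geotrellis.spark.stitch.TileLayoutStitcher
import geotrellis.vector.{Extent, Geometry}

// colorMap1 is a ColorMap defined elsewhere in the project.
val conf = new SparkConf().setMaster("local").setAppName("Shp2Raster")
val sc = new SparkContext(conf)
val shpPath = "D:\\IdeaProjects\\ScalaDemo\\data\\shapefile\\shp2raster2.shp"
val features = getFeatures(shpPath)
val featureRDD: RDD[Geometry] = sc.parallelize(features)
val extent: Extent = Extent(80, 15, 140, 40)
val tl = TileLayout(100, 72, 5, 5)
val layout = LayoutDefinition(extent, tl)
val celltype: CellType = IntCellType
// Every covered cell is burned with the constant value 35.
val layerRDD: RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = featureRDD.rasterize(35, celltype, layout)
val layerResult = layerRDD.collect()
for (sublayer <- layerResult) {
  // renderPng produces PNG bytes, so the files get a .png extension
  sublayer._2.renderPng(colorMap1).write("D:\\IdeaProjects\\ScalaDemo\\data\\test\\" + sublayer._1 + ".png")
}
val stitched = TileLayoutStitcher.stitch(layerResult)._1
stitched.renderPng(colorMap1).write("D:\\IdeaProjects\\ScalaDemo\\data\\test\\result.png")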
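One thing worth double-checking in the code above is the argument order of TileLayout. As far as I know, GeoTrellis declares it as TileLayout(layoutCols, layoutRows, tileCols, tileRows), so TileLayout(100, 72, 5, 5) describes a grid of 100 x 72 tiles that are each 5 x 5 pixels, not a 5 x 5 grid of 100 x 72 pixel tiles. The sketch below only does that arithmetic. LayoutSketch is a hypothetical stand-in, not the GeoTrellis class.

```scala
// Hypothetical mirror of the four TileLayout fields, in the order assumed above.
final case class LayoutSketch(layoutCols: Int, layoutRows: Int, tileCols: Int, tileRows: Int) {
  def tileCount: Int = layoutCols * layoutRows // number of tiles in the layout
  def totalCols: Int = layoutCols * tileCols   // pixel columns across the whole layer
  def totalRows: Int = layoutRows * tileRows   // pixel rows across the whole layer
}

object LayoutSketchDemo {
  val asWritten = LayoutSketch(100, 72, 5, 5) // the values used in this post
  val swapped   = LayoutSketch(5, 5, 100, 72) // same pixel grid, far fewer tiles
}
```

Both layouts cover 500 x 360 pixels in total, but the first one has 7200 possible tiles and the second only 25. Since the loop above writes one image per tile, the swapped form may be closer to what was intended.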
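There is one problem here, though: every rasterized cell gets the same fixed value. That is because the RDD being rasterized (called geoms in the GeoTrellis source) stores only geometries and carries no attribute information.
Geometry has no fields, but SimpleFeature (Feature) does. So my idea was to build an RDD of features and rewrite the rasterization code that follows. The rewritten code: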
def rasterizeFeature[G <: SimpleFeature](
  geoms: RDD[G],
  cellType: CellType,
  layout: LayoutDefinition,
  options: Rasterizer.Options = Rasterizer.Options.DEFAULT,
  partitioner: Option[Partitioner] = None
): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = {
  val features = geoms.map({ g => Feature(g.getDefaultGeometry.asInstanceOf[Geometry], g.getAttribute("CC").toString.toDouble) })
  fromFeature(features, cellType, layout, options, partitioner)
}
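Once it was written I happily assumed it would run, but it threw an error: it turns out the feature objects cannot be serialized (presumably the GeoTools SimpleFeature elements of the RDD, since Spark has to serialize them to ship them around). Annoying.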
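This kind of failure can be reproduced outside Spark. Spark's default serializer is Java serialization, so anything stored in an RDD has to be accepted by an ObjectOutputStream. The sketch below checks that for two values: OpaqueRecord is a hypothetical class that, like the failing feature type, does not implement Serializable, while the second value is a plain tuple built from its fields. All names here are illustrative assumptions.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a record type that does not implement Serializable.
class OpaqueRecord(val wkt: String, val code: Double)

object SerializationSketch {
  // True if Java serialization accepts the value, false if it refuses it.
  def javaSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  val record = new OpaqueRecord("POINT (100 30)", 35.0)
  // Plain values pulled out of the record: a String and a Double in a Tuple2.
  val pair: (String, Double) = (record.wkt, record.code)
}
```

The record itself is rejected, while the tuple of its fields goes through, which is the general idea behind extracting only the needed values before building the RDD.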
If features are out, the only option is to stay with Geometry. To carry the attribute value along with it, I changed the element type of the RDD to a Tuple2:
def rasterizeFeature(
  geoms: RDD[(Geometry, Double)],
  cellType: CellType,
  layout: LayoutDefinition,
  options: Rasterizer.Options = Rasterizer.Options.DEFAULT,
  partitioner: Option[Partitioner] = None
): RDD[(SpatialKey, Tile)] with Metadata[LayoutDefinition] = {
  // Pair each geometry with its attribute value so the value is burned into the raster
  val features = geoms.map({ g => Feature(g._1, g._2) })
  fromFeature(features, cellType, layout, options, partitioner)
}
It worked~ Finally, the generated tiles are stitched into one big image: (ugly color scheme)
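For readers who want to see what stitching amounts to, here is a minimal sketch that copies tiles keyed by (column, row) into one large row-major array. It is not the TileLayoutStitcher implementation. The tile representation (plain Array[Int]), the noData fill for missing tiles, and the assumption that keys start at (0, 0) are all mine.

```scala
object StitchSketch {
  // Tiles are plain row-major arrays here; a hypothetical stand-in for GeoTrellis tiles.
  // Returns the stitched cells together with the total number of columns and rows.
  def stitch(
    tiles: Map[(Int, Int), Array[Int]],
    tileCols: Int,
    tileRows: Int,
    noData: Int
  ): (Array[Int], Int, Int) = {
    val layoutCols = tiles.keys.map(_._1).max + 1
    val layoutRows = tiles.keys.map(_._2).max + 1
    val cols = layoutCols * tileCols
    val rows = layoutRows * tileRows
    val out = Array.fill(cols * rows)(noData) // cells of missing tiles stay noData
    for (((tc, tr), data) <- tiles; r <- 0 until tileRows; c <- 0 until tileCols) {
      // Offset each tile by its position in the layout
      out((tr * tileRows + r) * cols + (tc * tileCols + c)) = data(r * tileCols + c)
    }
    (out, cols, rows)
  }
}
```

For example, two 2 x 2 tiles at keys (0, 0) and (1, 0) end up side by side in a 4 x 2 array.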
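And with that, vector-to-raster conversion under Spark is done~~
Another issue in vector rasterization is how cells on a boundary get their value. I did not know GeoTrellis's strategy for this, so I first read through some of the literature to see which boundary-value strategies are currently in use.
Then I dug into GeoTrellis's own strategy. The geom.foreach function computes the grid cells that each vector feature covers, and all of the intersection work happens inside it. Note that rasterization takes an options parameter: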
feature.geom.foreach(re, options)
@param options Rasterizer options for cell intersection rules
So it looks as if this options value decides the rule for whether a vector intersects a pixel? Clicking through into the Options type:
object Rasterizer {
  /**
   * A type encoding rasterizer options.
   */
  case class Options(
    includePartial: Boolean,
    sampleType: PixelSampleType
  )

  /**
   * A companion object for the [[Options]] type. Includes a
   * function to produce the default options settings.
   */
  object Options {
    def DEFAULT = Options(includePartial = true, sampleType = PixelIsPoint)
  }
So the options consist mainly of two settings, includePartial and sampleType, but I did not really understand what either of them meant. Digging further, I found this comment:
* @param options The options parameter controls whether to treat pixels as points or areas and whether to report partially-intersected areas.
/**
 * This function causes the function f to be called on each pixel
 * that interacts with the polygon. The definition of the word
 * "interacts" is controlled by the options parameter.
 *
 * @param poly A polygon to rasterize
 * @param re A raster extent to rasterize the polygon into
 * @param options The options parameter controls whether to treat pixels as points or areas and whether to report partially-intersected areas.
 */
def foreachCellByPolygon(
  poly: Polygon,
  re: RasterExtent,
  options: Options = Options.DEFAULT
)(f: Callback): Unit = {
  val sampleType = options.sampleType
  val partial = options.includePartial
  val edges = polygonToEdges(poly, re)
  var y = 0
  while(y < re.rows) {
    val rowRuns =
      if (sampleType == PixelIsPoint) runsPoint(edges, y, re.cols)
      else runsArea(edges, y, re.cols, partial)
    var i = 0
    while (i < rowRuns.length) {
      var x = max(rowRuns(i).toInt, 0)
      val stop = min(rowRuns(i+1).toInt, re.cols)
      while (x < stop) {
        f(x, y)
        x += 1
      } // x loop
      i += 2
    } // i loop
    y += 1
  } // y loop
}
if (sampleType == PixelIsPoint) runsPoint(edges, y, re.cols)
else runsArea(edges, y, re.cols, partial)
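The key lines are the two that choose between runsPoint and runsArea (highlighted in red in the original post).
When sampleType is PixelIsPoint, each pixel cell is treated as a point, and a scanline pass intersects those points with the vector feature. So my understanding is that GeoTrellis's boundary-value rule in this mode is the center-point method?
When includePartial = true, the source comment says: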
* @param partial True if all intersected cells are to be reported, otherwise only those on the interior of the polygon
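To check my reading of these options, here is a small sketch of the three rules as I understand them from the comments quoted above: treat each cell as its center point, report every cell the shape touches, or report only cells fully inside the shape. This is not the GeoTrellis implementation (which works on polygon edges with scanline runs). The shape is an axis-aligned rectangle in grid units, and all the names are hypothetical.

```scala
object SamplingSketch {
  // Axis-aligned rectangle in grid units; a deliberately simple stand-in for a polygon.
  final case class Rect(x0: Double, y0: Double, x1: Double, y1: Double)

  sealed trait Rule
  case object CenterPoint extends Rule  // pixel as point: the cell center must fall inside
  case object AreaPartial extends Rule  // pixel as area, partial cells included: any overlap counts
  case object AreaInterior extends Rule // pixel as area, partial cells excluded: cell fully covered

  // Cell (col, row) spans [col, col + 1) x [row, row + 1) in grid units.
  def hit(rect: Rect, col: Int, row: Int, rule: Rule): Boolean = rule match {
    case CenterPoint =>
      val cx = col + 0.5
      val cy = row + 0.5
      cx >= rect.x0 && cx < rect.x1 && cy >= rect.y0 && cy < rect.y1
    case AreaPartial =>
      col < rect.x1 && col + 1 > rect.x0 && row < rect.y1 && row + 1 > rect.y0
    case AreaInterior =>
      col >= rect.x0 && col + 1 <= rect.x1 && row >= rect.y0 && row + 1 <= rect.y1
  }

  // All cells of a cols x rows grid that the rule reports for the rectangle.
  def cells(rect: Rect, cols: Int, rows: Int, rule: Rule): Seq[(Int, Int)] =
    for (row <- 0 until rows; col <- 0 until cols if hit(rect, col, row, rule)) yield (col, row)

  val rect = Rect(0.4, 0.4, 3.2, 2.4)
}
```

On a 4 x 3 grid this rectangle yields 6 cells under the center-point rule, all 12 cells when partial cells are included, and only 2 cells when they are excluded, so the choice of rule changes the boundary noticeably.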
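OK, at this point I have formed some understanding of my own of how GeoTrellis rasterizes~~ The next step is to see whether I can rasterize into multi-band data.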