(1) Getting Started with GeoSpark

GeoSpark is a distributed geospatial computing engine built on Spark. Compared with traditional ArcGIS, it delivers better-performing spatial analysis and query services.

Prerequisites

  1. Ubuntu 18.04
  2. IDEA
  3. GeoSpark supports both Java and Scala; Java is used for this tutorial.

Installing JDK 8

  1. Download JDK 8: https://download.oracle.com/otn/java/jdk/8u211-b12/478a62b7d4e34b78b671c754eaaf38ab/jdk-8u211-linux-x64.tar.gz (note: Oracle now requires an account to download)

  2. Extract the archive, copy it to /opt, then add the environment variables to ~/.bashrc:

    export JAVA_HOME=/opt/jdk1.8.0_172 # change this to your JDK directory name
    export PATH=${JAVA_HOME}/bin:$PATH
    export CLASSPATH=.:/opt/jdk1.8.0_172/lib:/opt/jdk1.8.0_172/lib/dt.jar:/opt/jdk1.8.0_172/lib/tools.jar # CLASSPATH should no longer be needed on JDK 8+, but it is set here to be safe
    

Configuring Scala

  1. Download Scala 2.12.8: https://downloads.lightbend.com/scala/2.12.8/scala-2.12.8.tgz

  2. Extract the archive, copy it to /opt, then add the environment variables to ~/.bashrc:

    export SCALA_HOME=/opt/scala-2.12.8
    export PATH=${SCALA_HOME}/bin:$PATH
    
  3. Then run source ~/.bashrc.

  4. Run scala -version; if you see output similar to the following, the installation succeeded:

    Scala code runner version 2.12.8 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
    

Standalone Spark Setup

  1. This sets up standalone Spark on a single machine; no cluster and no Hadoop deployment are required.

  2. Download Spark 2.4.3: https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.6.tgz

  3. Extract the archive, copy it to your home directory (/home/{user}), then add the environment variables to ~/.bashrc:

    export SPARK_HOME=/home/hwang/spark-2.4.3-bin-hadoop2.6
    export SPARK_LOCAL_IP="127.0.0.1"
    export PATH=${SPARK_HOME}/bin:$PATH
    
  4. Then run spark-shell; if you see output like the following, the installation succeeded:

    Spark context Web UI available at http://localhost:4040
    Spark context available as 'sc' (master = local[*], app id = local-1559006613213).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
          /_/
             
    Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_172)
    scala> 
    

GeoSpark

  1. Open IDEA, create a new Maven project, and edit pom.xml:

       
    <properties>
        <scala.version>2.11</scala.version>
        <geospark.version>1.2.0</geospark.version>
        <spark.compatible.version>2.3</spark.compatible.version>
        <spark.version>2.4.3</spark.version>
        <hadoop.version>2.7.2</hadoop.version>
        <!-- not present in the extracted original; "compile" for local IDE runs, "provided" when submitting to a cluster -->
        <dependency.scope>compile</dependency.scope>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.0</version>
        </dependency>
        <dependency>
            <groupId>org.datasyslab</groupId>
            <artifactId>geospark</artifactId>
            <version>${geospark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.datasyslab</groupId>
            <artifactId>geospark-sql_${spark.compatible.version}</artifactId>
            <version>${geospark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.datasyslab</groupId>
            <artifactId>geospark-viz_${spark.compatible.version}</artifactId>
            <version>${geospark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.datasyslab</groupId>
            <artifactId>sernetcdf</artifactId>
            <version>0.1.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
            <scope>${dependency.scope}</scope>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.hadoop</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
            <scope>${dependency.scope}</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>${hadoop.version}</version>
            <scope>${dependency.scope}</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
            <scope>${dependency.scope}</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
    
  2. Next, create a Spark RDD from a CSV file (checkin.csv) with the following contents:

    -88.331492,32.324142,hotel
    -88.175933,32.360763,gas
    -88.388954,32.357073,bar
    -88.221102,32.35078,restaurant
    

    Then initialize a SparkContext and load the CSV with GeoSpark's PointRDD:

    SparkConf conf = new SparkConf();
    conf.setAppName("GeoSpark01");
    conf.setMaster("local[*]");
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    conf.set("spark.kryo.registrator", "org.datasyslab.geospark.serde.GeoSparkKryoRegistrator");
    JavaSparkContext sc = new JavaSparkContext(conf);

    String pointRDDInputLocation = Learn01.class.getResource("checkin.csv").toString();
    Integer pointRDDOffset = 0; // the coordinates (longitude, latitude) start at column 0
    FileDataSplitter pointRDDSplitter = FileDataSplitter.CSV;
    Boolean carryOtherAttributes = true; // also carry the trailing attribute column (the POI type)
    PointRDD rdd = new PointRDD(sc, pointRDDInputLocation, pointRDDOffset, pointRDDSplitter, carryOtherAttributes);
    
  3. Coordinate system transformation

    1. GeoSpark uses EPSG-standard coordinate reference systems; the available codes can be looked up on the EPSG registry: https://epsg.io/

    2. For example, to reproject the RDD from WGS 84 (epsg:4326) to Web Mercator (epsg:3857):

      // transform the coordinate reference system
      String sourceCrsCode = "epsg:4326";
      String targetCrsCode = "epsg:3857";
      rdd.CRSTransform(sourceCrsCode, targetCrsCode);
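With the check-in data loaded as a PointRDD, a first useful operation is a spatial range query. The sketch below is a minimal, self-contained variant of the steps above: it writes the four sample rows to a temporary file instead of reading a classpath resource, keeps the coordinates in epsg:4326 (no CRS transform), and assumes the pom above resolved GeoSpark 1.2.0, whose geometry classes live under the com.vividsolutions.jts packages. The class name RangeQueryDemo is illustrative.

```java
import com.vividsolutions.jts.geom.Envelope;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.datasyslab.geospark.enums.FileDataSplitter;
import org.datasyslab.geospark.spatialOperator.RangeQuery;
import org.datasyslab.geospark.spatialRDD.PointRDD;

import java.nio.file.Files;
import java.nio.file.Path;

public class RangeQueryDemo {
    public static void main(String[] args) throws Exception {
        // Write the sample check-in rows to a temp file so the example is self-contained
        Path csv = Files.createTempFile("checkin", ".csv");
        Files.write(csv, ("-88.331492,32.324142,hotel\n"
                + "-88.175933,32.360763,gas\n"
                + "-88.388954,32.357073,bar\n"
                + "-88.221102,32.35078,restaurant\n").getBytes());

        SparkConf conf = new SparkConf()
                .setAppName("GeoSparkRangeQuery")
                .setMaster("local[*]")
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .set("spark.kryo.registrator", "org.datasyslab.geospark.serde.GeoSparkKryoRegistrator");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Same loading pattern as above: coordinates start at column 0, extra columns carried along
        PointRDD rdd = new PointRDD(sc, csv.toString(), 0, FileDataSplitter.CSV, true);

        // Query window in epsg:4326 degrees: Envelope(minX, maxX, minY, maxY)
        Envelope window = new Envelope(-88.4, -88.3, 32.3, 32.4);

        // Full scan (useIndex = false); boundary-touching points count as hits
        long hits = RangeQuery.SpatialRangeQuery(rdd, window, true, false).count();
        System.out.println(hits); // the hotel and bar points fall inside the window

        sc.stop();
    }
}
```

Note that with useIndex set to false the query scans every partition; after calling rdd.buildIndex(...) you can pass true to query a spatial index instead, which pays off on larger datasets.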
      
