Introduction to Apache Flink
Apache Flink is an open-source stream processing framework developed by the Apache Software Foundation. Its core is a distributed streaming dataflow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel, pipelined manner, and its pipelined runtime can run both batch and stream processing programs.
Introduction to Apache Sedona
Apache Sedona (incubating) is a cluster computing system for processing large-scale spatial data. Sedona extends Apache Spark and Apache Flink with a set of out-of-the-box distributed spatial datasets and spatial SQL, so that large-scale spatial data can be loaded, processed, and analyzed efficiently across machines.
Processing workflow
Java code
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings settings = EnvironmentSettings.newInstance().inStreamingMode().build();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);

// Register Sedona's geometry types and spatial SQL functions with Flink
SedonaFlinkRegistrator.registerType(env);
SedonaFlinkRegistrator.registerFunc(tableEnv);

// Read "id,WKT" lines from a netcat socket
DataStream<String> socketTextStream = env.socketTextStream("localhost", 8888);

// Parse each line into a Row of (WKT string, id)
DataStream<Row> rowDataStream = socketTextStream.map(new MapFunction<String, Row>() {
    private static final long serialVersionUID = -3351062125994879777L;

    @Override
    public Row map(String line) throws Exception {
        String[] fields = line.split(",");
        String pointWkt = fields[1];
        int id = Integer.parseInt(fields[0]);
        return Row.of(pointWkt, id);
    }
}).returns(Types.ROW(Types.STRING, Types.INT));

// Turn the stream into a table and build geometries from the WKT column
Table pointTable = tableEnv.fromDataStream(rowDataStream);
tableEnv.createTemporaryView("myTable", pointTable);
Table geomTbl = tableEnv.sqlQuery("SELECT ST_GeomFromWKT(f0) AS geom_polygon, f1 FROM myTable");
tableEnv.createTemporaryView("geoTable", geomTbl);

// Keep only the points contained in the query polygon
geomTbl = tableEnv.sqlQuery("SELECT f1, geom_polygon FROM geoTable WHERE ST_Contains(ST_GeomFromWKT('MultiPolygon (((110.499997 20.010307, 110.499995 20.010759, 110.500473 20.01076, 110.500475 20.010308, 110.499997 20.010307)))'), geom_polygon)");
geomTbl.execute().print();
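Outside of Flink, the map step above is plain string handling: split each line on the comma, take the id and the WKT text. A stdlib-only sketch of the same parsing (the class and method names here are mine for illustration, not part of Sedona or Flink):

```java
// Minimal stand-in for the MapFunction above: turns an "id,WKT" line
// into its two fields, in the same order as Row.of(pointWkt, id).
public class LineParser {
    // Parses e.g. "1,Point (110.500235 20.0105335)" into {wkt, id}.
    public static Object[] parse(String line) {
        String[] fields = line.split(",", 2); // limit 2, in case the WKT ever contains a comma
        int id = Integer.parseInt(fields[0].trim());
        String wkt = fields[1].trim();
        return new Object[] { wkt, id };
    }

    public static void main(String[] args) {
        Object[] row = parse("1,Point (110.500235 20.0105335)");
        System.out.println(row[1] + " -> " + row[0]);
    }
}
```

In the real job this pair becomes a `Row(f0, f1)`, which is why the SQL refers to the columns as `f0` (the WKT) and `f1` (the id).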
pom.xml configuration
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>cn.hwang</groupId>
    <artifactId>geospark-dev</artifactId>
    <version>1.0</version>

    <properties>
        <scala.version>2.12</scala.version>
        <scala.compat.version>2.12</scala.compat.version>
        <sedona.version>1.2.0</sedona.version>
        <spark.compat.version>3.0</spark.compat.version>
        <spark.version>3.1.2</spark.version>
        <hadoop.version>3.2.0</hadoop.version>
        <geotools.version>24.0</geotools.version>
        <flink.version>1.14.3</flink.version>
        <kafka.version>2.8.1</kafka.version>
        <dependency.scope>compile</dependency.scope>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.sedona</groupId>
            <artifactId>sedona-core-3.0_2.12</artifactId>
            <version>1.2.0-incubating</version>
        </dependency>
        <dependency>
            <groupId>org.apache.sedona</groupId>
            <artifactId>sedona-viz-3.0_2.12</artifactId>
            <version>1.2.0-incubating</version>
        </dependency>
        <dependency>
            <groupId>org.apache.sedona</groupId>
            <artifactId>sedona-sql-3.0_2.12</artifactId>
            <version>1.2.0-incubating</version>
        </dependency>
        <dependency>
            <groupId>org.apache.sedona</groupId>
            <artifactId>sedona-flink_2.12</artifactId>
            <version>1.2.0-incubating</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.12.13</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>${spark.version}</version>
            <scope>${dependency.scope}</scope>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.hadoop</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>${spark.version}</version>
            <scope>${dependency.scope}</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>${hadoop.version}</version>
            <scope>${dependency.scope}</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
            <scope>${dependency.scope}</scope>
        </dependency>
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <version>42.2.5</version>
        </dependency>
        <dependency>
            <groupId>org.datasyslab</groupId>
            <artifactId>geotools-wrapper</artifactId>
            <version>1.1.0-25.2</version>
        </dependency>
        <dependency>
            <groupId>org.locationtech.jts</groupId>
            <artifactId>jts-core</artifactId>
            <version>1.18.0</version>
        </dependency>
        <dependency>
            <groupId>org.wololo</groupId>
            <artifactId>jts2geojson</artifactId>
            <version>0.16.1</version>
            <exclusions>
                <exclusion>
                    <groupId>org.locationtech.jts</groupId>
                    <artifactId>jts-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.fasterxml.jackson.core</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.geotools</groupId>
            <artifactId>gt-main</artifactId>
            <version>24.0</version>
        </dependency>
        <dependency>
            <groupId>org.geotools</groupId>
            <artifactId>gt-referencing</artifactId>
            <version>24.0</version>
        </dependency>
        <dependency>
            <groupId>org.geotools</groupId>
            <artifactId>gt-epsg-hsql</artifactId>
            <version>24.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-core</artifactId>
            <version>${flink.version}</version>
            <scope>${dependency.scope}</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_${scala.compat.version}</artifactId>
            <version>${flink.version}</version>
            <scope>${dependency.scope}</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>${kafka.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka_${scala.compat.version}</artifactId>
            <version>${flink.version}</version>
            <scope>${dependency.scope}</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_${scala.compat.version}</artifactId>
            <version>${flink.version}</version>
            <scope>${dependency.scope}</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java-bridge_${scala.compat.version}</artifactId>
            <version>${flink.version}</version>
            <scope>${dependency.scope}</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_${scala.compat.version}</artifactId>
            <version>${flink.version}</version>
            <scope>${dependency.scope}</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-common</artifactId>
            <version>${flink.version}</version>
            <scope>${dependency.scope}</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-csv</artifactId>
            <version>${flink.version}</version>
            <scope>${dependency.scope}</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-runtime-web_${scala.compat.version}</artifactId>
            <version>${flink.version}</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <repositories>
        <repository>
            <id>central</id>
            <name>maven.aliyun.com</name>
            <url>https://maven.aliyun.com/repository/public</url>
        </repository>
        <repository>
            <id>maven2-repository.dev.java.net</id>
            <name>Java.net repository</name>
            <url>https://download.java.net/maven/2</url>
        </repository>
        <repository>
            <id>osgeo</id>
            <name>OSGeo Release Repository</name>
            <url>https://repo.osgeo.org/repository/release/</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
            <releases>
                <enabled>true</enabled>
            </releases>
        </repository>
        <repository>
            <id>Central</id>
            <name>Central Repository</name>
            <url>https://repo1.maven.org/maven2/</url>
        </repository>
    </repositories>
</project>
Download netcat and extract it locally, then run the nc program from cmd to simulate streaming input. Below is the test data template.
1,Point (110.500235 20.0105335)
2,Point (110.18832409 20.06088375)
3,Point (110.18784591 20.06088125)
4,Point (109.4116775 18.2997045)
5,Point (109.55539791 18.30762275)
6,Point (109.5483405 19.778523)
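With these six sample points, the ST_Contains filter should keep only record 1. Because the query MultiPolygon above is effectively an axis-aligned rectangle, a plain bounding-box comparison reproduces the result; the sketch below (class name mine, coordinates taken from the query and the sample data) illustrates what the filter decides, not how Sedona computes it:

```java
// Checks the six sample points against the extent of the query polygon.
// The polygon in the SQL is a near-perfect rectangle, so min/max bounds
// suffice for illustration; for arbitrary polygons a true
// point-in-polygon test (what ST_Contains performs) is required.
public class ContainsCheck {
    // Bounds of the MultiPolygon ring used in the SQL query.
    static final double MIN_X = 110.499995, MAX_X = 110.500475;
    static final double MIN_Y = 20.010307,  MAX_Y = 20.010760;

    static boolean contains(double x, double y) {
        return x >= MIN_X && x <= MAX_X && y >= MIN_Y && y <= MAX_Y;
    }

    public static void main(String[] args) {
        double[][] points = {
            {110.500235, 20.0105335}, {110.18832409, 20.06088375},
            {110.18784591, 20.06088125}, {109.4116775, 18.2997045},
            {109.55539791, 18.30762275}, {109.5483405, 19.778523}
        };
        for (int i = 0; i < points.length; i++) {
            if (contains(points[i][0], points[i][1])) {
                // Only id 1 falls inside the query polygon.
                System.out.println("id " + (i + 1) + " is inside the query polygon");
            }
        }
    }
}
```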
Paste the test data into the nc session, then run the code.
Apache Sedona can now successfully process some spatial streaming data. Many thanks to JupiterChow for the help: providing the sample code and configuration, and patiently explaining everything.
Appendix (deploying Flink 1.14 on Ubuntu; this took several days to install but was not used in the end)
Install the JDK
sudo apt update
sudo apt install openjdk-11-jdk
Download and extract Flink
wget https://dlcdn.apache.org/flink/flink-1.14.4/flink-1.14.4-bin-scala_2.11.tgz
tar -xzf flink-1.14.4-bin-scala_2.11.tgz
cd flink-1.14.4
Start the cluster
./bin/start-cluster.sh
Run the bundled example
./bin/flink run ./examples/batch/WordCount.jar
Open the built-in web UI
wget https://github.com/glink-incubator/glink/releases/download/release-1.0.0/glink-1.0.0-bin.tar.gz
tar -zxvf glink-1.0.0-bin.tar.gz
./flink-1.14.4/bin/flink run ./examples/batch/WordCount.jar
References
https://zhuanlan.zhihu.com/p/447743903
https://www.cnblogs.com/liufei1983/p/15661322.html
https://blog.csdn.net/weixin_46684578/article/details/122803180
https://juejin.cn/post/7023210394894204936
https://cloud.tencent.com/developer/article/1626610
https://baike.baidu.com/item/Apache%20Flink/59924858
https://eternallybored.org/misc/netcat/
https://sedona.apache.org/