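Background
I previously computed the distances between cities and scenic spots by brute-forcing the distance formula in Hive. This time the task was to compute the distances between scenic spots and counties (districts), so I looked into the available ways of computing distance and found that PostgreSQL has an extension that supports geographic distance directly. This post records the whole process.
Introduction
PostGIS is a spatial database extension for the PostgreSQL relational database. It adds support for geographic types, which allows location calculations and queries to be run in SQL. See the PostGIS official website for details.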
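As a quick orientation, the sketch below enables the extension and computes one geographic distance. It assumes PostGIS is already installed on the server and that the current role is allowed to create extensions. The two coordinate pairs are illustrative values roughly at Beijing and Shanghai, not data from this post.

```sql
-- Enable PostGIS in the current database (requires sufficient privileges).
create extension if not exists postgis;

-- Confirm the extension is available.
select postgis_version();

-- ST_MakePoint takes (x, y), i.e. (longitude, latitude).
-- With geography arguments, ST_Distance returns meters.
select ST_Distance(
    ST_SetSRID(ST_MakePoint(116.40, 39.90), 4326)::geography,  -- illustrative point near Beijing
    ST_SetSRID(ST_MakePoint(121.47, 31.23), 4326)::geography   -- illustrative point near Shanghai
) as distance_m;
```

The cast to geography matters: on plain geometry values without a projected coordinate system, ST_Distance returns a planar distance in the units of the coordinates (degrees here), not meters.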
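Scenic spot table
Parse the ltree hierarchy (area_path) in the source table to resolve the country, area, province, city and county of each scenic spot, and carry its coordinates (baidu_point) along. The result serves as a standalone table of scenic spots and their coordinates.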
-- Resolve each scenic spot's ancestors in the ltree hierarchy (area_path)
-- and store them next to its coordinates.
insert into tmp.sight_dw_dw (id, name, star, ticket_price, address, status, country, area, region, city, county, rpt_time, baidu_point)
select
    id,
    name,
    star,
    ticket_price,
    address,
    status,
    country,
    area,
    -- For these city-level entries, use the city name as the region.
    case when city in ('北京', '上海', '天津', '重庆', '香港', '澳门', '新加坡') then city
         else region
    end as region,
    city,
    county,
    rpt_time,
    baidu_point
from (
    select
        id,
        name,
        star,
        ticket_price,
        address,
        status,
        -- st.area_path @> s.area_path: st is an ancestor of (or equal to) s.
        -- For each type, the deepest matching ancestor is taken.
        (select st.name from mirror.sight st
          where st.area_path @> s.area_path and st.type = '国家'   -- country
          order by nlevel(st.area_path) desc limit 1) as country,
        (select st.name from mirror.sight st
          where st.area_path @> s.area_path and st.type = '地区'   -- area
          order by nlevel(st.area_path) desc limit 1) as area,
        (select st.name from mirror.sight st
          where st.area_path @> s.area_path and st.type = '省份'   -- province
          order by nlevel(st.area_path) desc limit 1) as region,
        (select st.name from mirror.sight st
          where st.area_path @> s.area_path and st.type = '城市'   -- city
          order by nlevel(st.area_path) desc limit 1) as city,
        (select st.name from mirror.sight st
          where st.area_path @> s.area_path and st.type = '区县'   -- county / district
          order by nlevel(st.area_path) desc limit 1) as county,
        now() as rpt_time,
        baidu_point
    from mirror.sight s
    -- Keep only nodes under the '中国' (China) node.
    where s.area_path <@ (select st.area_path from mirror.sight st where st.name = '中国')
      -- Scenic area, scenic spot, store, virtual scenic spot, virtual.
      and s.type in ('景区', '景点', '门店', '虚拟景点', '虚拟')
      and s.status not in ('hard_removed', 'deleted')
) a;
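Because the longitude and latitude are stored together in a single varchar column (baidu_point), they need to be split apart. The target table tmp.distince has the columns id, name, longitude and latitude, and the statement that populates it is as follows: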
insert into tmp.distince(id,name,longitude,latitude)
select
id,
name,
(string_to_array(baidu_point,','))[1]::double precision,
(string_to_array(baidu_point,','))[2]::double precision
from tmp.sight_dw_dw
where baidu_point is not null;
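To make the split concrete, here is a minimal sketch on a literal value. It assumes baidu_point is formatted as 'longitude,latitude', which is what the insert above implies. The sample coordinates are made up for illustration.

```sql
-- Two equivalent ways to split a 'longitude,latitude' string.
select
    (string_to_array('116.403963,39.915119', ','))[1]::double precision as longitude,  -- 116.403963
    split_part('116.403963,39.915119', ',', 2)::double precision        as latitude;   -- 39.915119

-- Optional sanity check before the insert: list rows whose baidu_point
-- does not look like two comma-separated decimal numbers, since the
-- ::double precision cast would fail on them.
select id, baidu_point
from tmp.sight_dw_dw
where baidu_point is not null
  and baidu_point !~ '^-?[0-9]+(\.[0-9]+)?,-?[0-9]+(\.[0-9]+)?$';
```

Running the check first is a design choice rather than a requirement. A single malformed value aborts the whole insert, so it is cheaper to find such rows up front.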
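County table
The county (district) table tmp.diis has the columns city, longitude and latitude, plus a point column where_is. It is populated as follows: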
insert into tmp.diis(city,longitude,latitude)
select
city|| '_' || county,
(string_to_array(baidu_point,','))[1]::double precision,
(string_to_array(baidu_point,','))[2]::double precision
from
(
select
(select name from mirror.sight st where st.area_path@>s.area_path and type ='城市' order by nlevel(st.area_path) desc limit 1 ) as city,
(select name from mirror.sight st where st.area_path@>s.area_path and type ='区县' order by nlevel(st.area_path) desc limit 1 ) as county,
baidu_point
from mirror.sight s
where s.area_path<@(select st.area_path from mirror.sight st where st.name ='中国' ) and (s.type = '区县')
) a where city|| '_' || county <> '' and baidu_point is not null;
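-- Caution: ST_Point expects (x, y) = (longitude, latitude). Here the arguments are passed
-- in (latitude, longitude) order, so where_is is stored with its axes swapped.
UPDATE tmp.diis SET where_is = ST_POINT(latitude, longitude);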
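One possible alternative is to keep the county points in a geography column, so that later distance calls work in meters and can use a spatial index. The sketch below is only an illustration: the table tmp.diis_geog and the index name are hypothetical and not part of the original setup, and it assumes the Baidu coordinates can be treated as WGS 84 (SRID 4326).

```sql
-- Hypothetical table; tmp.diis_geog is not part of the original setup.
create table tmp.diis_geog (
    city      text,
    longitude double precision,
    latitude  double precision,
    where_is  geography(Point, 4326)
);

-- ST_MakePoint takes (x, y), i.e. (longitude, latitude).
insert into tmp.diis_geog (city, longitude, latitude, where_is)
select
    city,
    longitude,
    latitude,
    ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography
from tmp.diis;

-- A GiST index lets radius searches on where_is avoid a full scan.
create index diis_geog_where_is_idx on tmp.diis_geog using gist (where_is);
```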
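Computing the distance
Compute the distance between every scenic spot and every county with a Cartesian product. Since far-apart pairs are not needed, the where clause limits the pairs to roughly 100 km, approximated by a 0.9-degree box on latitude and longitude. One degree of latitude is about 111 km. A degree of longitude is shorter away from the equator, so the east-west limit is somewhat tighter than 100 km.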
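select
    a.city,
    b.name,
    -- Points are built from the raw columns as (longitude, latitude), the order ST_MakePoint expects.
    -- where_is is not reused because it was filled in (latitude, longitude) order.
    -- The geography cast makes ST_Distance return meters; assumes SRID 4326 fits the Baidu coordinates.
    ST_Distance(
        ST_SetSRID(ST_MakePoint(a.longitude, a.latitude), 4326)::geography,
        ST_SetSRID(ST_MakePoint(b.longitude, b.latitude), 4326)::geography
    ) as distance_m
from tmp.diis a
cross join tmp.distince b
where abs(a.latitude - b.latitude) <= 0.9
  and abs(a.longitude - b.longitude) <= 0.9;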
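The result table has one row per county and scenic spot pair: the county, the scenic spot name, and the distance between them.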
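The 0.9-degree box is only an approximation of a 100 km radius. A sketch of a stricter variant, using ST_DWithin on geography values, is shown here. It assumes the same two tables and columns as above and that the coordinates can be treated as WGS 84 (SRID 4326). The CTE names county and sight are illustrative.

```sql
with county as (
    select
        city,
        ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography as geog
    from tmp.diis
),
sight as (
    select
        name,
        ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography as geog
    from tmp.distince
)
select
    c.city,
    s.name,
    ST_Distance(c.geog, s.geog) as distance_m   -- meters, because both inputs are geography
from county c
join sight s
  on ST_DWithin(c.geog, s.geog, 100000)         -- 100000 m = 100 km
order by c.city, distance_m;
```

Because the points are built on the fly, this form cannot use a spatial index and still compares every pair. If the geography values were stored in an indexed column, ST_DWithin could use that index to narrow the candidate pairs.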