Trajectories that pass through a chosen road segment during a traffic-congestion period are taken as anomalous trajectories, and trajectories that pass through the same segment during non-congested periods are taken as normal trajectories. The ST-DBSCAN algorithm is then used to identify the anomalous trajectories.
1. Selecting anomalous trajectories
For the road segments 50175 to 50185, compute the average speed and vehicle count in 5-minute intervals:
-- 5-minute statistics: average speed, number of vehicles, number of track points
SELECT
    avg("VELOCITY") AS avg_velocity,
    count(DISTINCT "TM_SERIAL") AS count_tm,   -- vehicle count; without DISTINCT this would equal the point count
    count("ctime") AS count_point,             -- track-point count
    TIMESTAMP WITH TIME ZONE 'epoch' +
        INTERVAL '1 second' * round(extract('epoch' FROM "ctime") / 300) * 300 AS timestamp  -- 5-minute bucket
FROM public."TRACK_20190702_Snap"
WHERE "osm_id_new" > 50175 AND "osm_id_new" < 50185
  AND "ctime" > '2019-07-02 0:00:00+08' AND "ctime" < '2019-07-02 23:59:00+08'
GROUP BY timestamp
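To locate the congested window, the aggregated result can be exported and plotted. The following is a minimal sketch, assuming the query output above is saved as speed_5min.csv (a hypothetical file name) with the column aliases avg_velocity, count_tm and timestamp from the query:
# Sketch: plot the 5-minute average speed and vehicle count to spot the congested window.
# Assumes the SQL result above was exported as speed_5min.csv (hypothetical file name).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('speed_5min.csv', parse_dates=['timestamp']).sort_values('timestamp')

fig, ax1 = plt.subplots(figsize=(12, 4))
ax1.plot(df['timestamp'], df['avg_velocity'], color='tab:blue')
ax1.set_ylabel('average velocity')

ax2 = ax1.twinx()                                  # second y-axis for the vehicle count
ax2.plot(df['timestamp'], df['count_tm'], color='tab:red')
ax2.set_ylabel('vehicle count')

plt.title('5-minute statistics, segments 50175-50185')
plt.show()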
From the plot it can be seen that this segment is congested from 8:20 a.m. to 9:10 a.m.; the trajectories passing the segment during that period are extracted.
SELECT "TM_SERIAL", "TM_SERIAL_SPLIT", "VELOCITY", "ctime", "osm_id_new"
FROM public."TRACK_20190702_Snap"
WHERE "osm_id_new" > 50175 AND "osm_id_new" < 50185
  AND "ctime" > '2019-07-02 8:30:00+08' AND "ctime" < '2019-07-02 9:15:00+08'
ORDER BY "TM_SERIAL", "ctime"
Analysis of the extracted data shows that a trajectory on this segment has 5 track points on average and takes about 6 minutes to traverse it (trajectories with fewer than 5 points are discarded).
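A minimal pandas sketch of this per-trajectory check, assuming the congested-period query result was exported as congested_tracks.csv (a hypothetical file name) with the TM_SERIAL and ctime columns from the query above:
# Sketch: per-trajectory point counts and traversal times for the congested-period data.
# Assumes the query result was exported as congested_tracks.csv (hypothetical file name).
import pandas as pd

df = pd.read_csv('congested_tracks.csv', parse_dates=['ctime'])

stats = df.groupby('TM_SERIAL').agg(
    n_points=('ctime', 'size'),                                    # track points per trajectory
    duration_min=('ctime', lambda s: (s.max() - s.min()).total_seconds() / 60),
)
print(stats['n_points'].mean(), stats['duration_min'].mean())      # reported above: ~5 points, ~6 minutes

# Drop trajectories with fewer than 5 track points.
kept = stats.index[stats['n_points'] >= 5]
df = df[df['TM_SERIAL'].isin(kept)]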
Accordingly, the trajectories of vehicles that pass through this segment during non-congested periods are also extracted and assumed to be normal trajectories.
This yields the combined dataset we need, containing both the normal and the anomalous trajectories.
Applying the ST-DBSCAN algorithm:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from datetime import datetime

import pandas as pd
import numpy as np

from stdbscan import STDBSCAN
from coordinates import convert_to_utm


def parse_dates(x):
    # Timestamps in the input CSV use the 'YYYY/MM/DD HH:MM:SS' format.
    # return datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    return datetime.strptime(x, '%Y/%m/%d %H:%M:%S')


def plot_clusters(df, output_name):
    import matplotlib.pyplot as plt

    labels = df['cluster'].values
    X = df[['lon', 'lat']].values

    # One colour per cluster; black is reserved for noise (label -1).
    unique_labels = set(labels)
    colors = [plt.cm.Spectral(each)
              for each in np.linspace(0, 1, len(unique_labels))]
    for k, col in zip(unique_labels, colors):
        if k == -1:
            # Black used for noise.
            col = [0, 0, 0, 1]

        class_member_mask = (labels == k)
        xy = X[class_member_mask]
        plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
                 markeredgecolor='k', markersize=6)

    plt.title('ST-DBSCAN: number of clusters {}'.format(len(unique_labels)))
    plt.show()
    # plt.savefig(output_name)


def test_time():
    filename = r"E:\李猛硕士毕设\实验部分\实验数据\2021-01-19\stdbscan\v1\guiji_ty.csv"
    df = pd.read_csv(filename, sep=",", converters={'date_time': parse_dates})

    '''
    First, we transform the lon/lat (geographic coordinates) to x and y
    (in meters); to do this, we need to select the right EPSG code (it
    depends on the trace). After that, we run the algorithm.
    '''
    st_dbscan = STDBSCAN(spatial_threshold=140, temporal_threshold=300000,
                         velocity_threshold=10, min_neighbors=8)

    # df = convert_to_utm(df, src_epsg=4326, dst_epsg=32649,
    #                     col_lat='latitude', col_lon='longitude')
    print(df)
    result_t600 = st_dbscan.fit_transform(df, col_lat='lat',
                                          col_lon='lon',
                                          col_time='date_time',
                                          col_velocity='velocity')
    return result_t600


if __name__ == '__main__':
    df = pd.DataFrame(test_time())
    print(df)
    print(df['cluster'].value_counts())
    plot_clusters(df, "a")
    outfile = r"E:\李猛硕士毕设\实验部分\实验数据\2021-01-19\stdbscan\v1\stdbscan_result.csv"
    df.to_csv(outfile, sep=',', header=True)
Here EPS1 (the spatial distance) is 100, EPS2 (the velocity distance) is 10, and the minimum number of points is 50.
How the spatial distance threshold is chosen:
Loop over the points: for each point, compute its distance to every other point, sort the distances in ascending order, and append the first K of them to a list, where K = ln(number of sample points).
Sort the list in descending order; the distance at the elbow of the curve is taken as the threshold.
KNN distance code:
# Compute the k-NN distances; the input is a CSV of track points.
import math

import pandas as pd


def CalculateDistance(x1, y1, x2, y2):
    # Euclidean distance between two points in projected coordinates.
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


def CalculateDistance_velocity(v1, v2):
    # Absolute difference between two velocities.
    return max(v1, v2) - min(v1, v2)


if __name__ == "__main__":
    point_file_csv = r'E:\李猛硕士毕设\实验部分\实验代码\st_dbscan\py-st-dbscan-master\python\src\guiji_result_.csv'
    df = pd.read_csv(point_file_csv)
    print(df)

    point_size = df.shape[0]
    k = math.ceil(math.log(point_size))    # K = ln(number of sample points), rounded up
    print(k)

    distance_list = []
    for index, row in df.iterrows():
        distance_list_temp = []
        # point1_lon = row['lon']
        # point1_lat = row['lat']
        v1 = row['velocity']
        for index_1, row_1 in df.iterrows():
            # point2_lon = row_1['lon']
            # point2_lat = row_1['lat']
            v2 = row_1['velocity']
            if index != index_1:
                # distance = CalculateDistance(point1_lon, point1_lat, point2_lon, point2_lat)
                distance = CalculateDistance_velocity(v1, v2)
                distance_list_temp.append(distance)

        distance_list_temp.sort()          # ascending distances from this point to all others
        for i in range(k):                 # keep only the K nearest distances
            distance_list.append(distance_list_temp[i])
        # distance = distance_list_temp[k-1]
        # distance_list.append(distance)

    distance_list.sort(reverse=True)       # descending order for the elbow plot
    print(distance_list)

    outputFile = r'E:\李猛硕士毕设\实验部分\实验代码\st_dbscan\py-st-dbscan-master\python\src\knn_result_velocity.csv'
    with open(outputFile, "a") as f1:
        for i in distance_list:
            print(i)
            f1.write("{0}\n".format(i))
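The descending distance list written to knn_result_velocity.csv can then be plotted as a k-distance curve and the elbow value read off visually; a minimal sketch of that plot:
# Sketch: plot the descending k-NN distance list and read the elbow value off the curve.
import pandas as pd
import matplotlib.pyplot as plt

dist = pd.read_csv(
    r'E:\李猛硕士毕设\实验部分\实验代码\st_dbscan\py-st-dbscan-master\python\src\knn_result_velocity.csv',
    header=None, names=['distance'])

plt.plot(range(len(dist)), dist['distance'])
plt.xlabel('sorted point index (descending distance)')
plt.ylabel('k-NN distance')
plt.title('k-distance curve: the elbow gives the threshold')
plt.show()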
Finally, statistics are computed on the ST-DBSCAN output: for each cluster, its size (strength) and average speed are calculated; the clusters are then collapsed into fewer classes by average speed, and clusters whose average speed falls below a chosen threshold are judged anomalous. Each trajectory is assigned the class held by the largest number of its track points.
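A minimal pandas sketch of this post-processing, assuming the ST-DBSCAN output (stdbscan_result.csv above) keeps a trajectory identifier column TM_SERIAL alongside cluster and velocity, and using a hypothetical speed threshold SLOW_SPEED:
# Sketch of the post-processing described above (column names are assumptions).
import pandas as pd

SLOW_SPEED = 10   # hypothetical threshold: clusters slower than this count as anomalous

df = pd.read_csv('stdbscan_result.csv')            # output of the ST-DBSCAN run above

# Per-cluster size ("strength") and average speed; label -1 is noise and is skipped.
cluster_stats = (df[df['cluster'] != -1]
                 .groupby('cluster')
                 .agg(size=('velocity', 'size'),
                      avg_velocity=('velocity', 'mean')))

# Collapse the clusters into two classes by average speed.
cluster_stats['is_abnormal'] = cluster_stats['avg_velocity'] < SLOW_SPEED

# Each trajectory takes the cluster label held by most of its points (majority vote),
# then inherits that cluster's normal/abnormal flag; assumes a TM_SERIAL column exists.
major_cluster = df.groupby('TM_SERIAL')['cluster'].agg(lambda s: s.value_counts().idxmax())
trajectory_abnormal = major_cluster.map(cluster_stats['is_abnormal']).fillna(False)
print(trajectory_abnormal.value_counts())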