Detecting Anomalous Trajectories with STDBSCAN

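Pick a road segment and a period during which it is congested. Trajectories that pass through the segment during the congested period are treated as anomalous. Trajectories that pass through the same segment outside the congested period are treated as normal. The STDBSCAN algorithm is then used to identify the anomalous trajectories.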
1. Selecting anomalous trajectories
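For the road segments with osm_id_new between 50175 and 50185 (exclusive), compute the average speed, the number of vehicles and the number of track points in 5-minute intervals: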

SELECT
    avg("VELOCITY") AS avg_velocity,
    count(DISTINCT "TM_SERIAL") AS count_tm,  -- distinct vehicles
    count("ctime") AS count_point,            -- track points
    TIMESTAMP WITH TIME ZONE 'epoch' +
    INTERVAL '1 second' * floor(extract('epoch' from "ctime") / 300) * 300 AS time_bin
    FROM public."TRACK_20190702_Snap"
    WHERE "osm_id_new" > 50175 AND "osm_id_new" < 50185
    AND "ctime" >= '2019-07-02 00:00:00+08' AND "ctime" < '2019-07-03 00:00:00+08'
    GROUP BY time_bin
    ORDER BY time_bin;
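If the track points are exported to a file instead of being queried in PostgreSQL, the same 5-minute statistics can be computed in pandas. The sketch below is a minimal illustration. It assumes a DataFrame with the query's column names (`TM_SERIAL`, `VELOCITY`, `ctime`, with `ctime` already parsed as datetimes). The function name `five_minute_stats` is hypothetical, and bins are aligned to the start of each interval with `dt.floor`.

```python
import pandas as pd


def five_minute_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Average speed, distinct vehicles and point count per 5-minute bin."""
    # Start of the 5-minute interval each point falls into
    bins = df["ctime"].dt.floor("5min").rename("time_bin")
    return (
        df.groupby(bins)
        .agg(
            avg_velocity=("VELOCITY", "mean"),
            count_tm=("TM_SERIAL", "nunique"),  # distinct vehicles
            count_point=("ctime", "count"),     # track points
        )
        .reset_index()
    )
```

Vehicles are counted with `nunique` on `TM_SERIAL`, so a vehicle that reports several points in one interval is counted once.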


[Figure: average speed and vehicle count per 5-minute interval on the segment]

The figure shows that the segment is congested from 8:20 to 9:10 a.m., so we extract the trajectories on the segment during that period:
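Here the congested window is read off the figure by eye. The function below is one possible way to automate that step. It scans the 5-minute statistics for the longest run of consecutive bins whose average speed is below a threshold. The threshold value, the `time_bin`/`avg_velocity` column names and the helper name `longest_slow_run` are assumptions, not values from the original analysis.

```python
import pandas as pd


def longest_slow_run(stats: pd.DataFrame, speed_threshold: float):
    """Return (start, end) of the longest run of consecutive 5-minute bins
    whose average speed is below speed_threshold, or None if there is none.

    `stats` needs columns time_bin and avg_velocity, sorted by time_bin.
    """
    step = pd.Timedelta(minutes=5)
    best, current = None, []
    for time_bin, speed in zip(stats["time_bin"], stats["avg_velocity"]):
        if speed < speed_threshold:
            # A missing bin breaks the run, even if both neighbors are slow
            if current and time_bin - current[-1] != step:
                current = []
            current.append(time_bin)
            if best is None or len(current) > len(best):
                best = list(current)
        else:
            current = []
    if best is None:
        return None
    # The window ends at the end of its last 5-minute bin
    return best[0], best[-1] + step
```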

select "TM_SERIAL", "TM_SERIAL_SPLIT", "VELOCITY", "ctime", "osm_id_new"  FROM public."TRACK_20190702_Snap"
where "osm_id_new" > 50175 and "osm_id_new" < 50185
and "ctime" > '2019-07-02 8:30:00+08' and "ctime" < '2019-07-02 9:15:00+08'
order by "TM_SERIAL","ctime"

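Analyzing the extracted data (after removing trajectories with fewer than 5 points) shows that a trajectory has on average 5 points on this segment and takes on average 6 minutes to pass through it.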
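The point-count filter and the two averages can be reproduced with a short pandas sketch. It assumes the query result is loaded into a DataFrame with a `TM_SERIAL` column and a datetime `ctime` column. Grouping by `TM_SERIAL` is an assumption; `TM_SERIAL_SPLIT` may be the better trajectory key if it splits a vehicle's day into separate trips. `trajectory_stats` is a hypothetical helper.

```python
import pandas as pd


def trajectory_stats(points, min_points=5, traj_col="TM_SERIAL"):
    """Drop trajectories with fewer than min_points points, then return
    (kept points, per-trajectory point count and traversal duration)."""
    counts = points.groupby(traj_col)["ctime"].transform("count")
    kept = points[counts >= min_points]
    summary = kept.groupby(traj_col)["ctime"].agg(
        n_points="count",
        duration=lambda t: t.max() - t.min(),  # first to last point on the segment
    )
    return kept, summary
```

`summary["n_points"].mean()` and `summary["duration"].mean()` then give the average point count and traversal time.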
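We then take vehicle trajectories that pass through the segment outside the congested period and assume that they are normal trajectories.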
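Below is a minimal sketch of this split. It assumes all of the day's points on the segment are in one DataFrame. Points inside the congested window form the anomalous set, as in the query above. Only trajectories with no point inside the window are kept as normal, so no vehicle contributes to both sets. The helper `split_by_window` and that exclusion rule are assumptions, not necessarily the author's exact procedure.

```python
import pandas as pd


def split_by_window(points, start, end, traj_col="TM_SERIAL"):
    """Return (congested points, normal points) for the window [start, end).

    Congested: every point whose ctime lies in the window.
    Normal: all points of trajectories that have no point in the window.
    """
    in_window = points["ctime"].between(start, end, inclusive="left")
    touched = set(points.loc[in_window, traj_col])
    is_touched = points[traj_col].isin(touched)
    return points[in_window], points[~is_touched]
```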

This yields the combined dataset we need, containing both normal and anomalous trajectories.
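The clustering script reads a CSV with `lon`, `lat`, `date_time` and `velocity` columns and parses `date_time` with the format `%Y/%m/%d %H:%M:%S`. The sketch below assembles such a table. It assumes both input frames already carry `lon`/`lat` columns; the trajectory query above does not select coordinates. The `source` column and the `build_summary` name are hypothetical additions, kept only for later evaluation.

```python
import pandas as pd


def build_summary(anomalous, normal):
    """Stack both sets, tag where each row came from, and format the
    columns expected by the clustering script."""
    parts = []
    for frame, label in ((anomalous, "anomalous"), (normal, "normal")):
        part = frame.rename(columns={"VELOCITY": "velocity"})
        part["source"] = label  # hypothetical ground-truth tag
        parts.append(part)
    out = pd.concat(parts, ignore_index=True)
    # Same format as parse_dates() in the clustering script
    out["date_time"] = out["ctime"].dt.strftime("%Y/%m/%d %H:%M:%S")
    return out[["TM_SERIAL", "lon", "lat", "date_time", "velocity", "source"]]
```

`build_summary(anomalous, normal).to_csv("guiji_ty.csv", index=False)` would then write an input file shaped like the one the script loads.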
Apply the STDBSCAN algorithm:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from datetime import datetime
import pandas as pd
import numpy as np
from stdbscan import STDBSCAN
from coordinates import convert_to_utm


def parse_dates(x):
    # return datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    return datetime.strptime(x, '%Y/%m/%d %H:%M:%S')


def plot_clusters(df, output_name):
    import matplotlib.pyplot as plt

    labels = df['cluster'].values
    X = df[['lon', 'lat']].values

    # Black removed and is used for noise instead.
    unique_labels = set(labels)
    colors = [plt.cm.Spectral(each)
              for each in np.linspace(0, 1, len(unique_labels))]
    for k, col in zip(unique_labels, colors):
        if k == -1:
            # Black used for noise.
            col = [0, 0, 0, 1]

        class_member_mask = (labels == k)

        xy = X[class_member_mask]
        plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
                 markeredgecolor='k', markersize=6)

    n_clusters = len(unique_labels - {-1})  # noise (-1) is not a cluster
    plt.title('ST-DBSCAN: number of clusters {}'.format(n_clusters))
    plt.show()
    # plt.savefig(output_name)


def test_time():
    filename = r"E:\李猛硕士毕设\实验部分\实验数据\2021-01-19\stdbscan\v1\guiji_ty.csv"
    df = pd.read_csv(filename, sep=",", converters={'date_time': parse_dates})
    # First, transform lon/lat (geographic coordinates) to x and y in meters;
    # the right target EPSG code depends on where the trace was recorded.
    # Then run the algorithm. Assuming the library measures plain Euclidean
    # distance on the lat/lon columns, spatial_threshold only means "meters"
    # after this conversion (or if the CSV already stores projected x/y).

    st_dbscan = STDBSCAN(spatial_threshold=140, temporal_threshold=300000, velocity_threshold=10,
                         min_neighbors=8)

    # df = convert_to_utm(df, src_epsg=4326, dst_epsg=32649,
    #                     col_lat='lat', col_lon='lon')

    print(df)
    result_t600 = st_dbscan.fit_transform(df, col_lat='lat',
                                          col_lon='lon',
                                          col_time='date_time',
                                          col_velocity='velocity')
    return result_t600


if __name__ == '__main__':
    df = pd.DataFrame(test_time())
    print(df)
    print(df['cluster'].value_counts())
    plot_clusters(df,"a")
    outfile = r"E:\李猛硕士毕设\实验部分\实验数据\2021-01-19\stdbscan\v1\stdbscan_result.csv"
    df.to_csv(outfile,sep=',', header=True)
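The sketch below shows how the three thresholds interact. It is a minimal, self-contained ST-DBSCAN in NumPy and is not the `stdbscan` module imported above; that module's internals, including how it applies `velocity_threshold`, are not shown here. It assumes projected x/y coordinates in meters, times in seconds and speeds in one unit. Two points count as neighbors only if all three conditions hold: spatial distance <= eps_spatial, time difference <= eps_temporal and speed difference <= eps_velocity. Core/border expansion then follows plain DBSCAN, and `min_pts` counts neighbors excluding the point itself.

```python
import numpy as np


def st_dbscan(xy, t, v, eps_spatial, eps_temporal, eps_velocity, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    xy = np.asarray(xy, dtype=float)
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    n = len(t)

    def neighbors(i):
        # A neighbor must satisfy the spatial, temporal and velocity thresholds
        close = (
            (np.hypot(*(xy - xy[i]).T) <= eps_spatial)
            & (np.abs(t - t[i]) <= eps_temporal)
            & (np.abs(v - v[i]) <= eps_velocity)
        )
        close[i] = False  # a point is not its own neighbor
        return np.flatnonzero(close)

    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        seeds = list(neighbors(i))
        if len(seeds) < min_pts:
            continue  # not a core point; stays noise unless reached later
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border or core point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            more = neighbors(j)
            if len(more) >= min_pts:
                seeds.extend(more)  # j is a core point: keep expanding
        cluster += 1
    return labels
```

Each neighborhood query scans all points, so a run is O(n²). That is acceptable for a few thousand points, but larger inputs would need a spatial index.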

The parameters used are EPS1 (spatial distance) = 100, EPS2 (velocity difference) = 10, and a minimum of 50 points.
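Choosing the spatial distance threshold:
For each point:
compute its distance to every other point, sort these distances in ascending order, and append the K smallest to a list List, where K = ln(number of sample points), rounded up.
Sort List in descending order; the distance at the elbow of the resulting curve is used as the threshold.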
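In the last step the elbow is picked by eye. One way to automate it is shown below, using a common heuristic swapped in here that may not match the method used in this work. The sketch takes the descending distance list and returns the value whose point lies farthest from the straight line (chord) joining the first and last points of the curve. `elbow_value` is a hypothetical helper name.

```python
import numpy as np


def elbow_value(distances):
    """Return the distance at the elbow of a k-distance curve.

    The values are sorted in descending order; the elbow is the point with
    the largest perpendicular distance to the chord from first to last point.
    """
    y = np.sort(np.asarray(distances, dtype=float))[::-1]
    if len(y) < 3:
        raise ValueError("need at least 3 distances")
    # Normalize both axes to [0, 1] so neither scale dominates
    x = np.arange(len(y), dtype=float) / (len(y) - 1)
    span = y[0] - y[-1]
    yn = (y - y[-1]) / span if span > 0 else np.zeros_like(y)
    # After normalization the chord runs from (0, 1) to (1, 0): x + y = 1
    dist = np.abs(1.0 - x - yn) / np.sqrt(2.0)
    return y[int(np.argmax(dist))]
```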
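KNN distance code. As listed, it computes velocity differences to choose the velocity threshold; the commented-out lines compute spatial distances instead: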

# Compute k-NN distances to choose a clustering threshold; the input is a CSV of points
import math

import pandas as pd


def CalculateDistance(x1, y1, x2, y2):
    # Euclidean distance; only in meters if x/y are projected coordinates
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


def CalculateDistance_velocity(v1, v2):
    # Absolute difference between two speeds
    return abs(v1 - v2)


if __name__ == "__main__":
    point_file_csv = r'E:\李猛硕士毕设\实验部分\实验代码\st_dbscan\py-st-dbscan-master\python\src\guiji_result_.csv'
    df = pd.read_csv(point_file_csv)
    print(df)
    point_size = df.shape[0]
    k = math.ceil(math.log(point_size))  # K = ln(number of points), rounded up
    print(k)
    distance_list = []
    for index, row in df.iterrows():
        distance_list_temp = []
        # point1_lon = row['lon']
        # point1_lat = row['lat']
        v1 = row['velocity']
        for index_1, row_1 in df.iterrows():
            if index == index_1:
                continue  # skip the distance from a point to itself
            # point2_lon = row_1['lon']
            # point2_lat = row_1['lat']
            # distance = CalculateDistance(point1_lon, point1_lat, point2_lon, point2_lat)
            v2 = row_1['velocity']
            distance = CalculateDistance_velocity(v1, v2)
            distance_list_temp.append(distance)
        # Distances from this point to all other points, in ascending order
        distance_list_temp.sort()
        # Keep the K smallest; slicing also works if fewer than K distances exist
        distance_list.extend(distance_list_temp[:k])
        # Classic k-distance graph: keep only the k-th nearest distance instead
        # distance_list.append(distance_list_temp[k - 1])

    distance_list.sort(reverse=True)
    print(distance_list)

    outputFile = r'E:\李猛硕士毕设\实验部分\实验代码\st_dbscan\py-st-dbscan-master\python\src\knn_result_velocity.csv'
    # "w" rather than "a", so re-running the script does not append duplicate rows
    with open(outputFile, "w") as f1:
        for i in distance_list:
            f1.write("{0}\n".format(i))

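Finally, compute statistics on the STDBSCAN result: the strength of each cluster and each cluster's average speed. The average speed is used to reduce the clusters to fewer classes: a cluster whose average speed is below a chosen threshold is judged anomalous. Each trajectory is assigned the class held by the majority of its points.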
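Below is a minimal pandas sketch of this post-processing. It assumes the STDBSCAN output has `TM_SERIAL`, `cluster` (-1 for noise) and `velocity` columns. Cluster strength is taken here to be the number of points in the cluster, which is an assumption. Noise points get their own `noise` class, and ties in the majority vote are broken arbitrarily. The speed threshold is a parameter because the text does not state its value, and `label_trajectories` is a hypothetical helper.

```python
import pandas as pd


def label_trajectories(points, speed_threshold, traj_col="TM_SERIAL"):
    """Return (per-cluster stats, per-trajectory class)."""
    clusters = (
        points[points["cluster"] != -1]
        .groupby("cluster")["velocity"]
        .agg(n_points="count", mean_velocity="mean")
    )
    slow = set(clusters.index[clusters["mean_velocity"] < speed_threshold])

    def point_class(c):
        # Reduce cluster ids to three classes
        if c == -1:
            return "noise"
        return "anomalous" if c in slow else "normal"

    point_label = points["cluster"].map(point_class)
    # Each trajectory takes the most frequent class among its points
    traj_label = point_label.groupby(points[traj_col]).agg(
        lambda s: s.value_counts().idxmax())
    return clusters, traj_label
```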
