LoOP Local Outlier Probabilities
此方法为lof的改进版本:
lof有一个显著的缺点:即最终的得分不能判断异常值的程度,loOp将异常值表示在[0,1]的范围内,可以根据概率对异常值判断。
相比于欧氏距离,引入了概率距离这一概念,相对于点 o o o而言,设其邻域集合为 S S S, P [ d ( o , s ) \mathcal{P}[d(o, s) P[d(o,s)为点 o o o到近邻点的距离,概率距离 pdist ( o , S ) \operatorname{pdist}(o, S) pdist(o,S)定义如下:
∀ s ∈ S : P [ d ( o , s ) ≤ pdist ( o , S ) ] ≥ φ (1) \forall s \in S: \quad \mathcal{P}[d(o, s) \leq \operatorname{pdist}(o, S)] \geq \varphi\tag{1} ∀s∈S:P[d(o,s)≤pdist(o,S)]≥φ(1)
简单地解释,只有大多数近邻点的距离在概率距离范围之内,也就是说不在概率距离范围之内的点就是异常值。
如果使用 λ = 2 ⋅ erf − 1 ( φ ) \lambda=\sqrt{2} \cdot \operatorname{erf}^{-1}(\varphi) λ=2⋅erf−1(φ)代替 λ \lambda λ,erf为高斯误差函数。将离群值定义为偏离平均值超过给定 λ 乘以标准差 σ 的对象(类似sor滤波)。根据3sigma法则:λ = 1 ⇔ φ ≈ 68%, λ = 2 ⇔ φ ≈ 95%, and λ = 3 ⇔ φ ≈ 99.7%。
假设o是S的中心,并且s ∈ S到o的距离集合近似为半高斯分布,则可以计算出S中物体到o的标准距离,其类似于标准差:
σ ( o , S ) = ∑ s ∈ S d ( o , s ) 2 ∣ S ∣ (2) \sigma(o, S)=\sqrt{\frac{\sum_{s \in S} d(o, s)^2}{|S|}}\tag{2} σ(o,S)=∣S∣∑s∈Sd(o,s)2(2)
注意这里的标准距离和标准差不太一样(没有使用距离减去距离的均值)。我们不考虑在全局范围内距离呈正态分布,而是假设在集合 S S S呈正态分布。因此,S 以 o 为中心的假设通常是合理的。基于这些考虑,我们将 o 到 S 的显着性 λ 的概率集合距离定义为:
pdist ( λ , o , S ) : = λ ⋅ σ ( o , S ) (3) \operatorname{pdist}(\lambda, o, S):=\lambda \cdot \sigma(o, S)\tag{3} pdist(λ,o,S):=λ⋅σ(o,S)(3)
异常值的局部概率建模基于以下假设 (i.)以近邻集合 S S S为基础进行的(ii)距离符合正半部分的正态分布(x>0)。对于假设合理性的说明省略(见原文)。
Probabilistic Local Outlier Factor (PLOF)可以定义如下:
PLOF λ , S ( o ) : = pdist ( λ , o , S ( o ) ) E s ∈ S ( o ) [ pdist ( λ , s , S ( s ) ) ] − 1 (4) \operatorname{PLOF}_{\lambda, S}(o):=\frac{\operatorname{pdist}(\lambda, o, S(o))}{E_{s \in S(o)}[\operatorname{pdist}(\lambda, s, S(s))]}-1\tag{4} PLOFλ,S(o):=Es∈S(o)[pdist(λ,s,S(s))]pdist(λ,o,S(o))−1(4)
E ( ) E() E()表示期望。
对上式进行归一化:
n P L O F : = λ ⋅ E [ ( P L O F ) 2 ] (5) \mathrm{nPLOF}:=\lambda \cdot \sqrt{E\left[(\mathrm{PLOF})^2\right]}\tag{5} nPLOF:=λ⋅E[(PLOF)2](5)
该值可以看作是 PLOF 值的一种标准差,即假设平均值为 0 的 λ·Stddev(PLOF)。
为了将尚未归一化的 PLOF 值转换为概率值,我们假设这些值在 1 附近呈正态分布,标准差为 nPLOF。然后,我们应用高斯误差函数来获得概率值,即局部离群点概率 (LoOP),表示点 o ∈ D 是离群点的概率:
LoOP S ( o ) : = max { 0 , erf ( PLOF λ , S ( o ) nPLOF ⋅ 2 ) } (6) \operatorname{LoOP}_S(o):=\max \left\{0, \operatorname{erf}\left(\frac{\operatorname{PLOF}_{\lambda, S}(o)}{\operatorname{nPLOF} \cdot \sqrt{2}}\right)\right\}\tag{6} LoOPS(o):=max{0,erf(nPLOF⋅2PLOFλ,S(o))}(6)
#include
#include
#include
#include
#include
int main()
{
using pointcloud = pcl::PointCloud<pcl::PointXYZ>;
//params,values are from paper
const double lambda = 3.0;
const int k = 20;
pointcloud cloud_in{};
pcl::io::loadPCDFile("D:/scaled_cloud.pcd", cloud_in);
std::cout << "raw point size:" << cloud_in.size() << std::endl;
pcl::KdTreeFLANN<pcl::PointXYZ>kdtree{};
kdtree.setInputCloud(cloud_in.makeShared());
//Local Outlier Probabilities
std::vector<double>loop(cloud_in.size(), 0.0);
//1 compute the square dist between query point and nearest point
std::vector<float>nearest_square_dist(cloud_in.size(), 0.0);
//the k neighbor about query point
std::vector<std::vector<int>> point_neighbor_indices(cloud_in.size(), std::vector<int>(k,0));
for (size_t i = 0; i < cloud_in.size(); i++)
{
std::vector<int>indices{};
std::vector<float>dist{};
if (kdtree.nearestKSearch(cloud_in.points[i], k, indices, dist)>1)
{
//indices[0] is the query point index
nearest_square_dist[i] = dist[1];
point_neighbor_indices[i] = indices;
}
}
//2 compute the standard dist about every point
std::vector<double>std_dist(cloud_in.size(), 0.0);
//3probabilistic set dist
std::vector<double>p_dist(cloud_in.size(), 0.0);
for (size_t i = 0; i < cloud_in.size(); i++)
{
float sqr_sum = 0;
//sum of d(o,s)^2
for (size_t j = 0; j < point_neighbor_indices[i].size(); j++)
{
sqr_sum += nearest_square_dist[point_neighbor_indices[i][j]];
}
std_dist[i] = std::sqrt(sqr_sum / float(k));
p_dist[i] = lambda * std_dist[i];
}
//4probabilistic lof
std::vector<float>plof(cloud_in.size(), 0.0);
for (size_t i = 0; i < cloud_in.size(); i++)
{
float sqr_sum = 0;
//expected value of pdist
for (size_t j = 0; j < point_neighbor_indices[i].size(); j++)
{
sqr_sum += p_dist[point_neighbor_indices[i][j]];
}
float ev_pdist = sqr_sum / float(k);
plof[i] = (p_dist[i] / ev_pdist) - 1.0;
}
//4normalization plof
//
std::vector<float>nplof(cloud_in.size(), 0.0);
for (size_t i = 0; i < cloud_in.size(); i++)
{
float sqr_sum = 0;
for (size_t j = 0; j < point_neighbor_indices[i].size(); j++)
{
sqr_sum +=std::pow( plof[point_neighbor_indices[i][j]],2);
}
//expected value of square plof
float ev_plof_2 = sqr_sum / float(k);
nplof[i] = lambda * std::sqrt(ev_plof_2);
}
//5loOp
std::vector<double>loOp(cloud_in.size(), 0.0);
for (size_t i = 0; i < cloud_in.size(); i++)
{
double loOp1 = std::erf(plof[i] / (nplof[i] * std::sqrt(2)));
loOp[i] = loOp1 > 0.0 ? loOp1 : 0.0;
}
pointcloud cloud_out{};
std::ofstream out("./noise_cloud.xyz");
for (int i = 0; i < loOp.size(); ++i)
{
if (loOp[i]<0.5)
{
cloud_out.push_back(cloud_in.points[i]);
}
else
{
out << loOp[i] << " " << cloud_in.points[i].x << " " << cloud_in.points[i].y << " " << cloud_in.points[i].z << "\n";
}
}
out.close();
std::cout << "filtered point size:" << cloud_out.size() << std::endl;
cloud_out.width = cloud_out.size();
cloud_out.height = 1;
cloud_out.resize(cloud_out.width*cloud_out.height);
pcl::io::savePCDFileBinary("filtered_cloud.pcd", cloud_out);
std::cout << "Hello World!\n";
}