Data Mining: Outlier Detection

Outlier Detection: Implementation in R

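## Set the working directory

# First use setwd to set the working directory (e.g. on drive D) and copy the relevant data into it
setwd("D:/discrete")
# Read the data (the path must point to the CSV file inside the data directory)
Data=read.csv("D:/discrete/data/",header=TRUE)
# Standardize each column to mean 0 and standard deviation 1
Data=scale(Data)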
set.seed(12)
# Cluster into 3 groups with k-means
km=kmeans(Data,centers=3)
print(km)
km$centers
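# Euclidean distance from every sample to each cluster center
# (the data set used here has 940 rows and 3 columns)
x1=matrix(km$centers[1,], nrow = nrow(Data), ncol = ncol(Data), byrow = TRUE)
juli1=sqrt(rowSums((Data-x1)^2))
x2=matrix(km$centers[2,], nrow = nrow(Data), ncol = ncol(Data), byrow = TRUE)
juli2=sqrt(rowSums((Data-x2)^2))
x3=matrix(km$centers[3,], nrow = nrow(Data), ncol = ncol(Data), byrow = TRUE)
juli3=sqrt(rowSums((Data-x3)^2))
dist=data.frame(juli1,juli2,juli3)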
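## Minimum Euclidean distance, i.e. the distance to the nearest cluster center
y=apply(dist, 1, min)
plot(seq_along(y),y,xlim=c(0,length(y)),xlab="Sample index",ylab="Euclidean distance")
# Mark samples farther than 2.5 from their nearest center in red as outliers
points(which(y>2.5),y[which(y>2.5)],pch=19,col="red")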
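
The script above repeats the same distance calculation once per cluster center. As a rough sketch of the same idea for any number of centers, the steps can be wrapped in a small helper. The function name `min_center_dist`, the synthetic matrix `X`, and the `nstart = 10` setting are assumptions made for illustration and are not part of the original script; the 2.5 cutoff is the one used above and may need tuning for other data.

```r
# Hypothetical helper: distance from each row of X to its nearest center.
# X is a numeric matrix (samples in rows); centers has one center per row.
min_center_dist <- function(X, centers) {
  d <- sapply(seq_len(nrow(centers)), function(k) {
    # sweep() subtracts center k from every row of X, column by column
    sqrt(rowSums(sweep(X, 2, centers[k, ])^2))
  })
  # d has one column per center; keep the smallest distance in each row
  apply(d, 1, min)
}

# Synthetic stand-in data (assumption): three groups of 100 points in 3 dimensions
set.seed(12)
X <- rbind(
  matrix(rnorm(300, mean = 0, sd = 0.3), ncol = 3),
  matrix(rnorm(300, mean = 3, sd = 0.3), ncol = 3),
  matrix(rnorm(300, mean = 6, sd = 0.3), ncol = 3)
)
X <- scale(X)

# nstart = 10 reruns k-means from several random starts (an illustrative choice)
km <- kmeans(X, centers = 3, nstart = 10)
y <- min_center_dist(X, km$centers)

# Indices of samples whose nearest-center distance exceeds the cutoff
outliers <- which(y > 2.5)
```

With the article's own data, `min_center_dist(Data, km$centers)` should give the same vector `y` as the three-step version, and `outliers` holds the row indices that the plot marks in red.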
