A new R package for detecting unusual time series

The anomalous package provides tools to detect unusual time series in a large collection of time series. This is joint work with Earo Wang (an honours student at Monash) and Nikolay Laptev (from Yahoo Labs). Yahoo is interested in detecting unusual patterns in server metrics.
The package is based on a paper I wrote with Earo and Nikolay.
The basic idea is to measure a range of features of each time series (such as the strength of seasonality, an index of spikiness and the first-order autocorrelation). A principal component decomposition of the feature matrix is then calculated, and outliers are identified in the 2-dimensional space of the first two principal component scores.
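To make the feature-then-PCA step concrete, here is a minimal Python sketch (the package itself is written in R). The three features used here, lag-1 autocorrelation, a crude spikiness proxy (the largest absolute z-score) and the log variance, are hypothetical stand-ins chosen for brevity; they are not the feature definitions in anomalous. The principal component scores come from an SVD of the standardised feature matrix.

```python
import numpy as np


def acf1(x):
    """Lag-1 (first-order) autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))


def spikiness_proxy(x):
    """Hypothetical spikiness proxy: the largest absolute z-score in the series.
    (Not the spikiness index used by anomalous.)"""
    x = np.asarray(x, dtype=float)
    return float(np.max(np.abs(x - x.mean())) / x.std())


def feature_matrix(series_list):
    """One row per series; columns are lag-1 ACF, spikiness proxy, log variance."""
    return np.array(
        [[acf1(s), spikiness_proxy(s), np.log(np.var(s))] for s in series_list]
    )


def pc_scores(features, k=2):
    """Scores on the first k principal components of the standardised features."""
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]  # same as projecting z onto the top-k loadings


# Usage: 50 noisy sine waves, one of which gets an injected spike.
rng = np.random.default_rng(0)
t = np.arange(200)
series = [np.sin(t / 10) + rng.normal(0, 0.3, t.size) for _ in range(50)]
series[7][100] += 15
scores = pc_scores(feature_matrix(series))
print(scores.shape)  # (50, 2)
```

Standardising before the decomposition matters because the features live on very different scales; without it, whichever feature has the largest variance would dominate the first component.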
We use two methods to identify outliers.
A bivariate kernel density estimate of the first two PC scores is computed, and the points are ordered by the value of the density at each observation. This gives a ranking from most outlying (lowest density) to least outlying (highest density).
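The density ranking can be sketched in a few lines of Python. This assumes a Gaussian product kernel with Scott's-rule bandwidths, an illustrative choice that is not necessarily the bandwidth the package uses. The density is evaluated at every observation and sorted in increasing order, so the first index returned is the most outlying point.

```python
import numpy as np


def kde_at_points(scores):
    """Gaussian product-kernel density estimate evaluated at every observation.
    Bandwidths use Scott's rule for two dimensions, h_j = sd_j * n**(-1/6)
    (an assumed choice for this sketch)."""
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    h = scores.std(axis=0, ddof=1) * n ** (-1 / 6)
    # Pairwise differences scaled by the bandwidths, shape (n, n, 2).
    diff = (scores[:, None, :] - scores[None, :, :]) / h
    kernel = np.exp(-0.5 * np.sum(diff**2, axis=2)) / (2 * np.pi * h[0] * h[1])
    return kernel.mean(axis=1)


def density_ranking(scores):
    """Indices ordered from most outlying (lowest density) to least outlying."""
    dens = kde_at_points(scores)
    return np.argsort(dens, kind="stable"), dens


# Usage: a Gaussian cloud of 100 points plus one far-away point (index 100).
rng = np.random.default_rng(1)
scores = np.vstack([rng.normal(size=(100, 2)), [[10.0, 10.0]]])
order, dens = density_ranking(scores)
print(order[0])  # 100
```

Each point's own kernel is included in its density estimate. It adds the same constant to every point, so it does not change the ranking.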
A series of α-convex hulls are computed on the first two PC scores with decreasing α, and points are classified as outliers when they become singletons separated from the main hull. This gives us an alternative ranking, with the most outlying having separated at the highest value of α, and the remaining outliers at decreasing values of α.
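Computing α-convex hulls needs a dedicated package such as alphahull, so the Python sketch below substitutes a simpler criterion and should be read as an approximation, not as what anomalous computes. In the α-complex underlying α-shapes (with α as the ball radius), a point is an isolated vertex exactly when α is less than half the distance to its nearest neighbour, because the edge to the nearest neighbour is always a Delaunay edge with an empty diametral circle. Ranking points by that separation value mirrors the idea that the most outlying point separates at the highest α.

```python
import numpy as np


def separation_alpha(scores):
    """For each point, the alpha below which it is an isolated vertex of the
    alpha-complex (ball radius alpha): half its nearest-neighbour distance.
    A simplified stand-in for the alpha-convex hulls used by anomalous."""
    scores = np.asarray(scores, dtype=float)
    diff = scores[:, None, :] - scores[None, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=2))
    np.fill_diagonal(dist, np.inf)  # ignore each point's zero distance to itself
    return dist.min(axis=1) / 2


def alpha_ranking(scores):
    """Indices ordered from most outlying (separates at the highest alpha) to least."""
    alpha = separation_alpha(scores)
    return np.argsort(-alpha, kind="stable"), alpha


# Usage: a Gaussian cloud of 100 points plus one far-away point (index 100).
rng = np.random.default_rng(1)
scores = np.vstack([rng.normal(size=(100, 2)), [[10.0, 10.0]]])
order, alpha = alpha_ranking(scores)
print(order[0])  # 100
```

Under this simplification, two outliers lying close to each other only become singletons once α drops below half their mutual distance, which is consistent with the singleton rule described above.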

I explained the ideas last Tuesday in a talk at a joint meeting of the Statistical Society of Australia and the Melbourne Data Science Meetup Group. Slides are available here. A link to a video of the talk will also be added there when it is ready.
The density ranking of PC scores was also used in my work on detecting outliers in functional data. See my 2010 JCGS paper and the associated rainbow package for R.
There are two versions of the package: one under an ACM licence, and a limited version under a GPL licence. Eventually we hope to make the GPL version contain everything, but we currently depend on the alphahull package, which has an ACM licence.
