difference between 1 nearest neighbor and 1000? in use for anomaly detection #933

feigin · 2023-11-15T16:03:17Z

Hi, since the distance score is calculated only from the nearest neighbor, will the score be the same whether a pattern has just one nearest neighbor or 1,000?

Assuming I have two identical anomalies in a time series, is it reasonable to expect that their score will be similar to that of patterns that occur thousands of times? Thanks!
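For reference, here is the quantity the question is about, written in our own notation (not taken from the thread). Let $T_{i,m}$ be the length-$m$ subsequence starting at $i$, $d_{i,j}$ the z-normalized Euclidean distance between $T_{i,m}$ and $T_{j,m}$, and $e$ the exclusion-zone half-width that rules out trivial self-matches. The standard (top-1) matrix profile keeps only the smallest distance:

```latex
P_i = \min_{j \,:\, |i - j| > e} d_{i,j}
```

Because only the minimum survives, a subsequence with a single exact twin and a subsequence with 1,000 exact repeats both get $P_i = 0$; the number of matches is not reflected in the score.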

seanlaw · 2023-11-15T16:25:25Z · Maintainer

@feigin Welcome to the STUMPY community and thank you for your question. I'm not able to understand your question. Are you able to provide some code to better describe what you are trying to do?

feigin Nov 15, 2023
Author

@seanlaw Thank you for such a quick reply! I created a Colab notebook with the scenario I hope STUMPY can solve for me.

https://colab.research.google.com/drive/1sNZtAK5B3gowA5JbOHl91yWV8DQWAMYZ?usp=sharing

In this example there is mock data with a pattern recurring N times: 1,2,3,2,1,2,3,2,1,2,3,2,1,2,3,2,1, etc.
I insert an anomaly of 5,6,7,8, so the new time series is: 1,2,3,2,5,6,7,8,1,2,3,2,1,2,3,2,1, etc.

STUMPY shows this anomaly perfectly, since it has no matching nearest neighbor.
However, if I insert this anomaly twice, each copy becomes the other's nearest neighbor, so even though the pattern occupies only a small fraction of the time series, the algorithm does not detect it.

See the attached photos of both scenarios. I hope this makes it clear. Anyway, thank you for your time and for the great package you have made available!

This is the core of the code:
import stumpy
matrix_profile = stumpy.stump(df['pattern_one_anomaly'].astype(float), m)  # df and m are defined earlier in the Colab notebook
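The scenario described above can be reproduced without the Colab notebook (whose exact contents are not shown here) using a small brute-force sketch in plain Python. This is not STUMPY's implementation: it naively computes the same kind of z-normalized, 1-nearest-neighbor matrix profile, and it assumes a window length `m = 4`, an exclusion zone of `ceil(m / 4)` (intended to resemble STUMPY's default), and a hypothetical `mock_series` helper that overwrites one period of 1,2,3,2 with 5,6,7,8 at each given start index.

```python
import math


def znorm(x):
    """Z-normalize a subsequence; constant windows map to all zeros."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sd == 0:
        return [0.0] * len(x)
    return [(v - mu) / sd for v in x]


def naive_matrix_profile(ts, m):
    """Brute-force 1-nearest-neighbor matrix profile (z-normalized Euclidean)."""
    n = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    excl = math.ceil(m / 4)  # assumed exclusion zone that skips trivial self-matches
    mp = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if abs(i - j) > excl:
                best = min(best, math.dist(subs[i], subs[j]))
        mp.append(best)
    return mp


def mock_series(anomaly_starts, repeats=50):
    """Hypothetical helper: repeat 1,2,3,2 and overwrite one period with 5,6,7,8."""
    ts = [1.0, 2.0, 3.0, 2.0] * repeats
    for s in anomaly_starts:
        ts[s:s + 4] = [5.0, 6.0, 7.0, 8.0]
    return ts


m = 4  # hypothetical window length; the Colab's actual m is not shown
mp_one = naive_matrix_profile(mock_series([4]), m)       # anomaly inserted once
mp_two = naive_matrix_profile(mock_series([4, 100]), m)  # anomaly inserted twice
print(max(mp_one))  # clearly > 0: the single anomaly has no close match
print(max(mp_two))  # 0.0: each anomaly copy is the other's exact nearest neighbor
```

With one inserted anomaly, the largest profile values sit on the windows overlapping it. With two identical copies, every window (including the anomalous ones) has an exact match elsewhere, so the whole profile is zero.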

seanlaw Nov 15, 2023
Maintainer

@feigin Indeed, this is the expected behavior. Since each dataset/problem is different, it is up to you to decide how to interpret the results. In this case, you can leverage our recently added top-k feature and do:

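import stumpy

k = 3  # number of nearest neighbors to keep per subsequence
mp = stumpy.stump(df['pattern_one_anomaly'].astype(float), m, k=k)
avg_mp = mp[:, 0:k].mean(axis=1)  # the first k columns hold the top-k distances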

Here, setting k=3 will return the distance to the top-3 nearest neighbors for each subsequence. Then, we compute the average distance (or sum the distances). In this case, if the anomaly is only repeated once (i.e., there is a single pair), then the average distance will NOT be zero and will therefore stand out as an anomaly. However, it is up to you to decide what k should be using your expert judgement.
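The top-k averaging described in the answer can be illustrated the same way. The following is a plain-Python, brute-force sketch (again not STUMPY's code; `naive_topk_mean`, `m = 4`, and the `ceil(m / 4)` exclusion zone are assumptions for illustration) that averages the distances to the `k` nearest neighbors of each window in the series with the anomaly inserted twice. With `k=1` the two copies mask each other; with `k=3` only the windows overlapping an anomaly copy get a non-zero score, because each of them has just one exact match.

```python
import math


def znorm(x):
    """Z-normalize a subsequence; constant windows map to all zeros."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sd == 0:
        return [0.0] * len(x)
    return [(v - mu) / sd for v in x]


def naive_topk_mean(ts, m, k):
    """Hypothetical helper: mean distance to the k nearest neighbors of each window."""
    n = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    excl = math.ceil(m / 4)  # assumed exclusion zone that skips trivial self-matches
    out = []
    for i in range(n):
        dists = sorted(
            math.dist(subs[i], subs[j]) for j in range(n) if abs(i - j) > excl
        )
        out.append(sum(dists[:k]) / k)  # average of the k smallest distances
    return out


# Mock series: 1,2,3,2 repeated, with the 5,6,7,8 anomaly inserted twice
ts = [1.0, 2.0, 3.0, 2.0] * 50
for s in (4, 100):
    ts[s:s + 4] = [5.0, 6.0, 7.0, 8.0]

m = 4  # hypothetical window length
avg1 = naive_topk_mean(ts, m, k=1)  # k=1 is the ordinary matrix profile
avg3 = naive_topk_mean(ts, m, k=3)
print(max(avg1))  # 0.0: the two copies hide each other
print([i for i, v in enumerate(avg3) if v > 0])
# -> [1, 2, 3, 4, 5, 6, 7, 97, 98, 99, 100, 101, 102, 103]
```

In practice, `stumpy.stump(T, m, k=k)` as shown in the answer computes the top-k distances far more efficiently; the brute-force loop is only meant to make the averaging step concrete.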

Answer selected by feigin

feigin Nov 15, 2023
Author

Oh wow! Wonderful addition! Thank you for your help.

seanlaw Nov 15, 2023
Maintainer

All of the credit goes to @NimaSarajpoor as he was the one who worked tirelessly to add this useful feature!
