Question: understanding the algorithm #150
Replies: 9 comments
-
@hfwittmann Thank you for your question and your kind words. Unfortunately, the tutorial oversimplifies the problem, so I strongly recommend that you go over the original matrix profile paper to get a better sense of everything that needs to be accounted for. It is well worth reading. Having said that, a naive implementation of what is happening can be found in our
-
In short, the one piece that you are missing (and which is discussed in the paper) is the idea of an "exclusion zone". That is, not only is a subsequence, Thus, in your example In essence, you should replace the line I hope that helps!
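To make the exclusion-zone idea concrete, here is a minimal brute-force sketch. It is not STUMPY's implementation: the helper names `znorm` and `naive_matrix_profile` are made up for illustration, and the exclusion zone of `ceil(m / 4)` is an assumption about the default rather than something stated in this thread. For each subsequence `i`, every candidate `j` with `|i - j| <= excl_zone` is ignored, not just `j == i`.

```python
import numpy as np


def znorm(x):
    """Z-normalize a 1-D array (population standard deviation)."""
    return (x - x.mean()) / x.std()


def naive_matrix_profile(ts, m, excl_zone):
    """Brute-force self-join matrix profile (illustrative helper, not a STUMPY API).

    Returns the nearest-neighbor distance and index for every subsequence,
    ignoring all candidates within `excl_zone` positions of the query.
    """
    n = len(ts) - m + 1
    subs = np.array([znorm(ts[i:i + m]) for i in range(n)])
    # Pairwise Euclidean distances between z-normalized subsequences.
    d = np.sqrt(((subs[:, None, :] - subs[None, :, :]) ** 2).sum(axis=2))
    for i in range(n):
        lo = max(0, i - excl_zone)
        hi = min(n, i + excl_zone + 1)
        d[i, lo:hi] = np.inf  # mask trivial matches on both sides of i
    return d.min(axis=1), d.argmin(axis=1)


time_series = np.array([0, 1, 3, 2, 9, 1, 14, 15, 1, 2, 2, 10, 7], dtype=float)
m = 4
excl_zone = int(np.ceil(m / 4))  # assumed default; equals 1 for m = 4

profile, indices = naive_matrix_profile(time_series, m, excl_zone)
print(indices[7], profile[7])
```

With `excl_zone=1`, position 7 resolves to `j = 4` at a distance of about 2.839, which is result (ii) in the question. Passing `excl_zone=0` (only the subsequence itself is excluded) gives `j = 6` at about 2.058, which is result (i).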
-
@seanlaw Excellent, thank you for your explanation, really helpful! From what you are saying, it appears to me that the exclusion zone should be symmetric, however
So I am still confused. Should the exclusion zone not come into play for (i), too? I will definitely follow your suggestion and have a look at the paper!
-
@hfwittmann You are correct. The exclusion zone should apply everywhere for self-joins. The issue that you are seeing seems to be fixed in the development version of the code (if you clone the repo then you should see that the results will change) and should look like:
@mexxexx Do you have any ideas as to what we may have changed (in dev) since our last release that would have fixed this issue? The 1.3.0 release on PyPI seems to give the following (wrong) result:
Since our unit tests were passing previously, this also implies that our unit tests were changed as well.
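One quick way to spot an asymmetric exclusion zone is to check whether any reported nearest-neighbor index lies too close to its own position. The sketch below uses a hypothetical helper, `exclusion_zone_violations`, and assumes an exclusion zone of 1 (that is, `ceil(m / 4)` for `m = 4`). The index column is copied from the `stumpy.stump` output shown in the question.

```python
import numpy as np


def exclusion_zone_violations(indices, excl_zone):
    """Return positions whose reported neighbor falls inside the exclusion zone.

    Illustrative helper, not part of STUMPY. Negative indices (meaning
    "no neighbor") are ignored.
    """
    indices = np.asarray(indices)
    positions = np.arange(len(indices))
    too_close = np.abs(indices - positions) <= excl_zone
    return positions[too_close & (indices >= 0)]


# Matrix profile index column from the stumpy.stump output in the question.
reported = [9, 8, 9, 1, 9, 2, 7, 4, 1, 0]
violations = exclusion_zone_violations(reported, excl_zone=1)
print(violations)  # [6]
```

Position 6 points at its immediate neighbor 7, while position 7 does not point back at 6, which is the asymmetry discussed above.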
-
Hi @seanlaw, I remember that I also came across the issue of the asymmetric exclusion zone. It appears to me that it was issue #131.
-
Ahhh, yes! Thank you, @mexxexx. If it's okay with you (i.e., is there anything else major that still needs to be done?), I plan to push a minor release today.
-
Amazing! No, from my side there is nothing that has to be fixed beforehand.
-
@seanlaw @mexxexx Excellent stuff! I have pulled the update and it's working as described! Thank you!
-
@hfwittmann closing this for now but, fyi, v1.3.1 (with all of the new additions) is now live and can be @mexxexx Thanks again for all of your hard work and contributions!
-
Thanks for an excellent package!
I am not sure whether I fully understand the algorithm. I have tried to replicate the result for the series from the example
https://stumpy.readthedocs.io/en/latest/Tutorial_The_Matrix_Profile.html
with a basic implementation, but I get a deviation at one point. I am therefore unsure whether I made a mistake or whether this is to be expected.
So here are the results:
I get
```python
import pandas as pd
import stumpy
import numpy as np
# %%
time_series = np.array([0, 1, 3, 2, 9, 1, 14, 15, 1, 2, 2, 10, 7], dtype=float)
window = 4
S = stumpy.stump(time_series, m=window)
```
stumpy.stump(time_series, m=window)
array([[0.6424863376402249, 9, -1, 9],
[0.28570485146990177, 8, -1, 8],
[1.6401694431976326, 9, 0, 9],
[0.898130637894946, 1, 1, 8],
[1.2795471494078055, 9, 0, 9],
[1.781964662297751, 2, 2, 9],
[2.0583190140538696, 7, 3, 7],
[2.8394325732553067, 4, 4, 8],
[0.28570485146990177, 1, 1, 9],
[0.6424863376402249, 0, 0, -1]], dtype=object)
My own basic implementation:
yields:
argmins = np.nanargmin(d_plus, axis=1)
mins = np.nanmin(d_minus, axis=1)
argmins [9 8 9 1 9 2 7 6 1 0]
mins [0.642 0.286 1.64 0.898 1.28 1.782 2.058 2.058 0.286 0.642]
So the results match except at one position, namely position 7.
My result corresponds to
(i)
i = 7
j = 6
((zscore(time_series[i+0:i+4]) - zscore(time_series[j+0:j+4])) ** 2).sum() ** 0.5
2.058319014053869
Stumpy's result corresponds to
(ii)
i = 7
j = 4
((zscore(time_series[i+0:i+4]) - zscore(time_series[j+0:j+4])) ** 2).sum() ** 0.5
2.8394325732553067
It appears to me that (i) is correct.
Hence I am confused. Looking forward to your answer