-
@JaKasb Thank you so much for starting the discussion and for your wonderful feedback on the code and writing style. Your support means a lot.
I could be wrong but I believe that the time complexity might be closer to
This is a valid point. If you are only doing this MPdist sorting once or twice, then the sort should be relatively cheap, especially on a short array, since the overall computation should be dominated by MASS/STOMP, as you've pointed out. However, if you are running
So, this part shouldn't matter because Dask and Numba are not being used for the actual selection of the top-K P_ABBA values. I believe that all of the logic is accomplished in pure NumPy: Lines 204 to 214 in 10b2672
That's great to hear! I'm sure that you are doing it already, but please don't forget to cite the STUMPY paper where appropriate, as we love reading about how STUMPY is being leveraged.
-
Looks good to me. In the current version, the sorting occurs in place. Does the code depend on P_ABBA being sorted outside of _select_P_ABBA_value()?
Lines 281 to 285 in 10b2672
Because if you sort P_ABBA in place and write into P_ABBA afterwards, the output should be different once the sort() is removed. The reference implementation of MPdist_vect does not sort before the ingress and egress. I don't fully understand MPdist_vect yet. The paper mentions a moving_min() function. Maybe this is a bug?
https://sites.google.com/site/mpdistinfo/
percentage = min(percentage, 1.0)
percentage = max(percentage, 0.0)
Can be simplified into
-
In MPdist.py
https://github.com/TDAmeritrade/stumpy/blob/main/stumpy/mpdist.py
The output is the k-th smallest value of sorted(P_ABBA)
However, one can extract the k-th smallest value without sorting the full array.
Skipping the sort() is possible by using numpy.partition()
https://numpy.org/doc/stable/reference/generated/numpy.partition.html#numpy.partition
sort() is O(n log n) whereas partition() is O(n)
I don't know whether this speedup affects the overall runtime or whether MASS/STOMP is by far the dominant runtime consumer.
I also don't know whether Dask and Numba support numpy.partition()
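To make the idea concrete, here is a minimal sketch comparing the two approaches. It is not the STUMPY implementation: the helper names, the random stand-in for P_ABBA, and the 0-based k are all assumptions made for illustration.

```python
import numpy as np


def kth_smallest_sorted(values, k):
    """Hypothetical helper: k-th smallest value (0-based) via a full sort."""
    # np.sort returns a sorted copy, so the input is left untouched.
    return np.sort(values)[k]


def kth_smallest_partition(values, k):
    """Hypothetical helper: k-th smallest value (0-based) without a full sort."""
    # np.partition returns a copy in which index k holds the value it would
    # have in sorted order; smaller-or-equal values come before it and
    # greater-or-equal values come after it, in no particular order.
    return np.partition(values, k)[k]


# Stand-in data only; the real P_ABBA comes from the matrix profile computation.
rng = np.random.default_rng(0)
P_ABBA = rng.random(1_000)
k = 50

# Both approaches select the same element.
assert kth_smallest_sorted(P_ABBA, k) == kth_smallest_partition(P_ABBA, k)
```

Note that np.partition returns a new array, so the input is not modified. That differs from an in-place sort, which matters if any later code relies on P_ABBA being sorted.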
BTW I like your code and writing style.
I use stumpy in my research.