Replies: 13 comments
-
@rtmatx Thanks for the question. We haven't tried this on streaming data so it will depend on how quickly the data is coming in. Presumably, not at the rate of which CERN is collecting data. So, assuming that the velocity of the incoming data is "reasonable" (however you define it - maybe 1% of the length, n, of your starting time series), streaming is actually a far easier problem as it is no longer an O(n^2) calculation (in terms of computational complexity). If I recall the paper correctly, once we calculate the matrix profile once then additional streaming data can be added incrementally in an efficient manner that is essentially O(n). This is because we don't have to recalculate the entire matrix profile. Instead, we just need to calculate the distance profile for the new (sliding) window (i.e., with new data point) and then update the matrix profile to reflect any new minimum distances. While this library doesn't give you streaming out of the box, it does the heavy lifting for you and calculates the matrix profile first which, again, is O(n^2). It should be fairly straightforward for you to update the matrix profile thereafter (both extending it and updating the existing distances). This could be a great PR if you'd like to work on a Tutorial that demonstrates streaming (maybe with the streamz Python package). |
Beta Was this translation helpful? Give feedback.
-
For the record (and I haven't thought about this thoroughly), my first instinct is that streaming is out of scope for this package and should not be a core, built-in capability/feature. Not because I don't think it's important (it is definitely valuable) but because it's not really contributing to making the matrix profile calculation faster (which is the key contribution of this package). I could see a spin-off package called, say, "stumpy-streaming" that would install "stumpy" as a dependency. However, a tutorial of how to accomplish streaming (after an initial matrix profile is computed) would be great. |
Beta Was this translation helpful? Give feedback.
-
I failed to acknowledge your point above regarding online vs offline. Yes, your approach would be correct in both cases. One thing that is not explicitly obvious from the examples is that the
|
Beta Was this translation helpful? Give feedback.
-
@rtmatx Let me know if it makes sense to close this issue |
Beta Was this translation helpful? Give feedback.
-
Hi @seanlaw, Thanks a lot for the detailed info. It makes lot of sense. I have been reading about STAMPI and FLOSS algorithms that are discussed in the paper, which follows the implementation that you have highlighted. I am very keen on implementing the streaming/online version of the matrix profile algorithm as it can be helpful in lot of places. I have to go through the implementation of the underlying algorithms. I can make a PR about it, it if you feel it can be added to this library. It might take sometime though |
Beta Was this translation helpful? Give feedback.
-
I think it should be |
Beta Was this translation helpful? Give feedback.
-
You are right! I've edited my original post above to reflect this error. |
Beta Was this translation helpful? Give feedback.
-
If I understand correctly, STAMPI is the interactive version of STAMP. However, it is significantly slower than STOMP and SCRIMP++ (i.e., interactive STOMP). SCRIMP++ and FLOSS are both on our project roadmap and we definitely welcome a contribution.
I may be misunderstanding your meaning as I am not a computer/software engineer or I haven't read all of the papers yet so please correct me if I'm wrong or direct me to the right paper(s). I'm not aware of a "version of the matrix profile algorithm" that is specific to streaming/online. I've only read 1-2 sentence references to this but have yet to come across pseudocode in the papers. Can you please clarify what you are referring to?
@rtmatx Depending on what you plans are, I think a tutorial contribution would be nice. Is that what you were thinking as well? If so, then we can close this issue and create a new one that is focused on a streaming data tutorial and we can add a more detailed checklist that is similar to Tutorial 1 (or shorter). Once the components are all there then we can review the PR. If indeed users start asking for streaming/online to be a code feature then we can open a new PR after the tutorial has been created that is targeted for migrating it to a new feature. Let me know what you think |
Beta Was this translation helpful? Give feedback.
-
Sorry for the misunderstanding, I am referring to STAMPI algorithm when I mentioned about matrix profile for streaming data.
That sounds good. I will make a notebook example and let's see if it can help other users. Based on that we can refactor it, if needed. I will close this issue and will create a PR once I have working implementation of matrix profile for streaming data using STAMPI Thanks |
Beta Was this translation helpful? Give feedback.
-
You know, I always (incorrectly) thought that STAMPI was was "interactive STAMP" (which I am against implementing - instead, we should implement SCRIMP++) but I just realized, after scanning the first paper, that is referring to "STAMP Incremental". This changes my perspective. It looks like there is a "STOMPI" in Table 5 of this paper. At a glance, it doesn't look too complicated. We should go with "STUMPI" :) |
Beta Was this translation helpful? Give feedback.
-
@rtmatx Just following up on this to see if you've had any opportunity to take a look at this or have any example code? Absolutely no pressure though. |
Beta Was this translation helpful? Give feedback.
-
Any updates on this @seanlaw @rtmatx ? it would be very helpful. |
Beta Was this translation helpful? Give feedback.
-
@hmcoservit Thank you for checking. Can you please describe your use case in this STUMPI issue? |
Beta Was this translation helpful? Give feedback.
-
I have gone through the notebooks to gloss over the type of implementation possible with time series data and noticed that all the examples and possibly the API is designed for static time series data.
Can the team share thoughts on application of these type of matrix profile algorithms to streaming data. For example, motif/pattern discovery or discord/anomaly detection on streaming data. Thanks :)
One approach that I can think of is using an offline and online approach. The idea is to have MP on historical data and for any new streaming data, the query (of some length of streaming data) will be used to compute its motif/discord. This is the online part.
For offline part, after certain time length, we recompute the MP based on the historical + newly available data to check for the shift in motifs/discords.
Beta Was this translation helpful? Give feedback.
All reactions