Saving motifs for comparison to other files #936
Hi, |
Replies: 1 comment 8 replies
@datadeng46 Thank you for your question and welcome to the STUMPY community. I understand that each file has ~6000 datapoints but can you tell me how many files there are in total?
This may/may not be true depending on the length of the full/complete dataset so knowing the size will be helpful. Additionally, what hardware (CPUs, GPUs, RAM?) do you have access to for the matrix profile computation?
At the end of the day, you'll still need to read all of the files (either at once or in separate batches), so I/O will need to happen regardless. The advice would vary depending on the size of the data and the hardware.
This may be a crude approximation, but you run the risk of completely missing a pattern located at the very beginning of your time series that matches a subsequence at the end of the time series. Instead, you may be better off sub-sampling the full time series first (i.e., reading in only every 10th or 100th data point) and computing the matrix profile using the lower frequency/resolution data in order to get an idea of whether ANYTHING interesting exists. This approach can often help you discover potential motifs in a computationally efficient way. Additionally, you can combine this down-sampled approach with computing a (somewhat cheaper) approximate matrix profile (see the `scrump` function).
@datadeng46 If the pattern is THAT well conserved and you are mostly interested in anomalies that don't look like the pattern, then instead of computing the full matrix profile, you might simply consider taking the known pattern and computing the distance profile using the [mass function](https://stumpy.readthedocs.io/en/latest/api.html#stumpy.mass).