You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Running the Plutokiller dataset (1000+ files) takes 45 seconds:
25 seconds are spent tokenizing
20 seconds are spent calculating all pairs
During pair calculation, we create all $\Theta(n^2)$ pairs and for each pair we perform a longest-common-substring algorithm to search for the longest fragment, this is an $\Theta(n^2)$ implementation in itself.
Disabling the LCS calculation shaves down the time spent calculating the pairs to 50 ms. As we don't use the LCS for the initial visualizations, we could easily skip this step.
However, we will need the LCS if the user would go to the pairs (or similar) page, so there could be a worker that starts calculating all the LCS' once the user opens the browser.
Changes needed
Lib: Do not calculate the LCS when constructing a new pair. This could be easily implemented without API changes by making longest a getter that is lazily calculated.
CLI: Disable writing the longest fragment to the output CSV. As this is a change of the CLI API, we likely want this to be an option that can be enabled again.
Web: change the Pair model and store to reflect this change.
We still want to support CSV's with the longest fragment calculated. We already approach lazily calculating fragments in a similar way by only calculating them when they are required.
We cannot wait for calculating the LCS until it is needed, because we do need all of them at once and that can take 20 seconds. Once the data is loaded, we can start a worker in the background that calculates the LCS (using the methods already provided by dolos-core.
Implement loading states for components that do use the longest fragment, as the user can possibly still go to these pages.
Optionally: we could cache the LCS data using localstorage in order to prevent this data from being recalculated every time the page is loaded again.
The text was updated successfully, but these errors were encountered:
Running the
Plutokiller
dataset (1000+ files) takes 45 seconds:During pair calculation, we create all$\Theta(n^2)$ pairs and for each pair we perform a longest-common-substring algorithm to search for the longest fragment, this is an $\Theta(n^2)$ implementation in itself.
Disabling the LCS calculation shaves down the time spent calculating the pairs to 50 ms. As we don't use the LCS for the initial visualizations, we could easily skip this step.
However, we will need the LCS if the user would go to the pairs (or similar) page, so there could be a worker that starts calculating all the LCS' once the user opens the browser.
Changes needed
longest
a getter that is lazily calculated.Pair
model and store to reflect this change.fragment
s in a similar way by only calculating them when they are required.dolos-core
.The text was updated successfully, but these errors were encountered: