-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Austenem/CAT-983 MVP Bulk File Download #3604
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Some minor notes to begin; I will pull the branch down to test more thoroughly tomorrow.
context/app/static/js/components/bulkDownload/bulkDownloadFormFields.ts
Outdated
Show resolved
Hide resolved
const datasetQuery = { | ||
query: getIDsQuery([...uuids]), | ||
_source: ['hubmap_id', 'processing', 'uuid', 'files', 'processing_type'], | ||
size: 1000, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems potentially worthwhile to put this into a useMemo
to avoid recalculating it unless the uuids change.
This raised some other questions for me:
- Is there an upper bound to how many datasets can be selected for bulk download at once?
- If so, are we communicating this to users?
- If the limit is higher than 1,000, could we increase the size to
10_000
to handle selections of >1,000 datasets? We could also usesize: uuids.size()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point and I like the suggestion of using size: uuids.size()
- in regard to an upper bound, we haven't discussed this as far as I know. The load time for the dialog is definitely noticeable after a few hundred datasets, so it seems like some additional messaging should be provided to users who have selected a large number of datasets (maybe an alert similar to the one used in the workspaces dialog) letting them know about the wait time. @tsliaw what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@austenem I think adding an alert similar to what we have in workspaces will be a good idea. Do you have an approximate idea of when wait time gets impacted (number of datasets this occurs at + approximate wait time when this does happen)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cause of the loading time is client side, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@john-conroy It's server side - once the search hits are returned the dialog loads almost immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tsliaw @john-conroy I tried timing the wait time and it varies pretty substantially depending on what filters are applied - I'm guessing this is because of how the datasets are indexed?
For filters that include ~1000 datasets, the wait time is < 2 seconds for filters on dataset type, status, and analyte class, ~30 seconds for filters on sample category, and just over two minutes for filters on organs.
One minor issue I noticed while testing: the "download" button as soon as the page loads shows a "files are not available" view: Screen.Recording.2024-11-14.134655.mp4Perhaps we could disable the "download" button and show a loading message on hover if
|
Good idea! It's now disabled during that initial loading period. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! A couple of thoughts as to how we could avoid iterating over the datasets as many times which is likely contributing to the slowness.
context/app/static/js/components/bulkDownload/BulkDownloadDialog/BulkDownloadDialog.tsx
Show resolved
Hide resolved
context/app/static/js/components/bulkDownload/BulkDownloadDialog/BulkDownloadDialog.tsx
Show resolved
Hide resolved
context/app/static/js/components/bulkDownload/BulkDownloadDialog/BulkDownloadDialog.tsx
Show resolved
Hide resolved
} | ||
|
||
return ( | ||
<Box> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as above regarding the Box
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Box
here is needed to prevent the spacing
style from the parent Stack
from applying to the children of the Step
component.
.../app/static/js/components/bulkDownload/BulkDownloadOptionsField/BulkDownloadOptionsField.tsx
Show resolved
Hide resolved
// Which options to show in the dialog | ||
const downloadOptions = ALL_BULK_DOWNLOAD_OPTIONS.map((option) => ({ | ||
...option, | ||
count: datasets.filter((dataset) => option.isIncluded(dataset)).length, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could potentially push this logic into the search-api call by using a composite aggregations query with processing
and processing_type
. Or we could make separate requests to get the datasets for each option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NickAkhmetov thoughts?
const datasetsToDownload = datasets.filter((dataset) => | ||
bulkDownloadOptions.some((option) => | ||
ALL_BULK_DOWNLOAD_OPTIONS.find(({ key }) => key === option)?.isIncluded(dataset), | ||
), | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be avoided by separately requesting each option? Or maybe at the least we can iterate over the datasets once to separate them to avoid doing it here and counting them above?
Summary
First iteration of the Bulk Download dialog, which adds a dialog to the Search, Publication, and Collection pages that generates a download manifest for files from selected datasets. An option to download the corresponding dataset metadata is also included.
In this iteration, all files from selected datasets must be downloaded; in an upcoming iteration, the dialog will include an "Advanced Selection" accordion that will allow users to select specific file types from centrally processed datasets.
Design Documentation/Original Tickets
CAT-983 Jira ticket
Figma mockups
Testing
Tested the following cases:
General
Search page
Collection pages
Publication pages
Screenshots/Video
Search
Screen.Recording.2024-11-13.at.3.22.05.PM.mov
Collections
Publications
Error cases
No available files:
Protected datasets selected:
Download failure:
Checklist
CHANGELOG-your-feature-name-here.md
is present in the root directory, describing the change(s) in full sentences.