Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Austenem/CAT-983 MVP Bulk File Download #3604

Open
wants to merge 40 commits into
base: main
Choose a base branch
from

Conversation

austenem
Copy link
Collaborator

Summary

First iteration of the Bulk Download dialog, which adds a dialog to the Search, Publication, and Collection pages that generates a download manifest for files from selected datasets. An option to download the corresponding dataset metadata is also included.

In this iteration, all files from selected datasets must be downloaded; in an upcoming iteration, the dialog will include an "Advanced Selection" accordion that will allow users to select specific file types from centrally processed datasets.

Design Documentation/Original Tickets

CAT-983 Jira ticket

Figma mockups

Testing

Tested the following cases:

  • General

    • Dialog is available for authenticated and non-authenticated users
    • Download manifest is formatted appropriately for CLT tool and contains relevant datasets
    • Metadata download option works and resulting file contains relevant datasets
    • One or more download options must be selected before download is permitted
  • Search page

    • Select one or more dataset from each of the download options (centrally processed, externally processed, raw) - should show one download option, and no "select all" option
    • Select datasets from all three download options - should show all download options and a "select all" option
    • Select protected datasets - should show warning banner and not allow a download until the datasets are removed. If only protected datasets were selected and subsequently removed, a "no files to download" warning banner should appear.
    • Select no datasets - should include all datasets based on the current filters
  • Collection pages

    • Open dialog and download datasets from table
  • Publication pages

    • Open dialog and download datasets from table

Screenshots/Video

Search

Screenshot 2024-11-13 at 3 18 32 PM

Screen.Recording.2024-11-13.at.3.22.05.PM.mov
Collections

Screenshot 2024-11-13 at 3 16 46 PM

Publications

Screenshot 2024-11-13 at 3 14 18 PM

Error cases

No available files:

Screenshot 2024-11-13 at 3 35 10 PM

Protected datasets selected:

Screenshot 2024-11-13 at 3 48 59 PM

Download failure:

Screenshot 2024-11-13 at 3 39 16 PM

Checklist

  • Code follows the project's coding standards
    • Lint checks pass locally
    • New CHANGELOG-your-feature-name-here.md is present in the root directory, describing the change(s) in full sentences.
  • Unit tests covering the new feature have been added
  • All existing tests pass
  • Any relevant documentation in JIRA/Confluence has been updated to reflect the new feature
  • Any new functionalities have appropriate analytics functionalities added

Copy link
Collaborator

@NickAkhmetov NickAkhmetov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Some minor notes to begin; I will pull the branch down to test more thoroughly tomorrow.

context/app/static/js/components/workspaces/formHooks.ts Outdated Show resolved Hide resolved
context/app/static/js/components/workspaces/formHooks.ts Outdated Show resolved Hide resolved
context/app/static/js/components/bulkDownload/hooks.ts Outdated Show resolved Hide resolved
Comment on lines +66 to +70
const datasetQuery = {
query: getIDsQuery([...uuids]),
_source: ['hubmap_id', 'processing', 'uuid', 'files', 'processing_type'],
size: 1000,
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems potentially worthwhile to put this into a useMemo to avoid recalculating it unless the uuids change.

This raised some other questions for me:

  • Is there an upper bound to how many datasets can be selected for bulk download at once?
    • If so, are we communicating this to users?
  • If the limit is higher than 1,000, could we increase the size to 10_000 to handle selections of >1,000 datasets? We could also use size: uuids.size().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point and I like the suggestion of using size: uuids.size() - in regard to an upper bound, we haven't discussed this as far as I know. The load time for the dialog is definitely noticeable after a few hundred datasets, so it seems like some additional messaging should be provided to users who have selected a large number of datasets (maybe an alert similar to the one used in the workspaces dialog) letting them know about the wait time. @tsliaw what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@austenem I think adding an alert similar to what we have in workspaces will be a good idea. Do you have an approximate idea of when wait time gets impacted (number of datasets this occurs at + approximate wait time when this does happen)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cause of the loading time is client side, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-conroy It's server side - once the search hits are returned the dialog loads almost immediately.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsliaw @john-conroy I tried timing the wait time and it varies pretty substantially depending on what filters are applied - I'm guessing this is because of how the datasets are indexed?

For filters that include ~1000 datasets, the wait time is < 2 seconds for filters on dataset type, status, and analyte class, ~30 seconds for filters on sample category, and just over two minutes for filters on organs.

context/app/static/js/components/bulkDownload/hooks.ts Outdated Show resolved Hide resolved
context/app/static/js/components/bulkDownload/hooks.ts Outdated Show resolved Hide resolved
@NickAkhmetov
Copy link
Collaborator

One minor issue I noticed while testing: the "download" button as soon as the page loads shows a "files are not available" view:

Screen.Recording.2024-11-14.134655.mp4

Perhaps we could disable the "download" button and show a loading message on hover if

  • All UUIDS have not yet loaded, and
  • There is no selection

@austenem
Copy link
Collaborator Author

One minor issue I noticed while testing: the "download" button as soon as the page loads shows a "files are not available" view:

Screen.Recording.2024-11-14.134655.mp4

Perhaps we could disable the "download" button and show a loading message on hover if

  • All UUIDS have not yet loaded, and
  • There is no selection

Good idea! It's now disabled during that initial loading period.

Copy link
Collaborator

@john-conroy john-conroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! A couple of thoughts as to how we could avoid iterating over the datasets as many times which is likely contributing to the slowness.

}

return (
<Box>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above regarding the Box.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Box here is needed to prevent the spacing style from the parent Stack from applying to the children of the Step component.

// Which options to show in the dialog
const downloadOptions = ALL_BULK_DOWNLOAD_OPTIONS.map((option) => ({
...option,
count: datasets.filter((dataset) => option.isIncluded(dataset)).length,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could potentially push this logic into the search-api call by using a composite aggregations query with processing and processing_type. Or we could make separate requests to get the datasets for each option.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NickAkhmetov thoughts?

Comment on lines +146 to +150
const datasetsToDownload = datasets.filter((dataset) =>
bulkDownloadOptions.some((option) =>
ALL_BULK_DOWNLOAD_OPTIONS.find(({ key }) => key === option)?.isIncluded(dataset),
),
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be avoided by separately requesting each option? Or maybe at the least we can iterate over the datasets once to separate them to avoid doing it here and counting them above?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants