Austenem/CAT-983 MVP Bulk File Download #3604

austenem · 2024-11-13T20:55:18Z

Summary

First iteration of the Bulk Download feature. It adds a dialog to the Search, Publication, and Collection pages that generates a download manifest for the files of the selected datasets. The dialog also includes an option to download the corresponding dataset metadata.

In this iteration, all files from selected datasets must be downloaded; in an upcoming iteration, the dialog will include an "Advanced Selection" accordion that will allow users to select specific file types from centrally processed datasets.

Design Documentation/Original Tickets

CAT-983 Jira ticket

Figma mockups

Testing

Tested the following cases:

General
- Dialog is available for authenticated and non-authenticated users
- Download manifest is formatted appropriately for CLT tool and contains relevant datasets
- Metadata download option works and resulting file contains relevant datasets
- One or more download options must be selected before download is permitted
Search page
- Select one or more datasets from only one of the download options (centrally processed, externally processed, raw) - should show one download option and no "select all" option
- Select datasets from all three download options - should show all download options and a "select all" option
- Select protected datasets - should show warning banner and not allow a download until the datasets are removed. If only protected datasets were selected and subsequently removed, a "no files to download" warning banner should appear.
- Select no datasets - should include all datasets based on the current filters
Collection pages
- Open dialog and download datasets from table
Publication pages
- Open dialog and download datasets from table

Screenshots/Video

Search

Screen.Recording.2024-11-13.at.3.22.05.PM.mov

Collections

Publications

Error cases

No available files:

Protected datasets selected:

Download failure:

Checklist

Code follows the project's coding standards
- Lint checks pass locally
- New CHANGELOG-your-feature-name-here.md is present in the root directory, describing the change(s) in full sentences.
Unit tests covering the new feature have been added
All existing tests pass
Any relevant documentation in JIRA/Confluence has been updated to reflect the new feature
Any new functionalities have appropriate analytics functionalities added

NickAkhmetov

Great work! Some minor notes to begin; I will pull the branch down to test more thoroughly tomorrow.

context/app/static/js/components/bulkDownload/bulkDownloadFormFields.ts

context/app/static/js/components/workspaces/formHooks.ts

NickAkhmetov · 2024-11-13T21:34:26Z

context/app/static/js/components/bulkDownload/hooks.ts

+  const datasetQuery = {
+    query: getIDsQuery([...uuids]),
+    _source: ['hubmap_id', 'processing', 'uuid', 'files', 'processing_type'],
+    size: 1000,
+  };


It seems potentially worthwhile to put this into a useMemo to avoid recalculating it unless the uuids change.

This raised some other questions for me:

Is there an upper bound to how many datasets can be selected for bulk download at once?

If so, are we communicating this to users?

If the limit is higher than 1,000, could we increase the size to 10_000 to handle selections of >1,000 datasets? We could also use size: uuids.size (a Set exposes size as a property, not a method).
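A minimal sketch of the two suggestions above, assuming uuids is a Set<string> (the snippet spreads it into an array). getIDsQuery here is a hypothetical stand-in for the project's helper, buildDatasetQuery is an illustrative name, and the 10,000 cap assumes the index keeps Elasticsearch's default index.max_result_window.

```typescript
// Hypothetical stand-in for the project's getIDsQuery helper.
function getIDsQuery(ids: string[]) {
  return { ids: { values: ids } };
}

// Elasticsearch's default index.max_result_window; assumed unchanged for this index.
const MAX_RESULT_WINDOW = 10_000;

// Pure builder, so it is cheap to memoize on the uuids reference.
function buildDatasetQuery(uuids: Set<string>) {
  return {
    query: getIDsQuery(Array.from(uuids)),
    _source: ['hubmap_id', 'processing', 'uuid', 'files', 'processing_type'],
    // Request exactly as many hits as there are selected datasets, up to the cap.
    size: Math.min(uuids.size, MAX_RESULT_WINDOW),
  };
}

// Inside the hook, the query could then be memoized (assumes React's useMemo):
//   const datasetQuery = useMemo(() => buildDatasetQuery(uuids), [uuids]);
```

Memoizing only helps if the uuids Set keeps the same reference between renders; a Set recreated on every render would still trigger a rebuild.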

This is a good point, and I like the suggestion of using size: uuids.size. As for an upper bound, we haven't discussed one as far as I know. The load time for the dialog is definitely noticeable after a few hundred datasets, so we should probably show additional messaging to users who have selected a large number of datasets (maybe an alert similar to the one used in the workspaces dialog) to let them know about the wait time. @tsliaw what do you think?

@austenem I think adding an alert similar to what we have in workspaces will be a good idea. Do you have an approximate idea of when wait time gets impacted (number of datasets this occurs at + approximate wait time when this does happen)?

The cause of the loading time is client side, right?

@john-conroy It's server side - once the search hits are returned the dialog loads almost immediately.

@tsliaw @john-conroy I tried timing the wait time and it varies pretty substantially depending on what filters are applied - I'm guessing this is because of how the datasets are indexed?

For filters that include ~1000 datasets, the wait time is < 2 seconds for filters on dataset type, status, and analyte class, ~30 seconds for filters on sample category, and just over two minutes for filters on organs.

context/app/static/js/components/bulkDownload/hooks.ts

NickAkhmetov · 2024-11-14T18:50:02Z

One minor issue I noticed while testing: clicking the "download" button as soon as the page loads shows a "files are not available" view:

Screen.Recording.2024-11-14.134655.mp4

Perhaps we could disable the "download" button and show a loading message on hover if:

- Not all UUIDs have loaded yet, and
- There is no selection
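The condition suggested above could be sketched roughly as follows. The function name, the parameter names, and the tooltip text are all hypothetical and not taken from the PR.

```typescript
interface DownloadButtonState {
  disabled: boolean;
  tooltip?: string;
}

// Disable the button only while the full UUID list is still loading
// AND the user has not selected anything themselves.
function getDownloadButtonState(allUuidsLoaded: boolean, selectedCount: number): DownloadButtonState {
  if (!allUuidsLoaded && selectedCount === 0) {
    return { disabled: true, tooltip: 'Loading datasets...' };
  }
  return { disabled: false };
}
```

With an explicit selection the button stays enabled even during loading, since the dialog then does not depend on the full UUID list.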

austenem · 2024-11-15T18:39:15Z

Good idea! It's now disabled during that initial loading period.

john-conroy

Looks great! A couple of thoughts on how we could avoid iterating over the datasets so many times, which is likely contributing to the slowness.

john-conroy · 2024-11-15T19:41:55Z

context/app/static/js/components/bulkDownload/BulkDownloadDialog/BulkDownloadDialog.tsx

+  }
+
+  return (
+    <Box>


Same question as above regarding the Box.

The Box here is needed to prevent the spacing style from the parent Stack from applying to the children of the Step component.

.../app/static/js/components/bulkDownload/BulkDownloadOptionsField/BulkDownloadOptionsField.tsx

john-conroy · 2024-11-15T19:48:55Z

context/app/static/js/components/bulkDownload/hooks.ts

+  // Which options to show in the dialog
+  const downloadOptions = ALL_BULK_DOWNLOAD_OPTIONS.map((option) => ({
+    ...option,
+    count: datasets.filter((dataset) => option.isIncluded(dataset)).length,


We could potentially push this logic into the search-api call by using a composite aggregation query on processing and processing_type. Alternatively, we could make a separate request to get the datasets for each option.

@NickAkhmetov thoughts?
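A rough sketch of what the composite aggregation approach could look like, for discussion only. The `.keyword` field suffixes, the aggregation name, and both function names are assumptions; the mapping from each (processing, processing_type) combination to a download option would still need to mirror each option's isIncluded logic, which is not shown here.

```typescript
// Builds a request that returns only per-combination counts, not the hits themselves.
// The `.keyword` sub-fields are an assumption about the index mapping.
function buildOptionCountsQuery(uuids: string[]) {
  return {
    size: 0,
    query: { ids: { values: uuids } },
    aggs: {
      by_processing: {
        composite: {
          size: 100,
          sources: [
            { processing: { terms: { field: 'processing.keyword' } } },
            // missing_bucket keeps datasets that have no processing_type value
            { processing_type: { terms: { field: 'processing_type.keyword', missing_bucket: true } } },
          ],
        },
      },
    },
  };
}

interface CompositeBucket {
  key: { processing: string; processing_type: string | null };
  doc_count: number;
}

// Flattens the composite buckets into counts keyed by "processing/processing_type".
function countsByCombination(buckets: CompositeBucket[]): Record<string, number> {
  const counts: Record<string, number> = {};
  buckets.forEach(({ key, doc_count }) => {
    counts[`${key.processing}/${key.processing_type ?? 'none'}`] = doc_count;
  });
  return counts;
}
```

This only replaces the client-side counting; the manifest itself would still need the dataset hits for whichever options the user selects.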

john-conroy · 2024-11-15T19:52:48Z

context/app/static/js/components/bulkDownload/hooks.ts

+      const datasetsToDownload = datasets.filter((dataset) =>
+        bulkDownloadOptions.some((option) =>
+          ALL_BULK_DOWNLOAD_OPTIONS.find(({ key }) => key === option)?.isIncluded(dataset),
+        ),
+      );


This could also be avoided by requesting each option separately. Or, at the very least, we could iterate over the datasets once to group them by option, rather than filtering them here and again when counting them above.
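The single-pass idea could be sketched as below. The Dataset and BulkDownloadOption shapes and both function names are illustrative, not the PR's actual types, and the sketch assumes the options are mutually exclusive so each dataset lands in at most one group.

```typescript
interface Dataset {
  uuid: string;
  processing: string;
  processing_type?: string;
}

interface BulkDownloadOption {
  key: string;
  isIncluded: (dataset: Dataset) => boolean;
}

// One pass over the datasets: each dataset goes into the first option that includes it.
function groupDatasetsByOption(datasets: Dataset[], options: BulkDownloadOption[]): Map<string, Dataset[]> {
  const groups = new Map<string, Dataset[]>();
  options.forEach(({ key }) => groups.set(key, []));
  datasets.forEach((dataset) => {
    const match = options.find((option) => option.isIncluded(dataset));
    if (match) groups.get(match.key)!.push(dataset);
  });
  return groups;
}

// Collects the datasets for the selected option keys without re-filtering.
function selectDatasets(groups: Map<string, Dataset[]>, selectedKeys: string[]): Dataset[] {
  return selectedKeys.reduce<Dataset[]>((acc, key) => acc.concat(groups.get(key) ?? []), []);
}
```

The per-option counts then come from each group's length, and the download step reads from the same groups, so the dataset list is only walked once.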

austenem added 30 commits October 30, 2024 11:26

add modal entry points

99c1736

add modal infrastructure

2d11c5d

add form options

2bd6009

continue form logic

f50fb2e

fix loading issue

a368f43

pass in datasets from publication page

5ccddb5

implement manifest functionality

d616b20

add filtering by dataset type

c139813

add metadata download functionality

c51dc1e

integrate with search page

cdd6d69

switch to uuid query

47099a3

add info to dialog

f4e7f15

add advanced selections accordion

7eb079c

use download button component

e6ec0ba

implement checkboxes and more form functionality

eb6c5dd

fix select all option

34a14aa

add alert

73b15c6

add default errors

fce7a12

fix query response time

8fc7d14

add download selection section

6ff2a12

fix error issue

3b60355

add toggle component

26b0356

add protected datasets logic

fb38840

separate protected dataset section

37f1776

add dataset counts

5a00860

update language and styles

38591b4

update download functions

c3cbcc9

adjust switch

0cfbedb

select all datasets functionality

b3f362c

update error toasts

a7f4703

austenem added 8 commits November 12, 2024 14:14

add success toasts and update tsv download

3fa2492

update file downloads

c9264fc

consolidate files and toasts

1b044af

fix protected datasets bug

4fb7c15

add retry function option

bb0e905

clean up hooks and dialog

61559df

last round of cleanup

c0c156e

add changelog

2a1538a

austenem requested review from tsliaw, NickAkhmetov and john-conroy November 13, 2024 20:55

NickAkhmetov reviewed Nov 13, 2024

first round of review changes

8b30019

disable button during load

7d057d6

john-conroy reviewed Nov 15, 2024

austenem commented Nov 13, 2024

NickAkhmetov left a comment

NickAkhmetov Nov 13, 2024

austenem Nov 13, 2024

tsliaw Nov 15, 2024

john-conroy Nov 15, 2024

austenem Nov 15, 2024

austenem Nov 15, 2024

NickAkhmetov commented Nov 14, 2024

austenem commented Nov 15, 2024

john-conroy left a comment

john-conroy Nov 15, 2024

austenem Nov 15, 2024

john-conroy Nov 15, 2024

john-conroy Nov 15, 2024

john-conroy Nov 15, 2024
