Keeping track of disk usage #27

Open
manics opened this issue Nov 5, 2019 · 4 comments

manics commented Nov 5, 2019

omero fs usage fails due to the size of this submission.
du -h on the filesystem takes ages.
To save time in future, here's the usage (in KB) for each subdirectory on prod71, followed by the file count per subdirectory:

$ du --max-depth=1 /uod/idr/filesets/idr0043-uhlen-humanproteinatlas/
40258824        /uod/idr/filesets/idr0043-uhlen-humanproteinatlas/20180320
26706844        /uod/idr/filesets/idr0043-uhlen-humanproteinatlas/20180518-ftp
6661577884      /uod/idr/filesets/idr0043-uhlen-humanproteinatlas/20180825-ftp
6540063740      /uod/idr/filesets/idr0043-uhlen-humanproteinatlas/20180831-ftp
12909481104     /uod/idr/filesets/idr0043-uhlen-humanproteinatlas/20181112-ftp
15154773288     /uod/idr/filesets/idr0043-uhlen-humanproteinatlas/20180624-ftp
13030501160     /uod/idr/filesets/idr0043-uhlen-humanproteinatlas/20181129-ftp
12909757672     /uod/idr/filesets/idr0043-uhlen-humanproteinatlas/20190109-ftp
12855757676     /uod/idr/filesets/idr0043-uhlen-humanproteinatlas/20190121-ftp
12897329832     /uod/idr/filesets/idr0043-uhlen-humanproteinatlas/20190610-ftp
93026208028     /uod/idr/filesets/idr0043-uhlen-humanproteinatlas/
$ for f in idr0043-uhlen-humanproteinatlas/*; do echo $f; find $f -type f | wc; done
idr0043-uhlen-humanproteinatlas/20180320
   1926    1926  134820
idr0043-uhlen-humanproteinatlas/20180518-ftp
   1009    1009   67538
idr0043-uhlen-humanproteinatlas/20180624-ftp
 504302  504302 32630524
idr0043-uhlen-humanproteinatlas/20180825-ftp
 251142  251142 17077656
idr0043-uhlen-humanproteinatlas/20180831-ftp
 246781  246781 16781099
idr0043-uhlen-humanproteinatlas/20181112-ftp
 487535  487535 33152359
idr0043-uhlen-humanproteinatlas/20181129-ftp
 492290  492290 33475699
idr0043-uhlen-humanproteinatlas/20190109-ftp
 488433  488433 33213435
idr0043-uhlen-humanproteinatlas/20190121-ftp
 486637  486637 33091307
idr0043-uhlen-humanproteinatlas/20190610-ftp
 488187  488187 33073921
idr0043-uhlen-humanproteinatlas/test.txt
      1       1      41

total: 3448243 files
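
For anyone re-checking these figures, here is a minimal sketch for totalling the size column of du output and converting it to something readable. It assumes the sizes are in KB (1024-byte blocks, the GNU du default); the function names sum_du_kb and kb_to_tib are made up for illustration.

```shell
# Sum the first column (sizes in KB) of `du` output read from stdin.
# Feed it only the per-subdirectory lines, not du's own grand-total line,
# otherwise everything is counted twice.
sum_du_kb() {
    awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Convert a size in KB to TiB, rounded to two decimal places.
kb_to_tib() {
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / (1024 * 1024 * 1024) }'
}
```

For example, `kb_to_tib 93026208028` gives 86.64, so the whole submission is roughly 86.6 TiB if the KB assumption holds.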

Member

sbesson commented Nov 5, 2019

👍 for tracking this as a GitHub issue given the current fs usage limitation. Following the comments in IDR/idr.openmicroscopy.org#73 (review), is it worth transforming the issue into a table that we keep up to date? We might also need a column with the folder/run mapping, as this includes a combination of preliminary/published/unpublished data.

Author

manics commented Nov 6, 2019

I think the ideal option is a table in this repo and a script that updates the table, ignoring existing rows (so it only looks at the new data). For now can we use this as a holding issue?
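
A rough sketch of what such an update script could look like, so that directories already in the table are never re-scanned. Everything here is an assumption rather than an agreed design: the function name update_usage_table, and the table layout (a TSV with size in KB, file count, and path) are both hypothetical.

```shell
# Hypothetical helper: append "size_kb<TAB>file_count<TAB>path" rows to a TSV
# table, skipping any directory that already has a row.
# Usage: update_usage_table TABLE DIR...
update_usage_table() {
    table=$1
    shift
    touch "$table"
    for dir in "$@"; do
        # Existing rows are ignored: exact, fixed-string match on the path column.
        if cut -f3 "$table" | grep -Fxq -- "$dir"; then
            continue
        fi
        kb=$(du -sk "$dir" | cut -f1)                     # size in KB
        files=$(find "$dir" -type f | wc -l | tr -d ' ')  # number of regular files
        printf '%s\t%s\t%s\n' "$kb" "$files" "$dir" >> "$table"
    done
}
```

It could then be called as `update_usage_table usage.tsv /uod/idr/filesets/idr0043-uhlen-humanproteinatlas/*-ftp` (usage.tsv being a placeholder name); only newly added directories would incur the slow du and find. A folder/run mapping column would still have to be filled in by hand or from another source.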

Author

manics commented Nov 6, 2019

$ gzcat hpa_run_01/idr0043-experimentA-filePaths.tsv.gz | cut -f2 | cut -d/ -f 1-7 | sort -u > hpadirs.txt

$ cat hpa_run_0?/idr0043-experimentA-filePaths.tsv | cut -f2 >> hpadirs.txt

$ wc hpadirs.txt
    6000    6000  414000 hpadirs.txt

$ du -hsc $(cat hpadirs.txt) | tee hpadirs-duhsc.txt

hpadirs.txt

hpadirs-duhsc.txt

Note: if rerunning this, use du -sc instead of du -hsc to get everything in KB.

manics added a commit to manics/idr.openmicroscopy.org that referenced this issue Nov 6, 2019
Author

manics commented Nov 6, 2019

Re-run with kilobytes:
du -sc $(cat hpadirs.txt) | tee hpadirs-dusc.txt

hpadirs-dusc.txt
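
If the directory list grows, or paths ever contain spaces, `$(cat hpadirs.txt)` may hit argument-length limits or split paths apart. A possible alternative, sketched here with a hypothetical function name du_from_list, reads the list line by line and adds the total with awk:

```shell
# Print du's "size_kb<TAB>path" line for each directory listed (one per line)
# in the file given as $1, followed by a "<sum><TAB>total" line.
# Paths are read line by line, so embedded spaces are preserved.
du_from_list() {
    while IFS= read -r dir; do
        [ -n "$dir" ] || continue   # skip blank lines
        du -sk "$dir"
    done < "$1" | awk -F'\t' '{ sum += $1; print } END { printf "%.0f\ttotal\n", sum }'
}
```

Usage would be `du_from_list hpadirs.txt | tee hpadirs-dusc.txt`. One caveat: because du runs once per directory, files hard-linked across directories are counted each time, whereas a single du -sc call counts them once, so the totals may differ slightly.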

manics added a commit to manics/idr.openmicroscopy.org that referenced this issue Nov 6, 2019