Fix chunk extraction and subsequent GBWT generation #4436

Merged Nov 7, 2024 (9 commits)

adamnovak
Member

@adamnovak adamnovak commented Nov 5, 2024

Changelog Entry

To be copied to the draft changelog by merger:

  • vg chunk and vg find now generate subpaths with subrange metadata when cutting up paths.
  • vg gbwt will accept subranges on fragment 0 and discard the fragment number.

Description

This should let you do:

wget https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-chm13/hprc-v1.1-mc-chm13.chroms/chr20.d9.vg
vg snarls -t8 -a chr20.d9.vg > chr20.d9.snarls # get snarls
vg chunk -x chr20.d9.vg -p CHM13#chr20:5000-15000 --snarls chr20.d9.snarls > chr20_10k.vg # get the subgraph
vg gbwt -E -x chr20_10k.vg -o chr20_10k.gbwt

However, it changes much of how chunk extraction names the pieces it cuts from reference paths, so it might break a number of CLI tests. It also still needs unit tests to verify that subpath start offsets are computed accurately.
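To make the offset question concrete, here is a minimal Python sketch of the kind of computation those unit tests would need to check. It is not vg's implementation: the `cut_path` function, the `(node_id, length)` step representation, and the `name[start-end]` naming with 0-based, end-exclusive offsets are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation of a path step: (node_id, node_length_in_bases).
Step = Tuple[int, int]


def cut_path(path_name: str, steps: Iterable[Step],
             kept_nodes: Set[int]) -> List[Tuple[str, List[int]]]:
    """Split a path into maximal runs of kept nodes.

    Each run is returned with a name carrying its subrange on the
    original path, written here as name[start-end] (assumed format,
    0-based start, end exclusive).
    """
    subpaths: List[Tuple[str, List[int]]] = []
    offset = 0            # offset of the current step along the full path
    run_start = None      # offset at which the current run began, if any
    run_nodes: List[int] = []

    for node_id, length in steps:
        if node_id in kept_nodes:
            if run_start is None:
                run_start = offset
            run_nodes.append(node_id)
        elif run_start is not None:
            # The run ended just before this dropped node.
            subpaths.append((f"{path_name}[{run_start}-{offset}]", run_nodes))
            run_start, run_nodes = None, []
        offset += length

    if run_start is not None:
        # Close a run that reaches the end of the path.
        subpaths.append((f"{path_name}[{run_start}-{offset}]", run_nodes))
    return subpaths


# Made-up path: nodes 1..5 with lengths 4, 3, 5, 2, 6; the chunk keeps 2, 3, 5.
example = cut_path("CHM13#chr20",
                   [(1, 4), (2, 3), (3, 5), (4, 2), (5, 6)],
                   {2, 3, 5})
# example == [("CHM13#chr20[4-12]", [2, 3]), ("CHM13#chr20[14-20]", [5])]
```

A unit test along these lines would compare the start offset recorded on each subpath with the summed lengths of all steps before it on the original path, including steps that were dropped from the chunk.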

@adamnovak adamnovak marked this pull request as ready for review November 6, 2024 17:54
@adamnovak adamnovak merged commit 62ccb55 into master Nov 7, 2024
2 checks passed