This library implements a novel method for mapping MaveDB scoreset data to GA4GH Variation Representation Specification (VRS) objects, enhancing interoperability for genomic medicine applications. See Arbesfeld et al. (2023) for a preprint edition of the mapping manuscript, or download the resulting mappings directly.
- Universal Transcript Archive (UTA): see README for setup instructions. Users with access to Docker on their local devices can use the available Docker image; otherwise, start a relatively recent (version 14+) PostgreSQL instance and add data from the available database dump.
- SeqRepo: see README for setup instructions. The SeqRepo data directory must be writeable; see specific instructions here for more.
- Gene Normalizer: see documentation for data setup instructions.
- blat: Must be available on the local PATH and executable by the user. Otherwise, its location can be set manually with the BLAT_BIN_PATH environment variable. See the UCSC Genome Browser FAQ for download instructions.
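Before a first run, it can help to confirm that blat is discoverable. The following is a minimal sketch of such a check, assuming that an explicit BLAT_BIN_PATH takes precedence over a PATH lookup; the helper name find_blat is hypothetical and is not part of dcd-mapping, and the library's own lookup logic may differ.

```python
import os
import shutil


def find_blat(env=None):
    """Return a path to a blat executable, or None if none is found.

    Hypothetical helper for illustration; not part of dcd-mapping.
    """
    env = os.environ if env is None else env
    override = env.get("BLAT_BIN_PATH")
    if override:
        # Assumption: an explicit override wins, and must point to an
        # existing file that the current user may execute.
        if os.path.isfile(override) and os.access(override, os.X_OK):
            return override
        return None
    # Otherwise search the directories listed in PATH.
    return shutil.which("blat", path=env.get("PATH"))


if __name__ == "__main__":
    location = find_blat()
    if location is None:
        print("blat not found: add it to PATH or set BLAT_BIN_PATH")
    else:
        print(f"blat found at {location}")
```

A None result means neither the override nor PATH yielded a usable executable, which is the situation the blat prerequisite is meant to rule out.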
Install from PyPI:
python3 -m pip install dcd-mapping
Use the dcd-map command with a scoreset URN, e.g.:
$ dcd-map urn:mavedb:00000083-c-1
Output is saved to a file named <URN>_mapping_results_<ISO datetime>.json in the directory specified by the environment variable MAVEDB_STORAGE_DIR, or in ~/.local/share/dcd-mapping by default.
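To pick up results programmatically, the output location described above can be resolved along these lines. This is a sketch only: the helper names storage_dir and latest_mapping_result are hypothetical, and it assumes that the URN appears verbatim in the file name and that the most recently modified file is the one of interest.

```python
import os
from pathlib import Path


def storage_dir(env=None):
    """Resolve the output directory: MAVEDB_STORAGE_DIR if set, else the default."""
    env = os.environ if env is None else env
    configured = env.get("MAVEDB_STORAGE_DIR")
    if configured:
        return Path(configured)
    return Path.home() / ".local" / "share" / "dcd-mapping"


def latest_mapping_result(urn, env=None):
    """Return the most recently modified result file for a URN, or None.

    Assumption: file names follow <URN>_mapping_results_<ISO datetime>.json
    with the URN written exactly as passed on the command line.
    """
    pattern = f"{urn}_mapping_results_*.json"
    candidates = sorted(
        storage_dir(env).glob(pattern),
        key=lambda path: path.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    result = latest_mapping_result("urn:mavedb:00000083-c-1")
    print(result if result is not None else "no mapping results found")
```

Sorting by modification time avoids parsing the timestamp embedded in the file name, whose exact format is not specified here.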
Use dcd-map --help to see other available options.
Notebooks for manuscript data analysis and figure generation are provided within notebooks/analysis. See notebooks/analysis/README.md for more information.
Clone the repo
git clone https://github.com/ave-dcd/dcd_mapping
cd dcd_mapping
Create and activate a virtual environment
python3 -m virtualenv venv
source venv/bin/activate
Install as editable and with developer dependencies
python3 -m pip install -e '.[dev,tests]'
Add pre-commit hooks
pre-commit install
Run tests with pytest
pytest