DL2vec is a method that converts different types of Description Logic axioms into a graph representation and then generates an embedding for each node and edge type.
The main conversion tool is in the DL2vec_embed folder.
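To illustrate the idea, the sketch below walks a small edge-labelled graph and emits each walk as a sequence of alternating node and edge labels, the kind of "sentence" a Word2Vec model can be trained on. It is a minimal illustration and not the code in DL2vec_embed: the toy triples, the function names, and the walk length are all assumptions.

```python
import random
from collections import defaultdict

# Hypothetical toy triples (head, edge label, tail); not taken from a real ontology.
TRIPLES = [
    ("gene:1", "has_function", "GO:1"),
    ("GO:1", "subClassOf", "GO:0"),
    ("gene:2", "has_phenotype", "HP:1"),
    ("HP:1", "subClassOf", "HP:0"),
]


def build_adjacency(triples):
    """Map each head node to its outgoing (edge label, tail) pairs."""
    adjacency = defaultdict(list)
    for head, label, tail in triples:
        adjacency[head].append((label, tail))
    return adjacency


def random_walk(adjacency, start, walk_length, rng):
    """Return a walk as alternating node and edge labels, starting at `start`."""
    walk = [start]
    node = start
    for _ in range(walk_length):
        neighbours = adjacency.get(node)
        if not neighbours:  # dead end: stop the walk early
            break
        label, node = rng.choice(neighbours)
        walk.extend([label, node])
    return walk


adjacency = build_adjacency(TRIPLES)
rng = random.Random(0)  # fixed seed so the walks are reproducible
walks = [random_walk(adjacency, entity, 4, rng) for entity in ("gene:1", "gene:2")]
```

Because edge labels appear in the walks alongside nodes, a model trained on such sequences can learn a vector for each edge type as well as for each node.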
Dependencies:
python >= 3.4
pandas >= 0.24.2
numpy >= 1.16.2
torch >= 1.0.1
networkx >= 2.3
scipy >= 1.2.1
scikit-learn >= 0.20.3
Groovy (Groovy Version: 2.4.10 JVM: 1.8.0_121) with Grape for dependency management.
Details for predicting gene-disease associations with DL2Vec can be found in the experiment folder.
- Clone this repository:
git clone https://github.com/bio-ontology-research-group/DL2Vec.git
- Generate the embeddings for each entity:
python runDL2vec.py -ontology "ontology file" -associations "association_file" -outfile "embedding output file" -entity_list "file listing the entities to embed"
Where the following are mandatory arguments:
ontology_file
: ontology file containing the ontology in OWL format
association_file
: file containing the entity-class associations
outfile
: output file that will contain the embedding model
If either of the two mandatory input files (the ontology file or the association file) is missing, an error message is displayed.
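If you call the script from your own pipeline, you may want to check the inputs before launching it. The helper below is a small sketch of such a pre-check; `missing_inputs` is a hypothetical name and is not part of DL2vec.

```python
from pathlib import Path


def missing_inputs(ontology_file, association_file):
    """Return the input paths that do not point to an existing file."""
    candidates = [ontology_file, association_file]
    return [path for path in candidates if not Path(path).is_file()]
```

An empty list means both input files were found.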
You can also specify the following optional arguments:
-h/--help
: show the help message and exit
-window_size
: window size for Word2Vec
-mincount
: minimum count value for Word2Vec
-entity_list
: file listing the entities from which the random walks start and for which embeddings are generated
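When the script is driven from Python, the arguments can be assembled programmatically. The sketch below only builds the argument list from the flags documented above; the `build_command` helper and the file names are hypothetical, and the values passed for the optional flags are purely illustrative.

```python
import sys


def build_command(ontology, associations, outfile,
                  entity_list=None, window_size=None, mincount=None):
    """Assemble the argument list for runDL2vec.py from the documented flags."""
    command = [sys.executable, "runDL2vec.py",
               "-ontology", ontology,
               "-associations", associations,
               "-outfile", outfile]
    optional = {"-entity_list": entity_list,
                "-window_size": window_size,
                "-mincount": mincount}
    for flag, value in optional.items():
        if value is not None:  # only pass the flags that were set
            command.extend([flag, str(value)])
    return command


# Hypothetical file names, used only for illustration.
command = build_command("onto.owl", "assoc.txt", "model.out", window_size=10)
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the folder that contains runDL2vec.py.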
The script will save a model that can generate embeddings for each entity.
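Once a model is available, you will usually want the vectors for a specific set of entities. The helper below is a sketch that works with any object supporting `in` and `[]` lookups by entity name; `collect_vectors` is a hypothetical name, and how the saved model is loaded depends on the format the script writes, which this README does not state.

```python
def collect_vectors(keyed_vectors, entities):
    """Split entities into those with a vector and those without one."""
    found = {}
    missing = []
    for entity in entities:
        if entity in keyed_vectors:
            found[entity] = keyed_vectors[entity]
        else:
            missing.append(entity)
    return found, missing
```

If the output file turns out to be a gensim Word2Vec model (an assumption), `gensim.models.Word2Vec.load(path).wv` could be passed as `keyed_vectors`.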
If you find DL2vec useful for your research, please cite:
Jun Chen, Azza Althagafi, and Robert Hoehndorf. "Predicting candidate genes from phenotypes, functions, and anatomical site of expression." (2020). DOI: 10.1093/bioinformatics/btaa879
We use two scripts, ProcessOntology.groovy and getMetadata.groovy, that are adapted from OPA2vec.