This project implements a knowledge graph framework for representing disease outbreak scenarios. The knowledge graph is built by processing disease outbreak alerts from ProMED and other sources and combining them with ontological information to create a structured representation of outbreak events.
The KG builds on the following sources:
- Outbreak alerts: text from outbreak alerts is processed, and terms representing diseases/phenotypes/pathogens/symptoms/geolocations are automatically extracted using the Gilda system. This produces `mentioned_in` relationships.
- Individual alerts are grouped by the inferred outbreak that they belong to, represented as a `has_outbreak` relationship.
- Taxonomy of diseases: extracted from the Medical Subject Headings tree structure
- Taxonomy of pathogens: extracted from the Medical Subject Headings tree structure
- Taxonomy of geolocations: extracted from the Medical Subject Headings tree structure
- Additional taxonomy of geolocations: extracted from the Geonames dataset with prefix `geonames`
- Geolocation relations: geolocations are linked together to represent hierarchical inclusion using the `isa` relationship, where the subsumed region is the source of the relationship and the containing region is the target.
- Pathogen-disease relations: relationships representing the fact that a pathogen causes a disease are represented as `has_pathogen` relationships.
- Disease-phenotype/symptom relations: relationships representing the fact that a disease causes a phenotype/symptom are represented as `has_phenotype` relationships.
- Development/health indicators: extracted from WDI data; geolocations are linked to indicators with the `has_indicator` relationship.
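To make the edge directions above concrete, here is a minimal in-memory sketch in Python. It is not the project's data model: the `TinyGraph` class and all node identifiers are hypothetical. Only the `isa` direction (subsumed region as source, containing region as target) is stated in this README; the source-to-target direction used for `has_pathogen` is an assumption.

```python
from collections import defaultdict

# Relationship types named in the list above.
RELATION_TYPES = {
    "mentioned_in", "has_outbreak", "isa",
    "has_pathogen", "has_phenotype", "has_indicator",
}


class TinyGraph:
    """Toy store of directed, typed edges (illustration only)."""

    def __init__(self):
        # Maps (source, relation) to the set of target nodes.
        self._out = defaultdict(set)

    def add_edge(self, source, relation, target):
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        self._out[(source, relation)].add(target)

    def targets(self, source, relation):
        # Return a copy so callers cannot mutate internal state.
        return set(self._out.get((source, relation), set()))

    def containing_regions(self, region):
        """Follow isa edges transitively from a subsumed region upward."""
        seen = set()
        stack = [region]
        while stack:
            current = stack.pop()
            for parent in self.targets(current, "isa"):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = TinyGraph()
# Hypothetical identifiers; real nodes use MeSH or geonames prefixes.
graph.add_edge("geo:lagos", "isa", "geo:nigeria")
graph.add_edge("geo:nigeria", "isa", "geo:africa")
# Assumed direction: disease is the source, pathogen is the target.
graph.add_edge("disease:cholera", "has_pathogen", "pathogen:vibrio_cholerae")
```

Because `isa` points from the contained region to its container, finding every region that contains a location is a forward traversal from that location.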
The knowledge graph is represented as a set of nodes and edges. Nodes represent entities such as diseases, pathogens, geolocations, and phenotypes. Edges represent relationships between these entities. The knowledge graph is deployed in a Neo4j database.
The knowledge graph can be queried using the Cypher query language directly through Neo4j. The repository also provides a Python client and a REST API for querying the knowledge graph. The graph database and the surrounding REST API are Dockerized and deployed on AWS.
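As a rough sketch of direct Cypher access, the snippet below builds a parameterized query for all regions containing a given geolocation and runs it with the official `neo4j` Python driver. This is not the repository's own Python client. The `id` node property, the function names, and the connection details are assumptions; check the deployed schema before relying on them.

```python
def containing_regions_query(region_id):
    """Return a parameterized Cypher query and its parameters.

    Assumes nodes carry an `id` property (hypothetical schema detail).
    """
    query = (
        "MATCH (region {id: $region_id})-[:isa*1..]->(container) "
        "RETURN DISTINCT container.id AS id"
    )
    return query, {"region_id": region_id}


def run_query(uri, user, password, query, params):
    """Run a query against Neo4j (requires `pip install neo4j`)."""
    # Imported lazily so the query builder works without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            result = session.run(query, params)
            return [record["id"] for record in result]
```

Passing the region identifier as a query parameter rather than formatting it into the string avoids Cypher injection and lets Neo4j reuse the query plan. Calling `run_query` requires a reachable database; the URI and credentials depend on the deployment.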
This work was supported by CAPTRS.