Using COVID19-related articles with ontology annotations to build a knowledge graph in Neo4j.

neobernad/covid19-knowledge-graph

COVID19 Knowledge graph

This project uses the annotated dataset from the CORD19 project. The original (non-annotated) CORD19 dataset is available on this website.

Our work is part of the Knowledge Graph task for the virtual COVID-19 Biohackathon.

Mapping the JSON schema into LPG

The dataset follows this JSON schema. Our current labeled property graph (LPG) model is:

JSON schema to LPG

Node properties

Each node has the following properties and labels (node_properties.csv):

| nodeType | nodeLabels | propertyName | propertyTypes | mandatory |
| --- | --- | --- | --- | --- |
| :Author | [Author] | middle | [StringArray] | true |
| :Author | [Author] | last | [String] | true |
| :Author | [Author] | suffix | [String] | true |
| :Author | [Author] | first | [String] | true |
| :Author | [Author] | email | [String] | false |
| :Term | [Term] | id | [String] | true |
| :Term | [Term] | name | [String] | true |
| :Ontology | [Ontology] | name | [String] | true |
| :Paragraph | [Paragraph] | id | [String] | true |
| :Paragraph | [Paragraph] | position | [Long] | true |
| :Paragraph | [Paragraph] | text | [String] | true |
| :Paragraph | [Paragraph] | section | [String] | true |
| :CiteSpan | [CiteSpan] | text | [String] | true |
| :CiteSpan | [CiteSpan] | start | [Long] | true |
| :CiteSpan | [CiteSpan] | end | [Long] | true |
| :CiteSpan | [CiteSpan] | ref_id | [String] | false |
| :RefSpan | [RefSpan] | text | [String] | true |
| :RefSpan | [RefSpan] | start | [Long] | true |
| :RefSpan | [RefSpan] | end | [Long] | true |
| :RefSpan | [RefSpan] | ref_id | [String] | false |
| :BibEntry | [BibEntry] | id | [String] | true |
| :BibEntry | [BibEntry] | title | [String] | true |
| :BibEntry | [BibEntry] | volume | [String] | true |
| :BibEntry | [BibEntry] | venue | [String] | true |
| :BibEntry | [BibEntry] | pages | [String] | true |
| :BibEntry | [BibEntry] | issn | [String] | true |
| :BibEntry | [BibEntry] | year | [Long] | false |
| :RefEntry | [RefEntry] | id | [String] | true |
| :RefEntry | [RefEntry] | text | [String] | true |
| :RefEntry | [RefEntry] | type | [String] | true |
| :DOI | [DOI] | id | [String] | true |
| :Article | [Article] | id | [String] | true |
| :Article | [Article] | title | [String] | true |
| :Article | [Article] | license | [String] | true |
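
The node property table can double as a checklist when preparing data for import. As a minimal sketch, the mapping below copies the mandatory properties of a few node labels from that table and reports which ones are missing from a candidate property map. The names `MANDATORY_PROPERTIES` and `missing_properties` are illustrative and are not part of this repository.

```python
# Mandatory properties per node label, copied from the node property table
# (only a few labels are shown; extend the mapping as needed).
MANDATORY_PROPERTIES = {
    "Author": {"first", "middle", "last", "suffix"},
    "Term": {"id", "name"},
    "Paragraph": {"id", "position", "text", "section"},
    "Article": {"id", "title", "license"},
}


def missing_properties(label, properties):
    """Return the sorted mandatory properties absent from `properties`."""
    required = MANDATORY_PROPERTIES.get(label, set())
    return sorted(required - set(properties))


# A Term node that only carries an id is missing its name.
print(missing_properties("Term", {"id": "<TermID>"}))
```

Labels that are not listed in the mapping are treated as having no mandatory properties, which keeps the check permissive by default.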

Edge properties

Each edge has the following properties (edges_properties.csv):

| relType | propertyName | propertyTypes | mandatory |
| --- | --- | --- | --- |
| :has_author | null | null | false |
| :has_metadata_annotations | hit_sentence_start | [Long] | true |
| :has_metadata_annotations | hit_sentence_end | [Long] | true |
| :has_metadata_annotations | hit_sentences | [Long] | true |
| :from_ontology | null | null | false |
| :has_annotations | hit_sentence_start | [Long] | true |
| :has_annotations | hit_sentence_end | [Long] | true |
| :has_annotations | hit_sentences | [Long] | true |
| :has_abstract | null | null | false |
| :has_bodytext | null | null | false |
| :contains_citespan | null | null | false |
| :contains_refspan | null | null | false |
| :has_bibentry | null | null | false |
| :has_refentry | null | null | false |
| :has_back_matter | null | null | false |
| :has_reference | null | null | false |

Cypher queries

Retrieve relevant terms for a specific input, e.g. as typeahead.

CALL db.index.fulltext.queryNodes('termIndex', '<Input>~')
YIELD node, score
RETURN node ORDER BY score DESC SKIP <PAGE> LIMIT <PAGE_SIZE>;
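
The placeholders `<Input>`, `<PAGE>` and `<PAGE_SIZE>` have to be filled in by the caller. The sketch below shows one possible way to do that with Cypher parameters instead of string substitution. It rests on two assumptions that the README does not state: that `<PAGE>` is meant as a row offset for SKIP (computed here as a zero-based page number times the page size), and that Lucene special characters in user input should be escaped before the fuzzy operator `~` is appended. The names `TYPEAHEAD_QUERY` and `typeahead_params` are hypothetical.

```python
import re

# Parameterised form of the typeahead query; $query, $skip and $limit are
# Cypher parameters standing in for <Input>~, <PAGE> and <PAGE_SIZE>.
TYPEAHEAD_QUERY = """
CALL db.index.fulltext.queryNodes('termIndex', $query)
YIELD node, score
RETURN node ORDER BY score DESC SKIP $skip LIMIT $limit
"""

# Characters with a special meaning in Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def typeahead_params(user_input, page=0, page_size=10):
    """Build the parameter map for TYPEAHEAD_QUERY.

    Assumes `page` is a zero-based page number, so SKIP is page * page_size.
    """
    # Prefix every special character with a backslash, then add the fuzzy operator.
    escaped = _LUCENE_SPECIAL.sub(r"\\\1", user_input.strip())
    return {"query": escaped + "~", "skip": page * page_size, "limit": page_size}


print(typeahead_params("covid-19", page=2, page_size=25))
```

With the official `neo4j` Python driver, the map would typically be passed as the second positional argument, as in `session.run(TYPEAHEAD_QUERY, typeahead_params("covid"))`. Empty input is not handled here and would need a guard before the query is sent.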

Example for querying Terms

Fetch articles and their abstracts annotated with a specific term.

MATCH (t:Term {id:"<TermID>"})<-[*..2]-(a:Article)
WITH a
MATCH (a)-[:has_abstract]->(abstract:Paragraph)
RETURN a, abstract SKIP <PAGE> LIMIT <PAGE_SIZE>;

Example retrieving an article for a term

Retrieve all terms appearing in an article.

MATCH (t:Term)<-[r:has_annotations]-(p:Paragraph)<-[*..2]-(a:Article {id:"<ArticleID>"})
RETURN t SKIP <PAGE> LIMIT <PAGE_SIZE>;

Example retrieving all terms for an article

Retrieve the context of the annotation.

MATCH (t:Term {id:"<TermID>"})<-[r:has_annotations]-(p:Paragraph)<-[*..2]-(a:Article {id:"<ArticleID>"})
RETURN t, p, COLLECT(r) as spans ORDER BY p.position ASC SKIP <PAGE> LIMIT <PAGE_SIZE>;

Example retrieving the context of an annotation

Retrieve the article title, authors, abstract, bodytext, backmatter and bibliography.

MATCH (article:Article {id:"<ArticleID>"})
WITH article
MATCH (author:Author)<-[:has_author]-(article)
WITH article, COLLECT(author) as authors
MATCH (abstract:Paragraph)<-[:has_abstract]-(article)
WITH article, authors, COLLECT(abstract) as abstracts
MATCH (bodytext:Paragraph)<-[:has_bodytext]-(article)
WITH article, authors, abstracts, bodytext
ORDER BY bodytext.position ASC
WITH article, authors, abstracts, COLLECT(bodytext) as bodytexts
MATCH (backmatter:Paragraph)<-[:has_back_matter]-(article)
WITH article, authors, abstracts, bodytexts, COLLECT(backmatter) as backmatters
MATCH (bibEntry:BibEntry)<-[:has_bibentry]-(article)
RETURN article, authors, abstracts, bodytexts, backmatters, COLLECT(bibEntry) as bibEntries
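
Because every clause in the query above is a plain MATCH, an article that lacks any one of these parts (for example, one without back matter) produces no rows at all. If partial results are preferable, one possible variant uses OPTIONAL MATCH. This is a sketch and is not taken from this repository; the parameter name `$article_id` and the Python names `ARTICLE_QUERY` and `article_params` are chosen here for illustration.

```python
# Variant of the article query that keeps the article row even when a part
# (authors, abstract, body text, back matter, bibliography) is missing.
# collect() skips nulls, so a missing part comes back as an empty list.
ARTICLE_QUERY = """
MATCH (article:Article {id: $article_id})
OPTIONAL MATCH (article)-[:has_author]->(author:Author)
WITH article, collect(author) AS authors
OPTIONAL MATCH (article)-[:has_abstract]->(abstract:Paragraph)
WITH article, authors, collect(abstract) AS abstracts
OPTIONAL MATCH (article)-[:has_bodytext]->(bodytext:Paragraph)
WITH article, authors, abstracts, bodytext
ORDER BY bodytext.position ASC
WITH article, authors, abstracts, collect(bodytext) AS bodytexts
OPTIONAL MATCH (article)-[:has_back_matter]->(backmatter:Paragraph)
WITH article, authors, abstracts, bodytexts, collect(backmatter) AS backmatters
OPTIONAL MATCH (article)-[:has_bibentry]->(bibEntry:BibEntry)
RETURN article, authors, abstracts, bodytexts, backmatters,
       collect(bibEntry) AS bibEntries
"""


def article_params(article_id):
    """Parameter map for ARTICLE_QUERY."""
    return {"article_id": article_id}
```

Passing the id as a parameter also avoids building the query text from user input.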

Example retrieving the information from an article

Retrieve query preview: article count and terms related to an input term list.

MATCH (t:Term)<-[*..2]-(a:Article)
WHERE (t.id IN ["<TermID1>", "<TermID2>"])
WITH a, COUNT(a) as articleCount
MATCH (a)-[:has_metadata_annotations]->(annotation:Term)-[:from_ontology]->(o:Ontology)
RETURN sum(articleCount) as totalArticles, o as ontology, collect(distinct(annotation)) as annotations

In the query above, the article count is broken down by ontology: X articles from ontology Y with Z annotations. If the ontology is not needed, you can omit o as ontology, so that the output becomes: X total articles in which the Z annotations appear.

MATCH (t:Term)<-[*..2]-(a:Article)
WHERE (t.id IN ["<TermID1>", "<TermID2>"])
WITH a, COUNT(a) as articleCount
MATCH (a)-[:has_metadata_annotations]->(annotation:Term)
RETURN sum(articleCount) as totalArticles, collect(distinct(annotation)) as annotations
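
The summary sentences described above can be produced from the returned rows. The following is a minimal sketch, assuming each row has already been converted to a plain dict with the keys `totalArticles` and `annotations`, plus `ontology` (reduced here to a name string) for the first query. The function name `preview_line` is illustrative, and the sample rows are made-up placeholder values, not results from the dataset.

```python
def preview_line(row):
    """Format one result row of the preview query as a summary sentence."""
    total = row["totalArticles"]
    count = len(row["annotations"])
    ontology = row.get("ontology")
    if ontology is None:
        # Second query: no per-ontology breakdown.
        return f"{total} total articles with {count} annotations"
    return f"{total} articles from {ontology} ontology with {count} annotations"


# Placeholder rows for illustration only.
rows = [
    {"totalArticles": 12, "ontology": "<OntologyName>",
     "annotations": ["<TermID1>", "<TermID2>"]},
    {"totalArticles": 7, "annotations": ["<TermID1>"]},
]
for row in rows:
    print(preview_line(row))
```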
