INDRA-SPINE is an adaptation of INDRA (Integrated Network and Dynamical Reasoning Assembler), which is an automated knowledge assembly system that can be used to traverse PubMed and read thousands of papers using natural language processing. It can then be used to create executable or graphical representations of biochemical pathways assembled from multiple publications.
INDRA-SPINE searches neuroscience literature for neuroanatomical relations at scale, taking into account context within a sentence.
INDRA-SPINE uses REACH, developed by the CLU Lab at the University of Arizona, and Odinson, developed by Lum AI. Both repositories should be cloned before using INDRA-SPINE, and the REACH_HOME environment variable should be set to the path of the local REACH clone.
Using Odinson requires an Intel installation of Java, which can be found at this link.
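For example, the initial setup might look like this (the GitHub URLs and local paths below are assumptions; adjust them to wherever the repositories are actually cloned):
# clone both repositories (URLs assumed to be the public GitHub locations)
git clone https://github.com/clulab/reach.git
git clone https://github.com/lum-ai/odinson.git
# point REACH_HOME at the local REACH clone
export REACH_HOME=/path/to/reach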
REACH and Odinson need to be customized before INDRA-SPINE can be used. This can be done by following the steps below, first for REACH and then for Odinson:
Step One
INDRA-SPINE's resources folder contains two tsv files: spine.tsv and neuro-behavior.tsv. These should both be added to the src/main/resources/org/clulab/reach/kb folder within the local copy of REACH.
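For example, assuming the tsv files sit in a resources folder at the root of the INDRA-SPINE repository and REACH_HOME points at the local REACH clone:
cp resources/spine.tsv $REACH_HOME/src/main/resources/org/clulab/reach/kb/
cp resources/neuro-behavior.tsv $REACH_HOME/src/main/resources/org/clulab/reach/kb/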
Step Two
A new entry for each then needs to be added to src/main/resources/application.conf under the KnowledgeBases block.
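The exact schema for these entries is defined by REACH, so the safest approach is to copy an existing entry in the KnowledgeBases block and adjust it; the entry names, labels, and field names in this sketch are hypothetical:
KnowledgeBases {
  # ... existing entries ...
  StaticSpine {
    path = "org/clulab/reach/kb/spine.tsv"
    labels = ["Spine"]          # hypothetical label; should match the taxonomy entry added below
  }
  StaticNeuroBehavior {
    path = "org/clulab/reach/kb/neuro-behavior.tsv"
    labels = ["NeuroBehavior"]  # hypothetical label
  }
}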
Step Three
Two new entity types need to be added to the taxonomy within REACH at main/src/main/resources/org/clulab/reach/biogrammar/taxonomy.yml.
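The taxonomy is a nested YAML hierarchy, so the two new types are added as children of an existing node. The type names Spine and NeuroBehavior below are assumptions based on the tsv file names and should match whatever labels the knowledge base entries and rules use:
- Entity:
    # ... existing entity types ...
    - Spine
    - NeuroBehavior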
Step Four
Two new rules need to be added to main/src/main/resources/org/clulab/reach/biogrammar/entities/entities.yml.
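The new rules follow the same Odin token-rule format as the existing entity rules in that file. A sketch, where the rule names, labels, and entity tags are hypothetical and should mirror the neighboring rules:
- name: ner-spine
  label: Spine
  priority: 1
  type: token
  pattern: |
    [entity='B-Spine'] [entity='I-Spine']*

- name: ner-neuro-behavior
  label: NeuroBehavior
  priority: 1
  type: token
  pattern: |
    [entity='B-NeuroBehavior'] [entity='I-NeuroBehavior']*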
Step Five
The version.sbt file should then be updated with a new version number.
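For example, if the file currently reads version in ThisBuild := "1.6.3-SNAPSHOT", it could be bumped to the number used in the Odinson steps below (both numbers here are illustrative):
version in ThisBuild := "1.6.4-SNAPSHOT"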
Step Six
The build.sbt file contains the line .aggregate(processors, main, causalAssembly, export), which needs to be updated to .aggregate(processors, main, causalAssembly, export, bioresources).
Step Seven
The final step is to run sbt compile and sbt publishLocal on REACH.
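Assuming REACH_HOME still points at the local clone, this amounts to:
cd $REACH_HOME
sbt compile
sbt publishLocal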
Once REACH has been rebuilt, Odinson needs to be customized as well:
Step One
Within Odinson, in extra/build.sbt, the line "org.clulab" %% "reach-processors" % "1.6.4-SNAPSHOT" should be added to the libraryDependencies block, using the same version number that was set in version.sbt in REACH.
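The surrounding structure of the libraryDependencies block in extra/build.sbt may differ from this sketch; the point is simply to add the dependency line with the version number chosen above:
libraryDependencies ++= Seq(
  // ... existing dependencies ...
  "org.clulab" %% "reach-processors" % "1.6.4-SNAPSHOT" // same version as REACH's version.sbt
)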
Step Two
In extra/src/main/resources/application.conf, the line processorType = "CluProcessor" should be replaced with the line processorType = "BioNLPProcessor".
Step Three
In extra/src/main/scala/ai/lum/odinson/extra/utils/ProcessorsUtils.scala, the line import org.clulab.processors.bionlp.BioNLPProcessor should be added under the line import org.clulab.processors.fastnlp.FastNLPProcessor, and the block
case "BioNLPProcessor" => {
  dynet.Utils.initializeDyNet(autoBatch = false, mem = "1024,1024,1024,1024")
  new BioNLPProcessor
}
should be added within getProcessor.
Step Four
In the file at extra/src/main/scala/ai/lum/odinson/extra/AnnotateText.scala, the line import org.clulab.processors.bionlp.BioNLPProcessor should be added, and within annotateTextFile, this block should be added:
processor match {
  case p: BioNLPProcessor =>
    p.recognizeRuleNamedEntities(doc)
}
Step Five
The final step is to run sbt compile and sbt publishLocal on Odinson.
Using INDRA-SPINE also requires access to the Odinson web service, whose Docker image can be obtained by running this command:
docker pull lumai/odinson-rest-api
The starting point for using INDRA-SPINE is a corpus of articles or abstracts, stored as a set of text files with the extension .txt in a folder called text within a larger folder named for the corpus. Articles can easily be downloaded from PubMed using INDRA and INDRA DB via the provided CLI: the user simply has to run this command in Terminal, substituting in the desired search term and the path to save the files:
python -m indra_spine.cli corpus searchterm path
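For example, running the command with a hypothetical search term, and assuming the path argument names the corpus folder, might produce a layout like this (folder and file names are illustrative; only the text folder of .txt files is required):
python -m indra_spine.cli corpus 'dendritic spine' ./spine_corpus

spine_corpus/
    text/
        12345678.txt
        23456789.txt
        ...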
The next step is to configure Odinson. The odinson.dataDir variable in the application.conf file (located at extra/src/main/resources) should be updated to point to the directory containing the corpus.
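For example (the path is a placeholder for wherever the corpus folder was created):
odinson {
  dataDir = "/path/to/spine_corpus"
}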
Next, the user should use Odinson to annotate and index the corpus. This step requires the Intel version of Java to be installed using the link above. These commands should then be run from within the Odinson directory:
Step 1, set JAVA_HOME:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home
Step 2, annotate text:
sbt -java-home $JAVA_HOME "extra/runMain ai.lum.odinson.extra.AnnotateText"
Step 3, generate index:
sbt -java-home $JAVA_HOME "extra/runMain ai.lum.odinson.extra.IndexDocuments"
Following this, the Docker container for the Odinson web service should be run using this command, with the path updated to the directory where the corpus was generated:
docker run -v /path_to_corpus:/app/data/odinson -p 9000:9000 lumai/odinson-rest-api
The interaction network functionality can then be used to extract relations from the corpus and generate graphs. This can also be accessed via the CLI: the user simply has to run this command in Terminal, substituting in the desired search term and the path to save results:
python -m indra_spine.cli 'interaction network' searchterm path
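For example, with a hypothetical search term and output path:
python -m indra_spine.cli 'interaction network' 'dendritic spine' ./spine_results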