The graph database Neo4j supports fulltext indexing out of the box, including fine-grained configuration of the analyzer used for each index. There's a good write-up on how to build your own custom analyzers. This project provides a couple of commonly used analyzers.
The analyzers are intended to work with Neo4j >= 3.4. Clone this repository and use Maven to build the jar file:
mvn package
cp target/neo4j-additional-analyzers-<version>.jar <NEO4J_HOME>/plugins
Then restart Neo4j.
Set up a fulltext index and specify the custom analyzer in its configuration:
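// Index using the whitespace_lower analyzer
CALL db.index.fulltext.createNodeIndex("myindex", ["Articles"], ["abstract", "content"], {analyzer: "whitespace_lower"});
// Index using the synonym analyzer (the name is arbitrary but must differ from existing indexes)
CALL db.index.fulltext.createNodeIndex("mysynonymindex", ["Articles"], ["abstract", "content"], {analyzer: "synonym"});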
The whitespace_lower analyzer combines the whitespace analyzer with German and English stopword lists and also converts all terms to lower case.
Because it splits on whitespace only, terms containing dashes, e.g. gene symbols such as PNAS-108 or technical terms like x-270°C, are treated as single atomic terms.
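To illustrate the effect outside Neo4j, here is a minimal Python sketch of the described token pipeline: split on whitespace, lowercase, drop stopwords. It is not the plugin's actual code (which runs as Lucene analysis inside Neo4j). The tiny stopword set, the function names `whitespace_lower` and `split_on_non_word`, and lowercasing before stopword filtering are assumptions for illustration only. The second function is a crude regex-based contrast that also breaks on dashes; it is not Lucene's StandardTokenizer.

```python
import re
from typing import List

# Hypothetical, tiny subset of English and German stopwords; the real
# analyzer is assumed to use complete stopword lists.
STOPWORDS = {"the", "and", "of", "der", "die", "das", "und"}


def whitespace_lower(text: str) -> List[str]:
    """Approximate whitespace_lower: whitespace split, lowercase, stopword removal."""
    tokens = text.split()                               # split on whitespace only
    lowered = [t.lower() for t in tokens]               # toLower conversion
    return [t for t in lowered if t not in STOPWORDS]   # drop stopwords


def split_on_non_word(text: str) -> List[str]:
    """Crude contrast: tokenize on runs of word characters, so dashes and
    symbols such as the degree sign act as separators."""
    return [t.lower() for t in re.findall(r"\w+", text)]


if __name__ == "__main__":
    text = "The PNAS-108 gene and der Wert x-270°C"
    print(whitespace_lower(text))
    # ['pnas-108', 'gene', 'wert', 'x-270°c']
    print(split_on_non_word(text))
    # ['the', 'pnas', '108', 'gene', 'and', 'der', 'wert', 'x', '270', 'c']
```

The trade-off of a pure whitespace split is that punctuation stays attached to a term, so `PNAS-108,` and `PNAS-108` would end up as different tokens.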