This Python script transforms the Lens.org Covid19 Patent Dataset into a Neo4j graph.
It usually runs in the context of the Covid*Graph Pipeline but can also run standalone.
Maintainer: Tim
Version: 1.0.0
Python version: Python3
Run
docker run -it --rm --name data-lens-org-covid19-patents -e CONFIGS_NEO4J='{"host":"localhost"}' covidgraph/data-lens-org-covid19-patents
NOTE: For details on the -e CONFIGS_NEO4J env variable, see https://github.com/covidgraph/motherlode/blob/master/README.md#the-neo4j-connection-string
Build local image
From the root directory of this repo, run:
docker build -t data-lens-org-covid19-patents .
Run local image
Example (Neo4j runs on the Docker Linux host machine):
docker run -it --rm --network host --name data-lens-org-covid19-patents -v ${PWD}/dataset:/app/dataset -e CONFIGS_NEO4J='{"host":"localhost"}' data-lens-org-covid19-patents
Envs
The most important environment variables are:
ENV: either PROD or DEV.
CONFIGS_NEO4J: the connection details for the database. Defaults to {"host":"localhost"}. For details see https://github.com/covidgraph/motherlode/blob/master/README.md#the-neo4j-connection-string
Besides these, you can set any variable in dataloader/config.py via an environment variable with a CONFIGS_ prefix. See https://git.connect.dzd-ev.de/dzdtools/pythonmodules/-/tree/master/Configs for more details on how to manipulate the parameters.
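To illustrate the CONFIGS_ prefix convention, here is a minimal sketch of how prefixed environment variables could be collected and decoded. It is not the implementation of the Configs module linked above; the function name read_prefixed_env and the variable CONFIGS_SOME_PATH are hypothetical and exist only for this example.

```python
import json


def read_prefixed_env(environ, prefix="CONFIGS_"):
    """Collect variables that start with `prefix`, keyed without the prefix.

    Values that parse as JSON are decoded; anything else stays a plain string.
    """
    overrides = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        key = name[len(prefix):]
        try:
            overrides[key] = json.loads(raw)
        except json.JSONDecodeError:
            overrides[key] = raw
    return overrides


# A stand-in for os.environ so the example does not depend on the real environment.
example_env = {
    "ENV": "DEV",
    "CONFIGS_NEO4J": '{"host":"localhost"}',
    "CONFIGS_SOME_PATH": "/app/dataset",  # hypothetical variable name
}
overrides = read_prefixed_env(example_env)
print(overrides)
```

In real use you would pass os.environ instead of example_env. Variables without the prefix, such as ENV, are ignored by this sketch.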
Volumes
/app/dataset
The downloaded dataset is stored here. You can mount this path with -v /mylocal/path:/app/dataset
to avoid re-downloading the dataset.
/app/dataloader
The Python source code is located here. You can mount this path for development or tinkering.
Copy dataloader/env/DEFAULT.env to dataloader/env/DEVELOPMENT.env:
cp dataloader/env/DEFAULT.env dataloader/env/DEVELOPMENT.env
Enter your Neo4j connection string in dataloader/env/DEVELOPMENT.env as the value of the variable CONFIGS_NEO4J:
CONFIGS_NEO4J={"host":"localhost"}
Install the requirements:
pip3 install -r requirement.txt
Run main.py:
python3 main.py
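Because CONFIGS_NEO4J holds a JSON object, quoting mistakes are easy to make when editing the env file by hand. As a rough sketch, the line for DEVELOPMENT.env can be generated and checked in Python. Only the "host" key is taken from this README; any further keys are described in the motherlode README linked above and are not assumed here.

```python
import json

# Only "host" appears in this README; add other keys per the motherlode docs.
neo4j_config = {"host": "localhost"}

# Compact separators reproduce the form used in this README's examples.
value = json.dumps(neo4j_config, separators=(",", ":"))
env_line = f"CONFIGS_NEO4J={value}"
print(env_line)

# Round-trip check: the generated value must decode back to the same dict.
assert json.loads(value) == neo4j_config
```

The printed line can be pasted into dataloader/env/DEVELOPMENT.env as-is.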