CS-Insights is an open-source project that makes information about computer science research available with the click of a button. We use the DBLP Discovery Dataset D3 to show trends in volume, topics, impact, and more. The frontend enables abstract access to the data through visualizations and filters. The backend enables access to the data through a well-defined REST API and makes it easy to index new data as well. The prediction-endpoint does the heavy lifting for analyzing topics and other semantic features, using parent and child Docker containers that can run on different servers. The crawler makes sure that the latest data is retrieved from the web, parsed, and stored in the backend. Finally, the uptime tracker periodically queries all endpoints to check their status and response times.
To get a local copy up and running, follow these simple steps. The project relies on docker compose files (docker-compose.yml for production and docker-compose.dev.yml for development) to develop and deploy all services.
You need to have Docker and the Docker Compose plugin installed.
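Both prerequisites can be checked with a small shell function before going further. This is a minimal sketch; the function name check_prereqs is hypothetical and not part of the repository.

```shell
# Hypothetical helper (not part of the repository): verify the prerequisites.
check_prereqs() {
  # Fail if docker cannot be found on the PATH.
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker is not installed" >&2
    return 1
  fi
  # The Compose plugin provides the "docker compose" subcommand (no hyphen).
  if ! docker compose version >/dev/null 2>&1; then
    echo "the docker compose plugin is not installed" >&2
    return 1
  fi
  echo "ok"
}

# Usage: check_prereqs
```

The function prints ok when both tools are available and an error message on stderr otherwise.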
First, clone the repository and its submodules locally.
git clone --recurse-submodules -j5 https://github.com/gipplab/cs-insights-main.git
If you are developing, it makes sense to start with the submodules (e.g., frontend, backend, ...) at their main branch. To update all submodules to main, run:
git submodule foreach git pull origin main
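Freshly cloned submodules are checked out at a detached HEAD (the commit recorded in the main repository), so pulling into them updates the code but does not put them on a branch. A variant that first checks out main in every submodule could look like this; update_submodules is a hypothetical helper name.

```shell
# Hypothetical helper: put every submodule on its main branch and update it.
update_submodules() {
  # foreach runs the quoted command inside each submodule directory;
  # checking out main first avoids pulling into a detached HEAD.
  git submodule foreach 'git checkout main && git pull origin main'
}

# Usage (from the root of cs-insights-main): update_submodules
```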
To start the development environment, run the following command:
docker compose --env-file .env.development -f docker-compose.dev.yml up --build
This starts each service (e.g., backend, frontend) in development mode with hot reload. Whenever you change a file in one of the sub-repositories (e.g., ./cs-insights-frontend/src/App.tsx), the development server reloads and shows the newly rendered frontend.
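Because every development command needs the same --env-file and -f flags, a small wrapper can save typing. This is a sketch; dev_compose is a hypothetical name, and the service name backend in the usage comment is assumed to match the one in docker-compose.dev.yml.

```shell
# Hypothetical wrapper: always pass the development env file and compose file.
dev_compose() {
  docker compose --env-file .env.development -f docker-compose.dev.yml "$@"
}

# Usage:
#   dev_compose up --build       # start all services with hot reload
#   dev_compose logs -f backend  # follow the logs of one service
#   dev_compose down             # stop and remove the dev containers
```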
To run the production environment in detached mode, run the following command:
docker compose up -d
HTTPS Certificate Renewal
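The SSL/TLS certificates issued by Let's Encrypt via certbot are valid for 90 days. To renew them automatically, install a cronjob (as root) that runs the renewal weekly on Sundays at 9:00 UTC, when the fewest users are online to notice the short frontend downtime. Cron uses the server's local time zone, so this assumes the server clock is set to UTC.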
sudo crontab -e
00 9 * * 0 certbot renew --pre-hook "docker compose -f /home/jp/cs-insights-main/docker-compose.yml stop frontend" --post-hook "docker compose -f /home/jp/cs-insights-main/docker-compose.yml start frontend"
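Before relying on the cron job, it can help to run the renewal once by hand. The sketch below wraps the certbot call and its hooks in a shell function so it can be tried with --dry-run first. The function name renew_certs is hypothetical, and the compose file path is copied from the cron entry.

```shell
# Path copied from the cron entry; adjust it to your checkout.
COMPOSE_FILE_PATH=/home/jp/cs-insights-main/docker-compose.yml

# Hypothetical helper: renew certificates, stopping the frontend meanwhile.
# Extra arguments (e.g. --dry-run) are passed through to certbot.
renew_certs() {
  certbot renew "$@" \
    --pre-hook "docker compose -f $COMPOSE_FILE_PATH stop frontend" \
    --post-hook "docker compose -f $COMPOSE_FILE_PATH start frontend"
}

# Usage (as root):
#   renew_certs --dry-run  # staging test run, no certificates are saved;
#                          # the hooks still run, so the frontend restarts
#   renew_certs            # real run; only renews certificates close to expiry
#   certbot certificates   # show the expiry date of each certificate
```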
Database Updates
To update the database with the latest release, run the following commands (replace admin/admin_password with the production secrets):
mongoimport --db csinsights --collection papers --file papers.jsonl --upsertFields=corpusid --mode=upsert --authenticationDatabase=admin --username admin --password admin_password --numInsertionWorkers 12
mongoimport --db csinsights --collection authors --file authors.jsonl --upsertFields=authorid --mode=upsert --authenticationDatabase=admin --username admin --password admin_password --numInsertionWorkers 12
This updates existing documents matched on the respective ID field (corpusid for papers, authorid for authors) and inserts documents that do not exist yet.
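The two imports differ only in the collection name and the upsert field, so they can also be run in a loop. This is a sketch assuming papers.jsonl and authors.jsonl are in the current directory; import_release is a hypothetical name. It omits --password, which makes mongoimport prompt for the password and keeps the production secret out of the shell history and the process list.

```shell
# Hypothetical helper: upsert both collections of a new release.
# $1 is the MongoDB admin user; mongoimport prompts for the password.
import_release() {
  for pair in papers:corpusid authors:authorid; do
    collection=${pair%%:*}   # text before the colon, e.g. papers
    field=${pair#*:}         # text after the colon, e.g. corpusid
    mongoimport --db csinsights --collection "$collection" \
      --file "$collection.jsonl" --upsertFields="$field" --mode=upsert \
      --authenticationDatabase=admin --username "$1" \
      --numInsertionWorkers 12 || return 1
  done
}

# Usage: import_release admin
```

The loop stops at the first failed import, so a broken papers import does not silently continue with the authors.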
Production Secrets
Some secret variables (e.g., passwords) should only be available in the production environment, and only in encrypted form. Therefore, docker-compose.yml uses Docker secrets, which are stored encrypted on the host server. Docker secrets require swarm mode, so first initialize a swarm on the server:
docker swarm init
Alternatively, join an existing swarm:
docker swarm join --token <token> <manager-ip>:2377
where <manager-ip> is cs-insights.uni-goettingen.de and <token> is the join token (shown on the manager by docker swarm join-token worker).
Then create each secret referenced in docker-compose.yml with the following command:
printf '%s' '<secret>' | docker secret create <secret_name> -
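When several secrets are needed, docker secret create can also read each value from a file. The sketch below creates one secret per <name>.txt file in a directory; the function name and the directory layout are hypothetical. The file content, including any trailing newline, becomes the secret, so write the files with printf '%s'.

```shell
# Hypothetical helper: create one Docker secret per <name>.txt file in a directory.
# Run it on a swarm manager; the secret name is the file name without .txt.
create_secrets_from_dir() {
  for f in "$1"/*.txt; do
    [ -e "$f" ] || continue          # the glob matched no files
    name=$(basename "$f" .txt)       # mongo_password.txt -> mongo_password
    docker secret create "$name" "$f" || return 1
  done
  docker secret ls                   # list the secrets to confirm they exist
}

# Usage: create_secrets_from_dir ./secrets
```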
Instead of creating external secrets and referring to them with
mongo_password:
  external: true
you can also store them in text files on the host system and give docker-compose the path:
mongo_password:
  file: mongo_password.txt
File-based secrets work with docker compose, whereas external secrets require docker stack deploy:
docker stack deploy -c docker-compose.yml cs-insights
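After deploying, the state of the stack can be inspected with the docker stack subcommands. This is a minimal sketch; stack_status is a hypothetical helper, and the stack name is taken from the deploy command.

```shell
STACK_NAME=cs-insights   # stack name used in the deploy command

# Hypothetical helper: show the state of the deployed stack.
stack_status() {
  docker stack services "$STACK_NAME"        # replicas per service
  docker stack ps --no-trunc "$STACK_NAME"   # tasks, with full error messages
}

# Usage: stack_status
# To remove the stack again: docker stack rm cs-insights
```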
The services are available on the following ports:
- frontend: 80/443 or 3001 (dev)
- backend: 80/443 or 3000 (dev)
- prediction-endpoint: 8000
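To check by hand whether a service answers, and how fast, curl can report the HTTP status and the response time, similar in spirit to what the uptime tracker does (this is not its implementation). A sketch with a hypothetical check_endpoint function; the URLs in the usage comment assume the development ports listed above, and the root path may not be a meaningful health endpoint for every service.

```shell
# Hypothetical helper: print "<url> <http status> <total seconds>" for one URL.
check_endpoint() {
  # -s: silent, -o /dev/null: discard the body, -w: print status and timing;
  # a failed connection reports status 000.
  result=$(curl -s -o /dev/null --max-time 10 -w '%{http_code} %{time_total}' "$1")
  echo "$1 $result"
}

# Usage (development ports):
#   for url in http://localhost:3001 http://localhost:3000 http://localhost:8000; do
#     check_endpoint "$url"
#   done
```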
For more examples, please refer to the individual API documentation for backend, frontend, prediction-endpoint, and crawler.
For more details on the features of CS-Insights, please refer to our paper here.
See the project board for a full list of planned features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- We will get in touch and include your feature in the next release.
Distributed under the MIT License. See LICENSE for more information.
Jan Philip Wahle - @jpwahle - wahle@gipplab.org
Terry Ruas - @ruasterry - ruas@gipplab.org
Project Link: https://github.com/jpwahle/cs-insights
@misc{Ruas2022,
doi = {10.48550/ARXIV.2210.06878},
url = {https://arxiv.org/abs/2210.06878},
author = {Ruas, Terry and Wahle, Jan Philip and Küll, Lennart and Mohammad, Saif M. and Gipp, Bela},
keywords = {Computation and Language (cs.CL), Digital Libraries (cs.DL), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {CS-Insights: A System for Analyzing Computer Science Research},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution Share Alike 4.0 International}
}