DevLake offers an abundance of data for exploration. This playground contains a basic setup for interacting with the data using Jupyter Notebooks and Pandas.
- Python >= 3.12
- Poetry
- Access to a DevLake database
- Optional: An IDE or plugin that supports running Jupyter Notebooks directly (e.g. Visual Studio Code)
- Have a local clone of this repository.
- Run `poetry install` in the root directory.
- Either:
  - navigate to the `notebooks` directory and run the Jupyter server: `poetry run jupyter notebook`, or
  - navigate to one of the notebook files (`.ipynb`) in the `notebooks` directory from your IDE directly.
- Make sure the notebook uses the virtual environment created by Poetry.
- Configure your database URL in the notebook code.
- Run the notebook.
- Start exploring the data in your own notebooks!
A good starting point for creating a new notebook is `template.ipynb`.
It contains the basic steps you need to go from query to output.
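The query-to-output flow can be sketched as below. Note that DevLake typically runs on MySQL, so in a real notebook the database URL would look like `mysql+pymysql://user:password@host:3306/lake`; an in-memory SQLite database and an illustrative `issues` table are used here only so the sketch is self-contained and runnable. The table and column names are assumptions for the example, not the exact DevLake schema.

```python
import pandas as pd
from sqlalchemy import create_engine

# In a real notebook, point this at your DevLake database, e.g.
# "mysql+pymysql://user:password@localhost:3306/lake".
# SQLite in-memory is used here only so the sketch runs standalone.
DB_URL = "sqlite://"
engine = create_engine(DB_URL)

# Seed a tiny stand-in for a domain-layer table (illustrative schema).
pd.DataFrame(
    {
        "id": ["issue:1", "issue:2", "issue:3"],
        "type": ["BUG", "REQUIREMENT", "BUG"],
        "status": ["DONE", "IN_PROGRESS", "DONE"],
    }
).to_sql("issues", engine, index=False)

# Query -> DataFrame -> output: the basic flow from template.ipynb.
df = pd.read_sql("SELECT type, status FROM issues", engine)
print(df.head())
```

Swapping `DB_URL` for your real connection string is the only change needed to run the same flow against live DevLake data.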
To define a query, use the Domain Layer Schema to get an overview of the available tables and fields.
Use the Pandas API to organize, transform, and analyze the query results.
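As a small example of such a transformation, the snippet below reshapes a hypothetical query result into a per-month, per-type count. The column names are illustrative, not the exact DevLake schema.

```python
import pandas as pd

# Hypothetical query result: issue types and resolution dates
# (illustrative columns, not the exact DevLake schema).
df = pd.DataFrame(
    {
        "type": ["BUG", "BUG", "REQUIREMENT", "INCIDENT", "BUG"],
        "resolution_date": pd.to_datetime(
            ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-10", "2024-02-15"]
        ),
    }
)

# Organize: count resolved issues per month, per type.
monthly = (
    df.groupby([df["resolution_date"].dt.to_period("M"), "type"])
    .size()
    .unstack(fill_value=0)
)
print(monthly)
```

The resulting table (months as rows, issue types as columns) is already in a shape that plots directly with `monthly.plot()` inside a notebook.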
A notebook might offer a valuable perspective on the data that is not achievable with a Grafana dashboard.
In this case, it's worthwhile to contribute this notebook to the community as a predefined notebook, e.g., `process_analysis.ipynb` (which depends on graphviz for its visualization).
The same goes for utility methods, for example predefined Pandas data transformations that offer an interesting view on the data.
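A reusable utility method of that kind might look like the sketch below, which derives a lead-time column from two date columns. The function name and the default column names are hypothetical, chosen only for illustration.

```python
import pandas as pd


def add_lead_time_days(
    df: pd.DataFrame,
    created_col: str = "created_date",
    resolved_col: str = "resolution_date",
) -> pd.DataFrame:
    """Return a copy of df with a `lead_time_days` column.

    Column names are illustrative defaults, not guaranteed DevLake fields.
    """
    out = df.copy()
    out["lead_time_days"] = (
        pd.to_datetime(out[resolved_col]) - pd.to_datetime(out[created_col])
    ).dt.days
    return out


# Example usage on a hypothetical query result.
issues = pd.DataFrame(
    {
        "created_date": ["2024-01-01", "2024-01-10"],
        "resolution_date": ["2024-01-04", "2024-01-20"],
    }
)
result = add_lead_time_days(issues)
print(result)
```

Because the function takes the column names as parameters, the same helper can be reused across notebooks that query different domain-layer tables.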
Please check the contributing guidelines.