This is a solution for the Forest Cover Type Prediction competition, formatted as a Python package. The package lets you train a model that predicts the forest cover type (the predominant kind of tree cover) from strictly cartographic variables, using the Forest train dataset. The main metric is categorization accuracy.
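As a quick illustration of the metric, categorization accuracy is the share of samples whose predicted class matches the true class. The sketch below is a plain-Python stand-in, not the package's own code; the function name and the sample labels are hypothetical.

```python
from typing import Sequence


def categorization_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty sample")
    # Count positions where the prediction matches the label.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels; cover types are encoded as integers 1..7 in the dataset.
print(categorization_accuracy([1, 2, 2, 7], [1, 2, 3, 7]))  # 3 of 4 correct -> 0.75
```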
The purpose of this project is not only to solve the competition, but also to master modern tools that are useful for writing quality Python code and for developing and deploying models. Tools used:
- Poetry
- MLflow
- Click
- pytest
- flake8
- mypy
- black
- nox
- GitHub Actions
- FastAPI
- Docker
- Clone this repository to your machine:
git clone https://github.com/MeSugar/forest_competition.git
cd forest_competition
- Download the Forest train dataset and save the CSV locally (the default path is data/train.csv in the repository's root).
- Make sure Python 3.8 and Poetry are installed on your machine (I used Poetry 1.1.13).
- Install the project dependencies (run this and the following commands in a terminal, from the root of the cloned repository):
poetry install --no-dev
- Train the model with the following command:
poetry run train -d <path to csv with data> -s <path to save trained model>
You can configure additional options (e.g., the algorithm to be chosen for the task) in the CLI. To get a full list of them, use help:
poetry run train --help
- To see information about the experiments you have run (algorithm, metrics, hyperparameters), start the MLflow UI:
poetry run mlflow ui
- You can produce an EDA report in .html format using Pandas-Profiling:
poetry run eda -d <path to csv with data> -s <path to save report>
To see the list of configurable options in the CLI, run the command with the --help option.
- To make a submission file with predictions, run:
poetry run predict
To see the list of configurable options in the CLI, run:
poetry run predict --help
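For orientation, here is a minimal sketch of what writing a submission file can look like. It assumes the two-column Kaggle layout with an `Id` column and a `Cover_Type` column; the function name and the sample rows are hypothetical, and the package's own `predict` command may build the file differently.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_submission(rows: Iterable[Tuple[int, int]], out: TextIO) -> None:
    """Write (Id, Cover_Type) pairs as CSV with a header row."""
    writer = csv.writer(out)
    # Assumed column names for the competition's submission format.
    writer.writerow(["Id", "Cover_Type"])
    writer.writerows(rows)


# Hypothetical predictions written to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_submission([(15121, 2), (15122, 1)], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")`.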
The code in this repository must be tested, formatted with black, and pass mypy type checking before being committed to the repository.
Install all requirements (including dev requirements) into the Poetry environment:
poetry install
Now you can use the developer tools, e.g. pytest:
poetry run pytest
Lint source code with flake8:
poetry run flake8 src tests noxfile.py
Format your code with black:
poetry run black src tests noxfile.py
Perform type checking with mypy:
poetry run mypy src tests noxfile.py
More conveniently, to run all testing, formatting, and type checking sessions with a single command, install and use nox:
pip install --user --upgrade nox
nox [-r]
In case you want to run a specific step:
nox -[r]s flake8
nox -[r]s black
nox -[r]s mypy
nox -[r]s tests
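To make the sequence of checks explicit, the sketch below chains the same four commands from a plain Python script. It is a standard-library stand-in rather than the repository's noxfile; the function names are hypothetical, and the order simply follows the sessions listed above.

```python
import subprocess
from typing import List, Sequence

# Paths checked by the commands in this README.
PATHS = ["src", "tests", "noxfile.py"]


def build_check_commands(paths: Sequence[str] = PATHS) -> List[List[str]]:
    """Return the lint, format, type-check and test commands as argument lists."""
    prefix = ["poetry", "run"]
    return [
        prefix + ["flake8", *paths],
        prefix + ["black", *paths],
        prefix + ["mypy", *paths],
        prefix + ["pytest"],
    ]


def run_checks(commands: Sequence[Sequence[str]]) -> int:
    """Run the commands in order and stop at the first failure."""
    for command in commands:
        print("$", " ".join(command))
        completed = subprocess.run(list(command))
        if completed.returncode != 0:
            return completed.returncode
    return 0
```

Calling `run_checks(build_check_commands())` from the repository root runs the four steps one after another and returns the first non-zero exit code, or 0 if every step passes.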
The trained model can be deployed using FastAPI.
- You must have Docker installed.
- Build an image from the Dockerfile and run a container from it:
docker build -t app .
docker run -d -p 8000:8000 --name model-deploy app
- Go to http://localhost:8000/docs and click the "Try it out" button in the /predict block.
- Since the model requires quite a lot of values, paste the parameters from the example.json file in the repository root into the Request body. Then click the "Execute" button.
- Finally, the value predicted by the model appears in the Response body block.
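The same request can also be sent from a script instead of the docs page. The sketch below assumes that /predict accepts a POST request with a JSON body and answers with JSON; the function names and the sample feature are hypothetical, and only the standard library is used.

```python
import json
import urllib.request
from typing import Any, Dict

# Address of the container started above.
PREDICT_URL = "http://localhost:8000/predict"


def build_predict_request(
    features: Dict[str, Any], url: str = PREDICT_URL
) -> urllib.request.Request:
    """Build a POST request carrying the features as a JSON body."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: Dict[str, Any], url: str = PREDICT_URL) -> Any:
    """Send the request and decode the JSON response (assumed response format)."""
    with urllib.request.urlopen(build_predict_request(features, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, loading example.json via `json.load` and passing the result to `predict` should return the same value that the docs page shows in the Response body block.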
- Article series on model evaluation, model selection, and algorithm selection
- Scikit-learn guide to cross-validation
- Nested cross-validation
- "I don't like notebooks" by Joel Grus
- The Complete Guide to Python Virtual Environments!
- The Hitchhiker's Guide to Python
- Article series Hypermodern Python