Solution for the Forest Cover Type Prediction competition using modern tools for Python

MeSugar/forest_competition

This is a solution for the Forest Cover Type Prediction competition, formatted as a Python package. The package lets you train a model that predicts the forest cover type (the predominant kind of tree cover) from strictly cartographic variables, using the Forest train dataset. The main metric is classification accuracy.
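As a minimal sketch of the metric, accuracy is the fraction of samples whose predicted cover type matches the true one. The function name `accuracy` and the sample labels below are illustrative assumptions, not part of this package.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    # Count positions where the prediction matches the ground truth.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical integer cover type labels, for illustration only.
true_labels = [1, 2, 2, 7, 5]
predicted_labels = [1, 2, 3, 7, 5]
print(accuracy(true_labels, predicted_labels))  # 4 of the 5 predictions match
```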

Goals

The purpose of this project is not only to solve the competition but also to master modern tools for writing quality Python code and for developing and deploying models. Tools used:

  • Poetry
  • MLflow
  • Click
  • pytest
  • flake8
  • mypy
  • black
  • nox
  • GitHub Actions
  • FastAPI
  • Docker

Usage

  1. Clone this repository to your machine:
git clone https://github.com/MeSugar/forest_competition.git
cd forest_competition
  2. Download the Forest train dataset and save the csv locally (the default path is data/train.csv in the repository root).
  3. Make sure Python 3.8 and Poetry are installed on your machine (I used Poetry 1.1.13).
  4. Install the project dependencies (run this and the following commands in a terminal, from the root of the cloned repository):
poetry install --no-dev
  5. Run training with the following command:
poetry run train -d <path to csv with data> -s <path to save trained model>

You can configure additional options (e.g., which algorithm to use for the task) from the CLI. To get the full list, use help:

poetry run train --help
  6. To see information about the experiments conducted (algorithm, metrics, hyperparameters), run the MLflow UI:
poetry run mlflow ui

  7. You can produce an EDA report in .html format using Pandas-Profiling:
poetry run eda -d <path to csv with data> -s <path to save report>

To see the list of configurable options, run the command with the --help option.

  8. To make a submission file with predictions, run:
poetry run predict

To see the list of configurable options in the CLI, run:

poetry run predict --help
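For orientation, here is a rough sketch of what writing a submission file can look like. It assumes the file is a CSV with an `Id` column and a `Cover_Type` column; the helper name `write_submission` and the sample rows are hypothetical, and the package's own `predict` command may work differently.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_submission(rows: Iterable[Tuple[int, int]], out: TextIO) -> None:
    """Write (sample id, predicted cover type) pairs as CSV with a header row."""
    writer = csv.writer(out, lineterminator="\n")
    # Assumed column names; check the competition's sample submission file.
    writer.writerow(["Id", "Cover_Type"])
    for sample_id, cover_type in rows:
        writer.writerow([sample_id, cover_type])


# Hypothetical ids and predictions, written to an in-memory buffer for illustration.
buffer = io.StringIO()
write_submission([(15121, 2), (15122, 1)], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` as `out`.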

Development

The code in this repository must be tested, formatted with black, and pass mypy type checking before being committed to the repository.

Install all requirements (including dev requirements) into the Poetry environment:

poetry install

Now you can use the developer tools, e.g. pytest:

poetry run pytest

Lint source code with flake8:

poetry run flake8 src tests noxfile.py

Format your code with black:

poetry run black src tests noxfile.py

Perform type checking with mypy:

poetry run mypy src tests noxfile.py

More conveniently, you can run all testing, formatting, and type-checking sessions with a single command. Install nox and run it (the optional -r flag reuses existing virtual environments instead of recreating them):

pip install --user --upgrade nox
nox [-r]

In case you want to run a specific step:

nox -[r]s flake8
nox -[r]s black
nox -[r]s mypy
nox -[r]s tests

Model deployment

The trained model can be deployed using FastAPI.

  1. You must have Docker installed.
  2. Build an image from the Dockerfile and run a container from it:
docker build -t app .
docker run -d -p 8000:8000 --name model-deploy app
  3. Go to http://localhost:8000/docs and click the "Try it out" button in the /predict block.

  4. Since the model requires quite a lot of values, copy the parameters from the example.json file in the repository root into the Request body. Then click the "Execute" button.

  5. Finally, the value predicted by the model will appear in the Response body block.
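The same request can also be sent from a script instead of the browser. The sketch below uses only the standard library and assumes that /predict accepts a POST request with a JSON body shaped like example.json; the function names and the default URL are illustrative, not part of this package.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed endpoint, matching the port mapping used in the docker run command.
PREDICT_URL = "http://localhost:8000/predict"


def build_predict_request(
    features: Dict[str, Any], url: str = PREDICT_URL
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the prediction endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: Dict[str, Any], url: str = PREDICT_URL) -> Any:
    """Send the request to a running container and return the decoded JSON reply."""
    with urllib.request.urlopen(build_predict_request(features, url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the container running and example.json in the current directory:
#     with open("example.json", encoding="utf-8") as f:
#         print(predict(json.load(f)))
```

Building the request separately from sending it makes it possible to inspect the payload without a running server.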
