Welcome to the viet-goodreads project! This project aims to train a model that predicts a book's rating using the provided dataset. The project includes exploratory analysis of the data, feature engineering and selection, model training and evaluation, and deployment.
The project team:
- Anh Bui
- Kim Duy Nguyen
- Rhosane Santos
- Thu Thao Pham
To clone the project onto your local computer, follow the steps below:
- Open your terminal or command prompt.
- Change the current working directory to the location where you want to clone the project.
- Run the following command:
git clone https://github.com/rhowsane/viet-goodreads.git
- The project will be cloned to your local machine.
To create and work on your own branch, follow the steps below:
- Open your terminal or command prompt.
- Change the current working directory to the location where your project is.
- Run the following command:
git checkout -b dev-yourname
- You are now working on your own branch. To switch back to the main branch, run the following command:
git checkout main
- To switch back to your own branch, run the following command:
git checkout dev-yourname
- To push your work to your own branch, run the following commands:
git add .
git commit -m "your commit message"
git push origin dev-yourname
- To merge your work into the main branch, run the following commands (MAKE SURE TO ASK FOR PERMISSION FIRST):
git checkout main
git merge dev-yourname
- To delete your own branch locally, run the following command:
git branch -d dev-yourname
The project repository has the following structure:
viet-goodreads/
├── Dataset/
│   ├── books_raw.csv (the original dataset, never modified)
│   └── books.csv (the working copy, cleaned from the original project's file)
│
├── Notebook/
│   ├── clean_csv.ipynb (run only once; cleans part of the CSV so that the file is readable)
│   ├── main.ipynb (the main notebook: data exploration, analysis, machine learning, etc.)
│   ├── Dockerfile
│   ├── inference.py (copied into the Docker image when the image is built)
│   └── test_df.csv (required to run the container pulled from Docker Hub)
│
├── README.md
├── docker_steps.md
├── requirements.txt
└── copy_csv.py (copies the raw dataset CSV to the working CSV, so the original CSV file is always preserved)
- The Dataset/ directory contains the file books_raw.csv, which holds the original dataset. Running copy_csv.py creates books.csv as a copy of books_raw.csv.
- The README.md file provides an overview of the project and instructions for cloning the repository.
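The copy step that copy_csv.py performs can be sketched in a few lines of Python. This is an illustrative assumption, not the repository's actual script: the function name copy_raw_csv is hypothetical, and the paths assume the script is run from the repository root.

```python
# Hypothetical sketch of the copy step; the real copy_csv.py may differ.
import shutil
from pathlib import Path

RAW_CSV = Path("Dataset") / "books_raw.csv"   # original dataset, never modified
WORK_CSV = Path("Dataset") / "books.csv"      # working copy used by the notebooks


def copy_raw_csv(raw: Path = RAW_CSV, work: Path = WORK_CSV) -> Path:
    """Copy the raw dataset to the working CSV, leaving the original untouched."""
    if not raw.exists():
        raise FileNotFoundError(f"Raw dataset not found: {raw}")
    work.parent.mkdir(parents=True, exist_ok=True)
    # copyfile copies contents only and overwrites `work` if it already exists.
    shutil.copyfile(raw, work)
    return work
```

Calling copy_raw_csv() from the repository root recreates Dataset/books.csv. Because only the copy is ever cleaned or edited, a fresh start never requires re-downloading the original data.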
Note: Make sure to install the dependencies listed in the requirements.txt file before running the scripts or notebooks.
To install the dependencies, run the following command:
pip install -r requirements.txt
IMPORTANT:
To rerun the project from start to finish, follow these steps in order:
- Delete the Dataset/books.csv file.
- Run copy_csv.py; this creates a new Dataset/books.csv file (a copy of the original Dataset/books_raw.csv file).
- Run Notebook/clean_csv.ipynb; this cleans the Dataset/books.csv file (it only needs to run once; do not run it again).
- Run Notebook/main.ipynb; this runs the whole project from start to finish.
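The four rerun steps could be automated with a short Python script such as the sketch below. It is not part of the repository: the function names are hypothetical, and it assumes you run it from the repository root with Jupyter's nbconvert installed (nbconvert may not be listed in requirements.txt).

```python
# Hypothetical automation of the rerun steps; not part of the repository.
import subprocess
import sys
from pathlib import Path
from typing import Callable, List, Sequence


def run_checked(cmd: Sequence[str]) -> None:
    """Run one command and raise CalledProcessError if it fails."""
    subprocess.run(list(cmd), check=True)


def rerun_pipeline(
    work_csv: Path = Path("Dataset") / "books.csv",
    run: Callable[[Sequence[str]], None] = run_checked,
) -> List[List[str]]:
    """Repeat the rerun steps in order and return the commands that were run."""
    # Step 1: delete the working copy so it is rebuilt from books_raw.csv.
    work_csv.unlink(missing_ok=True)  # missing_ok needs Python 3.8+

    # nbconvert --execute --inplace runs a notebook top to bottom and saves
    # the executed outputs back into the same .ipynb file.
    execute = ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace"]
    commands = [
        [sys.executable, "copy_csv.py"],          # Step 2: fresh copy of the raw CSV
        execute + ["Notebook/clean_csv.ipynb"],   # Step 3: clean the fresh copy once
        execute + ["Notebook/main.ipynb"],        # Step 4: full analysis and modelling
    ]
    for cmd in commands:
        run(cmd)
    return commands
```

With the default runner, check=True stops the pipeline at the first failing step, so main.ipynb never runs on a half-cleaned CSV. Passing run=print only prints the commands, although step 1 still deletes Dataset/books.csv.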