Code accompanying this project.
In order to use the system, you will need to install the following dependencies:
- PyTorch
- NumPy
To install, clone the repository and set up a virtual environment by running the following commands in your terminal:
git clone https://github.com/JGuymont/vae-anomaly-detector.git
cd vae-anomaly-detector
python -m venv venv
venv\Scripts\activate
python -m pip install --upgrade pip
Install PyTorch (see the PyTorch website for GPU-supported installation):
pip install torch==1.2.0+cpu torchvision==0.4.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
Install other requirements:
pip install -r requirements.txt
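As a quick sanity check after installation, the sketch below reports whether the two dependencies listed above can be found in the active environment. The helper name `check_dependencies` is hypothetical and is not part of this repository; it only uses the standard library, so it runs even when a package is missing.

```python
import importlib.util


def check_dependencies(names=("torch", "numpy")):
    """Return a dict mapping each top-level module name to True if it is importable.

    Hypothetical helper for illustration; not part of the repository.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If either entry is reported as missing, rerun the corresponding `pip install` step inside the activated virtual environment.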
You can download the dataset from Kaggle. The SMS Spam Collection is a set of tagged SMS messages collected for SMS spam research. It contains 5,574 English SMS messages, each tagged as ham (legitimate) or spam. The file spam.csv contains one message per line. Each line has two columns: v1 contains the label (ham or spam) and v2 contains the raw text.
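To make the file layout concrete, here is a minimal sketch of reading the v1 and v2 columns with Python's standard `csv` module. The function name `read_sms_rows` and the in-memory sample rows are made up for illustration; this is not the loader used by the repository.

```python
import csv
import io


def read_sms_rows(handle):
    """Yield (label, text) pairs from an open spam.csv-style file.

    Hypothetical helper: relies only on the v1 (label) and v2 (raw text)
    column names described above.
    """
    reader = csv.DictReader(handle)
    for row in reader:
        yield row["v1"], row["v2"]


# Tiny in-memory stand-in for data/spam.csv (invented rows, not real data).
sample = io.StringIO('v1,v2\nham,"See you at 5, ok?"\nspam,WIN a prize now\n')
rows = list(read_sms_rows(sample))
for label, text in rows:
    print(label, "|", text)
```

To read the real file, pass an open handle instead, for example `open("data/spam.csv", newline="", encoding="latin-1")`. The `latin-1` encoding is an assumption about the Kaggle copy; adjust it if your download decodes differently.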
To reproduce the experiment without changing the configuration, save the file spam.csv in the data/ directory.
Start by splitting the data into a training set and a test set.
python split_data.py --train_size 0.5
Running this command in the terminal creates train.csv and test.csv and saves them in the data/ directory. Note that you only need to run this command once.
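For intuition about what a `--train_size 0.5` split means, the sketch below divides a list of rows into two parts by fraction after a seeded shuffle. It is an illustration only: the function `split_rows` is hypothetical, and whether split_data.py shuffles, stratifies by label, or uses a fixed seed is not stated here.

```python
import random


def split_rows(rows, train_size=0.5, seed=0):
    """Shuffle rows with a fixed seed and split them by the given fraction.

    Hypothetical helper; not the repository's split_data.py.
    """
    rows = list(rows)
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    rng.shuffle(rows)
    cut = int(len(rows) * train_size)
    return rows[:cut], rows[cut:]


train, test = split_rows(range(10), train_size=0.5)
print(len(train), len(test))
```

Using a fixed seed makes the split reproducible, which is consistent with only needing to run the split once.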
Train the model by running the following command:
python main.py --model boc