Codebase to accompany Biosignal Authentication Considered Harmful Today:
Veena Krish, Nicola Paoletti, Milad Kazemi, Scott Smolka, Amir Rahmati (2024). Biosignal Authentication Considered Harmful Today. In USENIX Security Symposium (USENIX Sec).
This codebase is divided into three parts:
1. custom_datasets: loading and processing raw data, used across the codebase
2. model_training: training the cyclegan network used to generate spoofed biosignals
3. authentication_systems: an implementation of the systems tested in the paper
Data handlers for loading, processing, and splitting data are packaged as a Python module that can be imported in other sections of the overall codebase. It can be built and installed as follows.
First build the custom_datasets package:
cd custom_datasets
python -m build
Then install the package in the main venv as:
pip install --editable .
Notes:
- pip must be upgraded beyond version 21.3 in order to build and install the package in editable mode
- See the PyPA User Guide if you encounter any issues
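As a quick sanity check (assuming the build and editable install above succeeded), the package should now be importable from the main venv:
# Minimal check that the editable install is importable
from custom_datasets import datasets

print("custom_datasets.datasets loaded from:", datasets.__file__)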
The file `datasets.py` contains generators for yielding a data sample for a given configuration, defined in `properties.py`. Arguments to `datasets.get_data`:
- dataset_name: as defined in properties.py. E.g. 'dalia', 'capno'
- data_type: biosignal modality. E.g. 'ECG', 'PPG'
- spoof: bool for requesting generated/spoofed data rather than original data. False is overridden if spoof_name is supplied
- spoof_name: name of the spoof generation method (e.g. 'cardiogan_contrastive', 'video_ecg_contrastive')
- split: one of ['train', 'test', 'all']; denotes the split over subjects (note: not over time), as defined in properties.py. Generally used for training/testing the cyclegan's generalization ability
- fraction_time: fraction of total signal time to yield, typically used for training vs. testing the authentication systems.
from custom_datasets import datasets
generator = datasets.get_data(dataset_name='dalia', data_type='PPG', spoof=False, spoof_name=None, split='train', fraction_time=(0, 0.5), session=None)
subject_ix, subject_name, session_name, ppg_npy = next(generator)
# For paired data (for evaluating model training):
generator = datasets.get_data('dalia', ['PPG', 'ECG'], spoof=False, split='train')
_, _, _, paired_data = next(generator)
ppg_npy = paired_data['PPG']
ecg_npy = paired_data['ECG']
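The generator can also be iterated to exhaustion. For example, the following sketch (using only the signature documented above; the collection logic is illustrative) gathers every training-split PPG sample per subject:
# Illustrative only: collect PPG arrays per subject for the training split
from custom_datasets import datasets

ppg_by_subject = {}
generator = datasets.get_data(dataset_name='dalia', data_type='PPG', spoof=False,
                              spoof_name=None, split='train', fraction_time=(0, 0.5),
                              session=None)
for subject_ix, subject_name, session_name, ppg_npy in generator:
    ppg_by_subject.setdefault(subject_name, []).append(ppg_npy)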
Contains PyTorch-based model training scripts. View example usage with `python train_contrastive_cardiogan.py --help`. DataLoaders pull paired raw data from `custom_datasets.get_paired_data` and shuffle it to train unpaired translation. Each DataLoader specifies the source and target data type for the given dataset.
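As an illustration only (not the repository's actual DataLoader; the class below is made up), a minimal PyTorch Dataset in the same spirit would pull paired windows through the documented generator and shuffle one modality to break the pairing:
# Sketch only -- not the repository's DataLoader implementation
import random
import torch
from torch.utils.data import Dataset
from custom_datasets import datasets

class UnpairedPPGtoECG(Dataset):
    """Yields (PPG, ECG) pairs with the ECG side shuffled for unpaired translation."""
    def __init__(self, dataset_name='dalia', split='train'):
        gen = datasets.get_data(dataset_name, ['PPG', 'ECG'], spoof=False, split=split)
        self.ppg, self.ecg = [], []
        for _, _, _, paired in gen:
            self.ppg.append(paired['PPG'])
            self.ecg.append(paired['ECG'])
        random.shuffle(self.ecg)  # break the PPG/ECG pairing

    def __len__(self):
        return len(self.ppg)

    def __getitem__(self, ix):
        return (torch.as_tensor(self.ppg[ix], dtype=torch.float32),
                torch.as_tensor(self.ecg[ix], dtype=torch.float32))
A standard torch.utils.data.DataLoader with shuffle=True can then batch these pairs for the generator/discriminator updates.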
Example usage:
# Train the main ppg -> ecg spoofing model
python train_contrastive_cardiogan.py --dataset ecgppg_cardiogan
# Train an example video -> ecg spoofing model
python train_contrastive_cardiogan.py --dataset rppgecg_hcitagging
The following authentication systems are implemented (using public codebases if available). Evaluation scripts are included in each directory for testing the false acceptance rate of spoofed data.
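For reference, the false acceptance rate reported by these scripts is the fraction of spoofed (impostor) attempts accepted at the system's operating threshold. A minimal sketch, assuming each attempt yields a similarity score where higher means a closer match:
import numpy as np

def false_acceptance_rate(spoof_scores, threshold):
    """Fraction of spoofed attempts whose score clears the accept threshold."""
    return float(np.mean(np.asarray(spoof_scores) >= threshold))

# e.g. false_acceptance_rate([0.91, 0.40, 0.78], threshold=0.75) == 2/3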
Published as: ECG Biometric Recognition: Review, System Proposal, and Benchmark Evaluation. P Melzi, R Tolosana, R Vera-Rodriguez - IEEE Access, 2023.
Implementation is obtained directly from authors' provided codebase and modified to work with our datasets.
Usage:
# Prepare data for <dataset_name>
python src/prepare_dataset.py --dataset <dataset_name> # note: this creates train/train.json and train/val.json
python src/prepare_dataset.py --dataset <dataset_name> --eval # note: this creates an empty eval/train.json and eval/val.json
python src/prepare_dataset.py --dataset bidmc --eval --spoof_name cardiogan_contrastive
# Make config files for autoencoder and siamese network training and testing
python src/make_configs.py --dataset <dataset_name> --autoencoder
python src/make_configs.py --dataset <dataset_name>
python src/make_configs.py --dataset <dataset_name> --eval
python src/make_configs.py --dataset <dataset_name> --spoof_name cardiogan_contrastive --eval
# Train
python src/train.py --config_file configs/autoencoder/<dataset_name>/config_autoencoder.json --rename latest
python src/train.py --config_file configs/siamese/<dataset_name>/config_train.json --rename latest
# Then predict and save the working EER, EER_threshold for a withheld validation section
python src/predict.py --dataset <dataset_name> --config_file configs/siamese/<dataset_name>/config_train.json --model_name latest --save_stats
# Eval using the predicted EER/thresh on a final withheld section, over 10 attempts
python src/eval.py --dataset <dataset_name> --model_name latest --spoof_name original
python src/eval.py --dataset <dataset_name> --model_name latest --spoof_name cardiogan_contrastive
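The EER and EER_threshold that predict.py stores are the standard quantities: the operating point where the false acceptance rate on impostor pairs equals the false rejection rate on genuine pairs. A generic sketch (not the repository's implementation) using scikit-learn's ROC utilities:
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(genuine_scores, impostor_scores):
    """Return (EER, threshold) at the ROC point where FAR is closest to FRR."""
    labels = np.concatenate([np.ones(len(genuine_scores)), np.zeros(len(impostor_scores))])
    scores = np.concatenate([genuine_scores, impostor_scores])
    fpr, tpr, thresholds = roc_curve(labels, scores)
    fnr = 1 - tpr
    ix = np.nanargmin(np.abs(fnr - fpr))   # closest point to FAR == FRR
    return float((fpr[ix] + fnr[ix]) / 2), float(thresholds[ix])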
Usage:
# Train
python train.py --dataset <dataset_name> --save latest
# Test and get the EER ("--save_stats" will save the EER threshold to the model file)
python test.py --dataset <dataset_name> --model_path saved_models/<dataset_name>/latest.pt --save_stats
# Eval using the saved EER over 10 attempts on a separate section of data, on original and spoofed data
python eval.py --dataset <dataset_name> --model_path saved_models/<dataset_name>/latest.pt --spoof_name original
python eval.py --dataset <dataset_name> --model_path saved_models/<dataset_name>/latest.pt --spoof_name cardiogan_contrastive
Published as: EDITH: ECG biometrics aided by deep learning for reliable individual authentication. N Ibtehaz, et al. - IEEE Transactions on Emerging Topics in Computational Intelligence, 2021.
Usage:
# Train feature extractor and siamese models (n.b. train_siamese also computes the EER on a withheld set and saves it to the model)
python train_baseclassifier.py --dataset <dataset_name> --save base
python train_siamese.py --dataset <dataset_name> --saved_model saved_models/<dataset_name>/base.pt --save siamese
# Eval on original/spoofed datasets
python eval.py --dataset <dataset_name> --saved_base_model saved_models/<dataset_name>/base.pt --saved_siamese_model saved_models/<dataset_name>/siamese.pt --spoof_name cardiogan_contrastive
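Conceptually, verification here embeds an enrollment segment and a probe segment with the trained feature extractor and scores them with the siamese head; the accept/reject decision is taken against the saved EER threshold. A schematic sketch (the model interfaces and threshold below are hypothetical, not the repository's API):
import torch

@torch.no_grad()
def verify(base_model, siamese_model, enroll_ecg, probe_ecg, threshold):
    """Accept the probe if the siamese similarity between embeddings clears the threshold."""
    e1 = base_model(enroll_ecg.unsqueeze(0))   # embed enrollment segment
    e2 = base_model(probe_ecg.unsqueeze(0))    # embed probe segment
    score = siamese_model(e1, e2).item()       # similarity score; higher = same subject
    return score >= threshold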
Usage:
1. Generate original and spoofed data
python generate_data.py --dataset <dataset_name> --spoof_name original --split train
python generate_data.py --dataset <dataset_name> --spoof_name original --split test
python generate_data.py --dataset <dataset_name> --spoof_name cardiogan_contrastive --split test
2. Train, get test EER and save
python train.py --dataset <dataset_name> --save
3. Eval on 10 trials
python eval.py --dataset <dataset_name> --model_dir saved_models/<dataset_name> --spoof_name original
python eval.py --dataset <dataset_name> --model_dir saved_models/<dataset_name> --spoof_name cardiogan_contrastive
Usage:
# Train and save test-split EER:
python train.py --dataset <dataset_name> --save latest
# Eval on orig and spoofed datasets
python eval.py --dataset dalia --models_dir saved_models/<dataset_name>/latest/
python eval.py --dataset dalia --models_dir saved_models/<dataset_name>/latest/ --spoof_name cardiogan_contrastive
Generally based on an unofficial implementation (written by an author of the paper but not linked within it)
Usage:
# Train and get test-split EER:
python train.py --dataset <dataset_name> --lstm --save latest
# Eval on original and spoofed sets :
python eval.py --dataset <dataset_name> --models_dir saved_models/<dataset_name>/latest/
python eval.py --dataset <dataset_name> --spoof_name cardiogan_contrastive --models_dir saved_models/<dataset_name>/latest
Usage:
python train_and_eval.py --wavelet morse --spoof_name original
python train_and_eval.py --wavelet morse --spoof_name cardiogan_contrastive
Usage:
python train.py --save latest
python eval.py --model_path saved_models/latest --spoof_name cardiogan_contrastive
python train.py --dataset <dataset_name> --save latest
python eval.py --dataset <dataset_name> --models_dir saved_models/<dataset_name>/latest
python eval.py --dataset <dataset_name> --models_dir saved_models/<dataset_name>/latest --spoof_name cardiogan_contrastive
python train.py --dataset <dataset_name> --save latest
python eval.py --dataset <dataset_name> --models_dir saved_models/<dataset_name>/latest
python eval.py --dataset <dataset_name> --models_dir saved_models/<dataset_name>/latest --spoof_name cardiogan_contrastive