Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation
Preparing environment:
apt-get update
apt-get install build-essential -y
(for torch 1.12.0)
Option 1:
pip install pyg-lib torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cu113.html
export LD_LIBRARY_PATH="/opt/conda/lib/:$LD_LIBRARY_PATH"
Option 2:
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
export TORCH=1.12.0
export CUDA=cu113
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-geometric
pip install -r requirements.txt
If the build fails with "command 'x86_64-linux-gnu-gcc' failed with exit status 1", install the Python development headers:
apt-get install python3.x-dev
(replace x with your Python minor version, e.g. python3.8-dev)
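To sanity-check the environment (an optional step, not part of the original instructions; the expected version strings assume the torch 1.12.0 + cu113 setup above), the following should run without errors:

import torch
import torch_geometric

print(torch.__version__)           # expected: 1.12.0+cu113
print(torch.cuda.is_available())   # True on a CUDA-enabled machine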
Running the experiments:
python run_bias_crs.py --config config/crs/tgredial/tgredial.yaml
The experiments were conducted with the ReDial, KGSF, KBRD and TGReDial models and evaluated on the ReDial and TGReDial datasets.
The generation and preparation of the synthetic dialogues are implemented by first running [data_prep_gen_*.ipynb] and then [gen_convert_*.ipynb] within the data_aug folder (* refers to the dataset name).
The data augmentation itself is implemented in [bias_crs/data/dataloader/base.py], where the number of items to be augmented via popNudge can also be adjusted; a rough sketch of the idea follows.
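As an illustration only (a minimal sketch, not the repository's actual implementation; the function name, signature, and the popularity-proximity criterion are all assumptions), popNudge-style augmentation can be thought of as extending each recommended item list with items of similar popularity:

import random

def pop_nudge(item_ids, pop_score_dict, n_aug=5, tol=0.05):
    # Hypothetical sketch: for each recommended item, sample up to n_aug
    # alternative items whose popularity lies within +/- tol of its score.
    all_items = list(pop_score_dict)
    augmented = []
    for item in item_ids:
        target = pop_score_dict.get(item, 0.0)
        candidates = [i for i in all_items
                      if i != item and abs(pop_score_dict[i] - target) <= tol]
        augmented.append(random.sample(candidates, min(n_aug, len(candidates))))
    return augmented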
The results of every experimental run are saved under the [data/bias/] directory, in folders named after the model and dataset, in a file entitled [bias_anlytic_data.csv].
The corresponding analysis of the recommendation results via Cross-Episode Popularity (CEP) and User Intent-Oriented Popularity (UIOP) scores can be found in the [analysis] folder.
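For reference, the core of that computation is reproduced below: CEP takes the absolute Pearson correlation between the popularity scores of consecutive episodes within a conversation (with a default of 0.5 for the first episode), and UIOP takes the absolute difference between the popularity bias of the recommendations and the popularity score of the target item.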
import numpy as np
from scipy.stats import pearsonr

def compute_pop_scores(pop_score_dict, items):
    # popularity score of each recommended item (0.0 for unseen items)
    return [pop_score_dict.get(item, 0.0) for item in items]

data['pop_scores'] = [compute_pop_scores(pop_score_dict, row['Prediction_items'])
                      for _, row in data.iterrows()]

# Cross-Episode Popularity (CEP): absolute Pearson correlation between the
# popularity scores of consecutive episodes of the same conversation
new_conv = True
cep_scores = []
for idx, row in data.iterrows():
    if new_conv:
        # set the default value for the first episode of a conversation
        cep_scores.append(0.5)
    else:
        pearsonr_score = np.abs(pearsonr(row['pop_scores'], data.at[idx - 1, 'pop_scores'])[0])
        cep_scores.append(pearsonr_score)
    # flag whether the next row starts a new conversation
    new_conv = idx + 1 < len(data) and row['conv_id'] != data.at[idx + 1, 'conv_id']
data['cep_score'] = cep_scores
data['cep_pop_score'] = data['cep_score'] * data['pop_bias']

# User Intent-Oriented Popularity (UIOP): distance between the popularity
# bias of the recommendations and the popularity of the target item
data['target_pop_score'] = data['target_item_index'].map(pop_score_dict)
data['UIOP'] = np.abs(data['pop_bias'] - data['target_pop_score'])
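A minimal, hypothetical driver for the snippet above (the run directory and the construction of pop_score_dict are assumptions; the notebooks in [analysis] may compute the popularity scores differently, e.g. from the training interactions):

import ast
import pandas as pd

# hypothetical run directory; actual paths follow data/bias/<model>/<dataset>/
data = pd.read_csv('data/bias/kbrd/redial/bias_anlytic_data.csv')

# list-valued columns are serialised as strings in CSV; parse them back
data['Prediction_items'] = data['Prediction_items'].apply(ast.literal_eval)

# illustrative popularity dictionary: normalised target-item frequency
counts = data['target_item_index'].value_counts()
pop_score_dict = (counts / counts.max()).to_dict()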
Citation:
@inproceedings{wang2023improving,
  title={Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation},
  author={Xi Wang and Hossein A. Rahmani and Jiqun Liu and Emine Yilmaz},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
  year={2023}
}
This repository is built on the CRSLab framework [https://github.com/RUCAIBox/CRSLab]. Thanks for their invaluable contribution, which enabled the systematic development and evaluation of the models in this project.