This is a PyTorch implementation of the paper "Sequential Modeling by Leveraging Non-Uniform Distribution of Speech Emotion". The experiments and trained models in the paper are based on the MSP-Podcast v1.6 corpus and LLD features.
- Python 3.6+
- Ubuntu 18.04
- CUDA 10.0+
- PyTorch 1.4.0+
- Standard packages such as scipy, numpy, and pandas
- The MSP-Podcast v1.6 corpus, or any other version (request a download from the UTD-MSP lab website)
- The IS13ComParE LLDs (acoustic features) extracted with openSMILE (see the opensmile-LLDs-extraction repository)
After extracting the 130-dim LLDs for the corpus with openSMILE, use norm_para.py to save the normalization parameters used for z-normalization of the input features and labels. The parameters for the v1.6 corpus are provided in the 'NormTerm' folder.
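As an illustration of what z-normalization parameters look like, here is a minimal sketch that computes a per-dimension mean and standard deviation over all frames and applies them. It is not the code in norm_para.py: the function names `compute_norm_params` and `z_normalize` and the random stand-in features are assumptions made for this example.

```python
import numpy as np

def compute_norm_params(feature_list):
    """Per-dimension mean/std over all frames of all utterances.

    feature_list: list of arrays shaped (num_frames, feat_dim).
    Hypothetical helper, not the repository's norm_para.py.
    """
    frames = np.concatenate(feature_list, axis=0)  # (total_frames, feat_dim)
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    std[std == 0] = 1.0  # guard against constant dimensions
    return mean, std

def z_normalize(features, mean, std):
    """Apply z-normalization with previously saved parameters."""
    return (features - mean) / std

# Random stand-in for 130-dim LLD matrices of three utterances.
rng = np.random.default_rng(0)
utterances = [rng.normal(5.0, 2.0, size=(n, 130)) for n in (120, 80, 200)]

mean, std = compute_norm_params(utterances)
normalized = [z_normalize(u, mean, std) for u in utterances]
```

In practice the parameters would be computed on the training set only and reused unchanged for the development and test sets.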
The code for building the chunk-level emotion rankers is in the 'chunk_emotion_rankers' folder. generate_preference_labels.py generates the preference labels (based on the QA approach) used for training. The trained rankers are provided in the 'trained_ranker_model_v1.6' folder, and the retrieved emotion ranking sequences for the v1.6 corpus are in the 'EmoChunkRankSeq' folder. To train the rankers and retrieve the sequences from scratch, follow these steps:
- use generate_preference_labels.py to obtain preference labels
- run training_ranker.py in the terminal to train the model
python training_ranker.py -ep 20 -batch 128 -emo Dom
- run testing_ranker.py in the terminal to test the model (optional)
python testing_ranker.py -ep 20 -batch 128 -emo Dom
- run generate_ranking_seqence.py in the terminal to retrieve ranking sequences
python generate_ranking_seqence.py -ep 20 -batch 128 -emo Dom -set Train
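To give a rough idea of what pairwise preference labels look like, the sketch below builds them from a list of scores with a simple margin rule. This is not the QA approach used by generate_preference_labels.py; the function name `make_preference_pairs`, the `margin` parameter, and the example scores are all hypothetical.

```python
from itertools import combinations

def make_preference_pairs(scores, margin=0.5):
    """Return (i, j, label) for every pair whose score gap exceeds `margin`.

    label is 1 when item i is preferred (higher score) over item j, else 0.
    Simplified margin rule for illustration only, not the QA approach.
    """
    pairs = []
    for i, j in combinations(range(len(scores)), 2):
        gap = scores[i] - scores[j]
        if abs(gap) > margin:  # skip pairs that are too close to call
            pairs.append((i, j, 1 if gap > 0 else 0))
    return pairs

# Hypothetical emotion scores for four samples.
scores = [4.2, 3.9, 5.5, 2.0]
pairs = make_preference_pairs(scores, margin=0.5)
```

Pairs with a small gap (here samples 0 and 1) are dropped, so the ranker only trains on comparisons with a clear preference.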
After retrieving the chunk-level ranking sequences, it is straightforward to treat them directly as target sequences for training the Seq2Seq SER model. We use the 'generate_chunk_EmoSeq' function in utils.py to track emotional trends, then smooth and re-scale them to generate the target emotion curves for training. Run the following steps to build the model:
- run training.py in the terminal to train the model
python training.py -ep 30 -batch 128 -emo Val
- run testing.py in the terminal to test the model (optional)
python testing.py -ep 30 -batch 128 -emo Val
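The smoothing and re-scaling step mentioned above can be pictured with the following sketch. It does not reproduce 'generate_chunk_EmoSeq': the moving-average window, the min-max re-scaling, the centering on a sentence-level label, and the names `smooth_and_rescale`, `target_mean`, and `target_range` are assumptions chosen for illustration.

```python
import numpy as np

def smooth_and_rescale(rank_scores, target_mean, target_range=1.0, window=3):
    """Smooth a chunk-level score sequence and re-scale it around a label.

    Hypothetical sketch: `window` must be odd so the output keeps the
    input length; `target_mean` stands in for a sentence-level label.
    """
    scores = np.asarray(rank_scores, dtype=float)

    # Moving average with edge padding to preserve the sequence length.
    pad = window // 2
    padded = np.pad(scores, pad, mode="edge")
    kernel = np.ones(window) / window
    smoothed = np.convolve(padded, kernel, mode="valid")

    # Flat sequence: nothing to re-scale, return a constant curve.
    span = smoothed.max() - smoothed.min()
    if span == 0:
        return np.full_like(smoothed, target_mean)

    # Min-max to [0, 1], then center on target_mean with the given range.
    unit = (smoothed - smoothed.min()) / span
    return (unit - unit.mean()) * target_range + target_mean

curve = smooth_and_rescale([0.1, 0.4, 0.9, 0.7, 0.2], target_mean=4.0)
```

The resulting curve keeps the shape of the ranking sequence while its mean matches the chosen label value.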
We also provide the trained models in the 'trained_seq2seq_model_v1.6' folder. The CCC performance of the models on the test set is shown in the following table. Note that the results differ slightly from those in the paper, since we performed statistical tests in the paper (i.e., we averaged the results of multiple trials).
Model | Aro. | Dom. | Val. |
---|---|---|---|
Seq2Seq-RankerInfo | 0.7103 | 0.6302 | 0.3222 |
Users can reproduce these results by running testing.py with the corresponding arguments.
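For reference, the concordance correlation coefficient (CCC) reported in the table can be computed as in the sketch below, which follows the standard definition 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The function name `ccc` is an assumption and this is not necessarily how testing.py implements the metric.

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences.

    Uses population (biased) variance and covariance, per the standard
    definition. Hypothetical helper, not taken from the repository.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

perfect = ccc([1, 2, 3, 4], [1, 2, 3, 4])   # identical sequences
shifted = ccc([1, 2, 3, 4], [2, 3, 4, 5])   # same trend, constant offset
```

Unlike Pearson correlation, CCC penalizes the constant offset in the second example, so `shifted` is below 1 even though the two sequences are perfectly correlated.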
If you use this code, please cite the following paper:
Wei-Cheng Lin and Carlos Busso, "Sequential Modeling by Leveraging Non-Uniform Distribution of Speech Emotion"
@article{Lin_2023_2,
author = {W.-C. Lin and C. Busso},
title = {Sequential Modeling by Leveraging Non-Uniform Distribution of Speech Emotion},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
volume = {31},
number = {},
year = {2023},
pages = {1087-1099},
month = {February},
doi={10.1109/TASLP.2023.3244527},
}