- `configs/` - Contains configuration files for training and inference.
- `mst/` - Contains the main codebase for the project.
  - `dataloaders/` - Contains dataloaders for the project.
  - `modules/` - Contains the modules for different components of the system.
  - `mixing/` - Contains the mixing modules for creating mixes.
  - `loss/` - Contains the loss functions for the project.
  - `panns/` - Contains the most basic components, such as CNN14, ResNet, etc.
  - `utils/` - Contains utility functions for the project.
- `scripts/` - Contains scripts for running inference.
- Clone the repository:

```
git clone https://github.com/sai-soum/Diff-MST.git
cd Diff-MST
```
- Create a new Python environment:

```
# for Linux/macOS
python3 -m venv env
source env/bin/activate
```
- Install the `mst` package from source:

```
# Install as editable (for development)
pip install -e .

# Alternatively, do a regular install (read-only)
pip install .
```
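To sanity-check the install, you can try importing the package. This assumes the `mst` directory above is what gets installed as the importable module:

```
python -c "import mst; print(mst.__file__)"
```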
We use LightningCLI for training and Wandb for logging. In the `configs/` directory, you will find the configuration files for the project:

- `config.yaml` - Contains the general configuration for the project.
- `optimizer.yaml` - Contains the optimizer configuration for the project.
- `data/` - Contains the data configurations for the project.
- `models/` - Contains the model configurations for the project.

We have provided instructions within the configuration files for setting up the project.
A few important configuration parameters:
- In `configs/data/`, change the following:
  - `track_root_dirs` - The root directory for the dataset needs to be set. You can pass multiple dataset directories as a list; however, you will also need to provide corresponding metadata YAML files containing the train, test, and val splits. Check the `data/` directory for examples. (A hedged sketch of these keys follows this list.)
  - For method 1: set `generate_mix` to `True` in the model configuration file. Use `medley+cambridge-8.yaml` for training with random mixes of the same song as reference.
  - For method 2: set `generate_mix` to `False` in the model configuration file. Use `medley+cambridge+jamendo-8.yaml` for training with real unpaired songs as reference.
  - Update `mix_root_dirs` - The root directory for the mix dataset. This is used for training with real unpaired songs as reference.
- You may benefit from setting smaller values for `train_buffer_size_gb` and `val_buffer_size_gb` in the data configuration file for initial testing of the code.
- In `configs/models/`:
  - You can disable an audio effect by setting a very large value for its corresponding parameter. For example, to disable the compressor, set `active_compressor_epoch` to `1000`.
  - You can change the loss function used for training by setting the `loss` parameter.
- In `optimizer.yaml`, you can change the learning rate parameters.
- In `config.yaml`:
  - Update the directory for logging using `save_dir` under `trainer`.
  - You can use `ckpt_path` to load a pre-trained model for fine-tuning, resuming training, or testing.
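As a rough illustration of the data-related keys above, here is a hypothetical fragment of a `configs/data/` file. The key names come from this README, but the nesting and values are assumptions; follow the provided files in `configs/data/` for the real structure.

```
# Hypothetical sketch only; mirror the structure of the files in configs/data/.
data:
  init_args:
    track_root_dirs:            # one or more dataset roots, passed as a list
      - /path/to/medleydb
      - /path/to/cambridge-mt
    mix_root_dirs:              # only needed for method 2 (real unpaired songs as reference)
      - /path/to/jamendo
    train_buffer_size_gb: 0.5   # smaller values make initial testing faster
    val_buffer_size_gb: 0.1
```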
To run training:

```
CUDA_VISIBLE_DEVICES=0 python main.py fit \
-c configs/config.yaml \
-c configs/optimizer.yaml \
-c configs/data/medley+cambridge-8.yaml \
-c configs/models/naive.yaml
```
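To resume training or fine-tune from a saved checkpoint (see `ckpt_path` above), LightningCLI also accepts the checkpoint path on the command line; the path below is a placeholder:

```
CUDA_VISIBLE_DEVICES=0 python main.py fit \
-c configs/config.yaml \
-c configs/optimizer.yaml \
-c configs/data/medley+cambridge-8.yaml \
-c configs/models/naive.yaml \
--ckpt_path /path/to/checkpoint.ckpt
```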
To run fine-tuning using AFLoss:

```
CUDA_VISIBLE_DEVICES=0 python main.py fit \
-c configs/config.yaml \
-c configs/optimizer.yaml \
-c configs/data/medley+cambridge-8.yaml \
-c configs/models/naive+feat.yaml
```
You can change the number of tracks, the amount of training data used per epoch, and the batch size in the data configuration files located in `configs/data/` (see the sketch below).
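For example, such a change might look like the following; the key names `num_tracks`, `num_examples_per_epoch`, and `batch_size` are hypothetical stand-ins for whatever the actual files use:

```
# Hypothetical key names; check configs/data/*.yaml for the real ones.
data:
  init_args:
    num_tracks: 8               # e.g. the "-8" configs suggest 8 tracks
    num_examples_per_epoch: 1000
    batch_size: 4
```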
To run training with real unpaired songs as reference (method 2):

```
CUDA_VISIBLE_DEVICES=0 python main.py fit \
-c configs/config.yaml \
-c configs/optimizer.yaml \
-c configs/data/medley+cambridge+jamendo-8.yaml \
-c configs/models/unpaired+feat.yaml
```
To evaluate the model on real-world data, run the `scripts/eval_all_combo.py` script. Update the model checkpoints and the inference examples directory in the script.
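Since the checkpoint and example paths are set inside the script, the invocation itself is plain (assuming the script takes no command-line arguments):

```
CUDA_VISIBLE_DEVICES=0 python scripts/eval_all_combo.py
```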
Python 3.10 was used for training.
This work is funded and supported by UK Research and Innovation [grant number EP/S022694/1] and Steinberg Media Technologies GmbH under the AI and Music Centre for Doctoral Training (AIM-CDT) at the Centre for Digital Music, Queen Mary University of London, London, UK.
If you find this work useful, please consider citing our paper:
```
@inproceedings{vanka2024diffmst,
    title = {Diff-MST: Differentiable Mixing Style Transfer},
    author = {Vanka, Soumya and Steinmetz, Christian and Rolland, Jean-Baptiste and Reiss, Joshua and Fazekas, Gy{\"o}rgy},
    booktitle = {Proc. of the 25th Int. Society for Music Information Retrieval Conf. (ISMIR)},
    year = {2024},
    organization = {Int. Society for Music Information Retrieval (ISMIR)},
    abbr = {ISMIR},
    address = {San Francisco, USA},
}
```
The code is licensed under the terms of the CC-BY-NC-SA 4.0 license. For a human-readable summary of the license, see https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en.