
MDMP: Multi-modal Diffusion for supervised Motion Predictions

If you find our code or paper helpful, please consider starring our repository and citing:

@misc{bringer2024mdmp,
      title={MDMP: Multi-modal Diffusion for supervised Motion Predictions with uncertainty}, 
      author={Leo Bringer and Joey Wilson and Kira Barton and Maani Ghaffari},
      year={2024},
      eprint={2410.03860},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.03860}, 
}

📮 News

📢 2024-10-04 --- MDMP is on arXiv.

📢 2023-10-01 --- Now you can use MDMP in Blender as an add-on.

📢 2024-09-25 --- Released code and models for MDMP, including training/eval/generation scripts.

๐Ÿ› ๏ธ Getting Started

1. Setup Conda environment

sudo apt update
sudo apt install ffmpeg
conda env create -f environment.yml
conda activate mdmp
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git

We tested our code on Python 3.7.13 and PyTorch 1.7.1.

Alternative: Pip Installation

We provide an alternative pip installation in case you encounter difficulties setting up the conda environment.
pip install -r requirements.txt

We tested this installation on Python 3.10.

2. Download dependencies:

pip install --upgrade --no-cache-dir gdown
bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh
bash prepare/download_recognition_models.sh

3. Get data

HumanML3D - Follow the instructions in HumanML3D, then copy the result dataset to our repository:

cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
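
As a quick sanity check after copying, the sketch below verifies that the expected dataset files are present. The listed names assume the standard layout produced by the HumanML3D repository; adjust them if your copy differs.

```python
import os

# Quick sanity check of the copied dataset.
# Assumes the standard HumanML3D layout (new_joint_vecs/, texts/, Mean.npy, ...).
root = "./dataset/HumanML3D"
expected = ["new_joint_vecs", "texts", "Mean.npy", "Std.npy", "train.txt", "test.txt"]

for name in expected:
    path = os.path.join(root, name)
    status = "ok" if os.path.exists(path) else "MISSING"
    print(f"{status:8s} {path}")
```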

4. Download the pretrained models

bash prepare/download_models.sh

(Optional) Download Manually

mdmp_pretrained - it is a zip file; unzip it and place the contents in ./save/.

๐Ÿ‘๏ธ Visuals

Demo with Skeletons

Generate from test set prompts

python -m sample.generate_w_gt --model_path ./save/mdmp_pretrained/model000500000.pt --num_samples 3 --num_repetitions 3

You may modify the arguments based on your preferences:

  • --device to set the GPU id.
  • --num_samples to generate more samples conditioned on different inputs.
  • --num_repetitions to generate more samples conditioned on the same inputs.
  • --model_path to change the path if you have trained your own model and want to test it.

Running this will produce:

  • results.npy file with text prompts and xyz positions of the generated animation (see the loading sketch after the example below).
  • sample##_rep##.mp4 - a stick figure animation for each generated motion, including the ground-truth motion for comparison.

It should look something like this:

example
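
To post-process the generated motions yourself, the sketch below shows one way to open results.npy. It assumes the file stores a pickled dictionary with 'motion' (xyz joint positions) and 'text' entries, as the output description above suggests; the exact key names and array layout are assumptions, so print the keys first.

```python
import numpy as np

# Hypothetical path: point this at the results.npy produced by your own run.
results_path = "path/to/results.npy"

# If the file was saved as a pickled dict, allow_pickle=True and .item() recover it.
data = np.load(results_path, allow_pickle=True).item()
print(data.keys())            # check the actual field names first

motions = data["motion"]      # assumed key: xyz joint positions of the generated samples
prompts = data["text"]        # assumed key: the conditioning text prompts

print("motion array shape:", np.asarray(motions).shape)
for i, prompt in enumerate(prompts):
    print(f"sample {i}: {prompt}")
```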
Demo with Skeletons & Presence zones
python -m sample.generate_w_zones --model_path ./save/mdmp_pretrained/model000500000.pt --num_samples 3 --num_repetitions 3

You may modify the arguments based on your preferences:

  • --device to set the GPU id.
  • --num_samples to generate more samples conditioned on different inputs.
  • --num_repetitions to generate more samples conditioned on the same inputs.
  • --model_path to change the path if you have trained your own model and want to test it.

Running this will produce:

  • results.npy file with text prompts and xyz positions of the generated animation
  • sample##_rep##.mp4 - a stick figure animation for each generated motion, including zones of presence around 'end-effector' joints to assess uncertainty.

It should look something like this:

example
Demo with SMPL Meshes (with Blender)

Generate simple skeleton videos to be rendered

python -m sample.generate_for_meshes --model_path ./save/mdmp_pretrained/model000500000.pt --num_samples 3 --num_repetitions 3

You may also define:

  • --device to set the GPU id.
  • --num_samples to generate more samples conditioned on different inputs.
  • --num_repetitions to generate more samples conditioned on the same inputs.
  • --model_path to change the path if you have trained your own model and want to test it.

Running this will produce:

  • results.npy file with text prompts and xyz positions of the generated animation
  • sample##_rep##.mp4 - a stick figure animation for each generated motion, with no ground truth, no floor, and no zones of presence.

It should look something like this:

example

Create SMPL parameters

From now on, if you want to render the SMPL mesh, choose the .mp4 file you would like to render, copy its relative path, and use the following script to create the SMPL parameters for that file:

python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file

This script outputs:

  • sample##_rep##_smpl_params.npy - SMPL parameters (thetas, root translations, vertices and faces)
  • sample##_rep##_obj - Mesh per frame in .obj format.
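
To inspect those SMPL parameters programmatically, the sketch below loads one of the generated .npy files. It assumes the file stores a pickled dictionary whose entries match the list above (thetas, root translations, vertices, and faces); the exact key names are assumptions, so list them before indexing.

```python
import numpy as np

# Hypothetical path: replace with one of your generated sample##_rep##_smpl_params.npy files.
params_path = "path/to/sample00_rep00_smpl_params.npy"

# Assumes a pickled dict; allow_pickle=True and .item() recover it.
params = np.load(params_path, allow_pickle=True).item()

# The output description lists thetas, root translations, vertices and faces;
# the exact key names may differ, so print them before indexing.
for key, value in params.items():
    print(key, getattr(value, "shape", type(value)))
```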

Set up Blender

Refer to TEMOS-Rendering motions for Blender setup, then install the following dependencies:

YOUR_BLENDER_PYTHON_PATH/python -m pip install -r prepare/blender_requirements.txt

Render SMPL meshes

Run the following command to render the SMPL mesh using Blender:

YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --npy=YOUR_NPY_FOLDER --mode=video

You may also define:

  • --mode=video: render mp4 video
  • --mode=sequence: render the whole motion in a png image.

Depending on the mode you chose, this script outputs:

  • sample##_rep##_smpl_params.mp4 - a video of the SMPL mesh rendered with Blender, which should look like this:
example
  • sample##_rep##_smpl_params.png - a summary of the sequence as a single image, which should look like this:
example
  • It will also create one folder with the .obj files and one with the individual frames of the video.

🚀 Train your own MDMP

python -m train.train_mdmp --save_dir save/my_own_mdmp --dataset humanml
  • Use --diffusion_steps 50 to train a faster model with fewer diffusion steps.
  • Use --device to define the GPU id.
  • Add --train_platform_type {ClearmlPlatform, TensorboardPlatform} to track results with either ClearML or TensorBoard.
  • Add --use_gcn true to try the GCN version.
  • Change --emb_motion_len to a value lower than 50 if you want your model to be conditioned on shorter motion sequences.
  • Use --num_steps to specify the number of training steps and train for more or fewer iterations.
  • Use --batch_size to lower the batch size if you run into GPU memory limits.

📊 Evaluate

Accuracy Study (MPJPE)
  • Takes about 30 minutes (on a single GPU) for 3 repetitions per input to go over the entire test set (excluding motion sequences shorter than 3s).
  • The output of this script is printed in the terminal and corresponds to the MPJPE at various time steps of the predicted motion (from 0.5s to 5.5s).
  • The pretrained model's results should match the ones reported in the temporal chart of the paper (or sometimes be lower).
python -m eval.eval_mpjpe --model_path ./save/mdmp_pretrained/model000500000.pt --num_repetitions 3

You may also define:

  • --device to set the GPU id.
  • --num_samples to generate more samples conditioned on different inputs.
  • --num_repetitions to generate more samples conditioned on the same inputs.
  • --model_path to change the path if you have trained your own model and want to test it.

The corresponding temporal chart can be found in the paper.
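
For reference, MPJPE (Mean Per-Joint Position Error) is the average Euclidean distance between predicted and ground-truth joint positions at a given prediction horizon. The sketch below is an illustrative computation of that metric over xyz joint arrays, not the repository's evaluation code.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: arrays of shape (frames, joints, 3) holding xyz positions.
    Returns the mean Euclidean distance over all joints and frames.
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy usage with random data (stand-in for real predicted/ground-truth motions).
pred = np.random.randn(120, 22, 3)
gt = np.random.randn(120, 22, 3)
print(f"MPJPE: {mpjpe(pred, gt):.3f}")
```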

Uncertainty Study (Sparsification Error)
  • The output of this script is saved in the output folder and corresponds to the sparsification plot.
  • The pretrained model's results should approximately match the ones reported in the paper (or sometimes be lower).
python -m eval.eval_spars --model_path ./save/mdmp_pretrained/model000500000.pt --num_samples 10 --num_repetitions 5

You may also define:

  • --device to set the GPU id.
  • --num_samples to generate more samples conditioned on different inputs.
  • --num_repetitions to generate more samples conditioned on the same inputs (usually results in a curve that aligns even more closely with the Oracle).
  • --model_path to change the path if you have trained your own model and want to test it.

The corresponding plot can be found in the paper (here we only assess 'Mode Divergence', which is the best-performing index).
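
As background, a sparsification plot is obtained by progressively discarding the samples with the highest predicted uncertainty and re-computing the error on the remainder; an informative uncertainty estimate yields a decreasing curve that approaches the 'Oracle' curve, which discards samples by their true error instead. The sketch below illustrates that procedure on toy per-sample errors; it is not the repository's evaluation code.

```python
import numpy as np

def sparsification_curve(errors, ranking, fractions):
    """Mean error after removing the top `f` fraction of samples, ranked by `ranking` (descending)."""
    order = np.argsort(-ranking)             # most uncertain (or most erroneous) first
    curve = []
    for f in fractions:
        keep = order[int(f * len(errors)):]  # drop the first f fraction of samples
        curve.append(errors[keep].mean())
    return np.array(curve)

# Toy data: per-sample errors and predicted uncertainties (correlated here by construction).
rng = np.random.default_rng(0)
errors = rng.gamma(2.0, 1.0, size=1000)
uncertainty = errors + rng.normal(0.0, 0.5, size=1000)

fractions = np.linspace(0.0, 0.9, 10)
model_curve = sparsification_curve(errors, uncertainty, fractions)  # rank by predicted uncertainty
oracle_curve = sparsification_curve(errors, errors, fractions)      # rank by true error (Oracle)

# The sparsification error is the gap between the two curves.
print("sparsification error:", (model_curve - oracle_curve).mean())
```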

Acknowledgments

Our code partially borrows from guided-diffusion, MDM, MoMask, TEMOS, ACTOR, HumanML3D, text-to-motion, and joints2smpl. Thanks to their authors.

License

This code is distributed under the MIT License.

Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, and PyTorch3D, and uses datasets that each have their own licenses, which must also be followed.
