Code release for "Animal-Bench: Benchmarking Multimodal Video Models for Animal-centric Video Understanding"
Previous benchmarks (left) relied on a limited set of agents, and the scenarios in editing-based benchmarks are unrealistic. Our proposed Animal-Bench (right) includes diverse animal agents and a variety of realistic scenarios, covering 13 different tasks.
Task Demonstration
Effectiveness evaluation results:
Robustness evaluation results:
We will release our data and code soon!