
GROOViST: A Metric for Grounding Objects in Visual Storytelling – EMNLP 2023


akskuchi/groovist


CC BY license Python PyTorch

👀 What?

This repository contains code for using GROOViST, the metric introduced in "GROOViST: A Metric for Grounding Objects in Visual Storytelling" (Proceedings of EMNLP 2023).

🤔 Why?

Evaluating the degree to which textual stories are grounded in their corresponding image sequences is essential for the Visual Storytelling task. We propose GROOViST, based on insights obtained from analyzing existing open-source metrics (CLIPScore, RoViST-VG). Our analyses show that GROOViST effectively measures the extent to which a story is grounded in an image sequence.

🤖 How?

Currently, GROOViST can be used off-the-shelf to evaluate <image-sequence, story> pairs from three Visual Storytelling datasets: VIST, AESOP, and VWP. For a new or custom dataset, each of the following steps can be adapted accordingly.

Setup

Install Python (e.g., 3.11) and the dependencies listed in requirements.txt, for example with:

pip install -r requirements.txt
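Before running the steps, it can help to confirm that every package named in requirements.txt is actually installed in the active environment. The following is a minimal sketch that is not part of the repository: the helper names `requirement_names` and `missing_packages` are hypothetical, and the parser only handles simple specifiers such as `name==version` or `name>=version`.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Return bare distribution names from requirements.txt-style lines.

    Only simple specifiers (e.g. 'torch==2.1.0', 'numpy>=1.24') are handled;
    comments, blank lines, and pip options (lines starting with '-') are skipped.
    """
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop inline comments and newline
        if not line or line.startswith("-"):
            continue
        # The name stops at the first character that cannot belong to it,
        # such as a version operator, an extras bracket, or whitespace.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(lines):
    """Return the requirement names that are not installed in this environment."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def check_requirements(path="requirements.txt"):
    """Print which requirements (if any) are missing; returns the missing list."""
    with open(path, encoding="utf-8") as f:
        missing = missing_packages(f)
    print("All requirements installed." if not missing else f"Missing: {missing}")
    return missing
```

Calling `check_requirements()` from the repository root after `pip install -r requirements.txt` should report no missing packages; version constraints themselves are not checked by this sketch.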

Step 0: Extract image regions

For the sequence(s) of interest, GROOViST requires B image regions per image in the sequence(s) (e.g., B=10). Please refer to this doc for preparing them.
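The referenced doc describes how the regions are actually prepared. Purely to illustrate the "B regions per image" requirement, the sketch below assumes an object detector has already produced boxes with confidence scores and keeps the B most confident ones per image. The `Region` class, `top_b_regions`, `regions_for_sequence`, and selection by confidence are all hypothetical choices, not necessarily what the repository does.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """HYPOTHETICAL: one detected region (pixel box corners) plus detector confidence."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


def top_b_regions(detections: List[Region], b: int = 10) -> List[Region]:
    """Keep the b highest-confidence regions of a single image.

    Raises ValueError when fewer than b regions are available, since the
    metric expects B regions per image.
    """
    if b <= 0:
        raise ValueError("b must be positive")
    if len(detections) < b:
        raise ValueError(f"expected at least {b} regions, got {len(detections)}")
    # sorted() stays stable even with reverse=True, so tied scores keep
    # the detector's original order.
    return sorted(detections, key=lambda r: r.score, reverse=True)[:b]


def regions_for_sequence(sequence: List[List[Region]], b: int = 10) -> List[List[Region]]:
    """Apply top_b_regions to every image in an image sequence."""
    return [top_b_regions(dets, b) for dets in sequence]
```

Failing loudly on images with too few detections is a design choice of this sketch; padding or lowering the detector threshold would be alternatives.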

Step 1: Extract noun phrases

For the sequence(s) of interest, GROOViST works with the noun phrases in the stories. Use the following command to extract noun phrases from the stories:

python extract_nphrases.py --input_file data/sample_stories.json --output_file data/sample_nphrases.json
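When Step 1 is part of a larger Python workflow, the documented command can be invoked through `subprocess`. This is a minimal sketch, not part of the repository: `extract_nphrases_cmd` and `run_extract_nphrases` are hypothetical helpers, it assumes the script is run from the repository root, and it uses only the two flags shown above.

```python
import subprocess
import sys
from typing import List


def extract_nphrases_cmd(input_file: str, output_file: str) -> List[str]:
    """Build the Step 1 command with the flags shown in the README."""
    return [
        sys.executable, "extract_nphrases.py",  # use the current interpreter
        "--input_file", input_file,
        "--output_file", output_file,
    ]


def run_extract_nphrases(input_file: str, output_file: str, dry_run: bool = False) -> List[str]:
    """Run Step 1 (unless dry_run) and return the command that was (or would be) run."""
    cmd = extract_nphrases_cmd(input_file, output_file)
    if not dry_run:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd
```

Using `sys.executable` instead of a bare `python` keeps the call inside the same environment where the requirements were installed.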

Step 2: Compute GROOViST scores

python groovist.py --dataset VIST --input_file data/sample_nphrases.json --output_file data/sample_scores.json
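Step 2 can be wrapped the same way, with the `--dataset` value checked against the three datasets listed as supported off-the-shelf. The helpers below are hypothetical, and `load_scores` rests on an unverified ASSUMPTION: that the output file is a flat JSON object mapping story IDs to numeric scores. Adjust it to the real layout of the scores file.

```python
import json
import statistics
import subprocess
import sys
from typing import Dict, List

# The three datasets the README lists as supported off-the-shelf.
# A custom dataset would need the repository scripts adapted first.
SUPPORTED_DATASETS = {"VIST", "AESOP", "VWP"}


def groovist_cmd(dataset: str, input_file: str, output_file: str) -> List[str]:
    """Build the Step 2 command with the flags shown in the README."""
    if dataset not in SUPPORTED_DATASETS:
        raise ValueError(
            f"unsupported dataset {dataset!r}; expected one of {sorted(SUPPORTED_DATASETS)}"
        )
    return [
        sys.executable, "groovist.py",
        "--dataset", dataset,
        "--input_file", input_file,
        "--output_file", output_file,
    ]


def run_groovist(dataset: str, input_file: str, output_file: str) -> None:
    """Run Step 2 from the repository root; raises CalledProcessError on failure."""
    subprocess.run(groovist_cmd(dataset, input_file, output_file), check=True)


def load_scores(path: str) -> Dict[str, float]:
    """ASSUMPTION: the scores file is a flat JSON object {story_id: score}."""
    with open(path, encoding="utf-8") as f:
        return {key: float(value) for key, value in json.load(f).items()}


def summarize(scores: Dict[str, float]) -> Dict[str, float]:
    """Return the count, mean, min, and max of per-story scores."""
    values = list(scores.values())
    if not values:
        raise ValueError("no scores to summarize")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }
```

For example, `summarize(load_scores("data/sample_scores.json"))` would give a quick corpus-level view, provided the assumed file layout holds.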


🔗 If you find this work useful, please consider citing it:

@inproceedings{surikuchi-etal-2023-groovist,
    title = "{GROOV}i{ST}: A Metric for Grounding Objects in Visual Storytelling",
    author = "Surikuchi, Aditya  and Pezzelle, Sandro  and Fern{\'a}ndez, Raquel",
    editor = "Bouamor, Houda  and Pino, Juan  and Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.202",
    pages = "3331--3339"
}
