Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
Updated Feb 18, 2023
Dataset and Evaluation Scripts for Obstacle Detection via Semantic Segmentation in a Marine Environment
This study introduces MultiBanFakeDetect, a novel multimodal dataset for Bangla fake news detection, combining textual and visual information. It features TextFakeNet for text analysis and MultiFusionFake for integrating multimodal data.
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian (Bahasa Indonesia).