:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Dengjia; Martin, Alexander; Jurayj, William; Murray, Kenton; Van Durme, Benjamin; Kriz, Reno
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2604.08701

Similar Items

HLTCOE Evaluation Team at TREC 2025: VQA Track
by: Zhang, Dengjia, et al.
Published: (2025)

Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation
by: Martin, Alexander, et al.
Published: (2025)

RANKVIDEO: Reasoning Reranking for Text-to-Video Retrieval
by: Skow, Tyler, et al.
Published: (2026)

MARQUIS: A Three-Stage Pipeline for Video Retrieval-Augmented Generation
by: Chakraborty, Debashish, et al.
Published: (2026)

Grounding Partially-Defined Events in Multimodal Data
by: Sanders, Kate, et al.
Published: (2024)

Multi-Vector Index Compression in Any Modality
by: Qin, Hanxiang, et al.
Published: (2026)

Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
by: Reddy, Arun, et al.
Published: (2025)

WikiVideo: Article Generation from Multiple Videos
by: Martin, Alexander, et al.
Published: (2025)

MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
by: Kriz, Reno, et al.
Published: (2024)

Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting
by: Hamed, Omar, et al.
Published: (2024)

MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion
by: Samuel, Saron, et al.
Published: (2025)

Unified Multimodal Discrete Diffusion
by: Swerdlow, Alexander, et al.
Published: (2025)

A Bayesian Approach for Task-Specific Next-Best-View Selection with Uncertain Geometry
by: Zhu, Jingsen, et al.
Published: (2026)

Semantic Residual for Multimodal Unified Discrete Representation
by: Huang, Hai, et al.
Published: (2024)

LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data
by: Bezirganyan, Grigor, et al.
Published: (2024)

Unified Control for Inference-Time Guidance of Denoising Diffusion Models
by: Goyal, Maurya, et al.
Published: (2025)

Toward Unified Multimodal Representation Learning for Autonomous Driving
by: Tao, Ximeng, et al.
Published: (2026)

Hierarchy-Guided Multimodal Representation Learning for Taxonomic Inference
by: Ahmed, Sk Miraj, et al.
Published: (2026)

STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation
by: Shen, Ying, et al.
Published: (2026)

Multimodal Guidance Network for Missing-Modality Inference in Content Moderation
by: Zhao, Zhuokai, et al.
Published: (2023)

Learning Multimodal Latent Space with EBM Prior and MCMC Inference
by: Yuan, Shiyu, et al.
Published: (2024)

Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models
by: Yu, Yongsheng, et al.
Published: (2024)

Benchmarking Egocentric Multimodal Goal Inference for Assistive Wearable Agents
by: Veerabadran, Vijay, et al.
Published: (2025)

Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models
by: Li, Zhuowan, et al.
Published: (2022)

The Principle of Uncertain Maximum Entropy
by: Bogert, Kenneth, et al.
Published: (2023)

All in One: A Unified Synthetic Data Pipeline for Multimodal Video Understanding
by: Rahman, Tanzila, et al.
Published: (2026)

NEURAL: Attention-Guided Pruning for Unified Multimodal Resource-Constrained Clinical Evaluation
by: Joshi, Devvrat, et al.
Published: (2025)

Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment
by: Moradinasab, Nazanin, et al.
Published: (2025)

Process Supervision of Confidence Margin for Calibrated LLM Reasoning
by: Wang, Liaoyaqi, et al.
Published: (2026)

Metric Unreliability in Multimodal Machine Unlearning: A Systematic Analysis and Principled Unified Score
by: Khan, Abdullah Ahmad, et al.
Published: (2026)

EnergyLens: Interpretable Closed-Form Energy Models for Multimodal LLM Inference Serving
by: Palladino, Vittorio, et al.
Published: (2026)

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
by: Li, Yanghao, et al.
Published: (2025)

SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering
by: Li, Bingxin
Published: (2025)

ABC: Achieving Better Control of Multimodal Embeddings using VLMs
by: Schneider, Benjamin, et al.
Published: (2025)

Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework
by: Bhatt, Neel P., et al.
Published: (2024)

Reconstruction Alignment Improves Unified Multimodal Models
by: Xie, Ji, et al.
Published: (2025)

A Survey of Video Datasets for Grounded Event Understanding
by: Sanders, Kate, et al.
Published: (2024)

MoLAN: A Unified Modality-Aware Noise Dynamic Editing Framework for Multimodal Sentiment Analysis
by: Xu, Xingle, et al.
Published: (2025)

Towards Faithful Multimodal Concept Bottleneck Models
by: Moreau, Pierre, et al.
Published: (2026)

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
by: Zhan, Jun, et al.
Published: (2024)