| Main Authors: | Zhang, Dengjia, Martin, Alexander, Jurayj, William, Murray, Kenton, Van Durme, Benjamin, Kriz, Reno |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.08701 |
Similar Items
HLTCOE Evaluation Team at TREC 2025: VQA Track
by: Zhang, Dengjia, et al.
Published: (2025)
Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation
by: Martin, Alexander, et al.
Published: (2025)
RANKVIDEO: Reasoning Reranking for Text-to-Video Retrieval
by: Skow, Tyler, et al.
Published: (2026)
MARQUIS: A Three-Stage Pipeline for Video Retrieval-Augmented Generation
by: Chakraborty, Debashish, et al.
Published: (2026)
Grounding Partially-Defined Events in Multimodal Data
by: Sanders, Kate, et al.
Published: (2024)
Multi-Vector Index Compression in Any Modality
by: Qin, Hanxiang, et al.
Published: (2026)
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
by: Reddy, Arun, et al.
Published: (2025)
WikiVideo: Article Generation from Multiple Videos
by: Martin, Alexander, et al.
Published: (2025)
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
by: Kriz, Reno, et al.
Published: (2024)
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting
by: Hamed, Omar, et al.
Published: (2024)
MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion
by: Samuel, Saron, et al.
Published: (2025)
Unified Multimodal Discrete Diffusion
by: Swerdlow, Alexander, et al.
Published: (2025)
A Bayesian Approach for Task-Specific Next-Best-View Selection with Uncertain Geometry
by: Zhu, Jingsen, et al.
Published: (2026)
Semantic Residual for Multimodal Unified Discrete Representation
by: Huang, Hai, et al.
Published: (2024)
LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data
by: Bezirganyan, Grigor, et al.
Published: (2024)
Unified Control for Inference-Time Guidance of Denoising Diffusion Models
by: Goyal, Maurya, et al.
Published: (2025)
Toward Unified Multimodal Representation Learning for Autonomous Driving
by: Tao, Ximeng, et al.
Published: (2026)
Hierarchy-Guided Multimodal Representation Learning for Taxonomic Inference
by: Ahmed, Sk Miraj, et al.
Published: (2026)
STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation
by: Shen, Ying, et al.
Published: (2026)
Multimodal Guidance Network for Missing-Modality Inference in Content Moderation
by: Zhao, Zhuokai, et al.
Published: (2023)
Learning Multimodal Latent Space with EBM Prior and MCMC Inference
by: Yuan, Shiyu, et al.
Published: (2024)
Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models
by: Yu, Yongsheng, et al.
Published: (2024)
Benchmarking Egocentric Multimodal Goal Inference for Assistive Wearable Agents
by: Veerabadran, Vijay, et al.
Published: (2025)
Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models
by: Li, Zhuowan, et al.
Published: (2022)
The Principle of Uncertain Maximum Entropy
by: Bogert, Kenneth, et al.
Published: (2023)
All in One: A Unified Synthetic Data Pipeline for Multimodal Video Understanding
by: Rahman, Tanzila, et al.
Published: (2026)
NEURAL: Attention-Guided Pruning for Unified Multimodal Resource-Constrained Clinical Evaluation
by: Joshi, Devvrat, et al.
Published: (2025)
Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment
by: Moradinasab, Nazanin, et al.
Published: (2025)
Process Supervision of Confidence Margin for Calibrated LLM Reasoning
by: Wang, Liaoyaqi, et al.
Published: (2026)
Metric Unreliability in Multimodal Machine Unlearning: A Systematic Analysis and Principled Unified Score
by: Khan, Abdullah Ahmad, et al.
Published: (2026)
EnergyLens: Interpretable Closed-Form Energy Models for Multimodal LLM Inference Serving
by: Palladino, Vittorio, et al.
Published: (2026)
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
by: Li, Yanghao, et al.
Published: (2025)
SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering
by: Li, Bingxin
Published: (2025)
ABC: Achieving Better Control of Multimodal Embeddings using VLMs
by: Schneider, Benjamin, et al.
Published: (2025)
Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework
by: Bhatt, Neel P., et al.
Published: (2024)
Reconstruction Alignment Improves Unified Multimodal Models
by: Xie, Ji, et al.
Published: (2025)
A Survey of Video Datasets for Grounded Event Understanding
by: Sanders, Kate, et al.
Published: (2024)
MoLAN: A Unified Modality-Aware Noise Dynamic Editing Framework for Multimodal Sentiment Analysis
by: Xu, Xingle, et al.
Published: (2025)
Towards Faithful Multimodal Concept Bottleneck Models
by: Moreau, Pierre, et al.
Published: (2026)
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
by: Zhan, Jun, et al.
Published: (2024)