| Main Authors: | Bezirganyan, Grigor, Sellami, Sana, Berti-Équille, Laure, Fournier, Sébastien |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.09864 |
Similar Items
Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion
by: Bezirganyan, Grigor, et al.
Published: (2024)
MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning
by: Chergui, Abdelmadjid, et al.
Published: (2024)
Hierarchical Classification for Automated Image Annotation of Coral Reef Benthic Structures
by: Blondin, Célia, et al.
Published: (2024)
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models
by: Wang, Hengyi, et al.
Published: (2024)
Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs
by: Wang, Hao, et al.
Published: (2025)
Benchmarking the Thinking Mode of Multimodal Large Language Models in Clinical Tasks
by: Hong, Jindong, et al.
Published: (2025)
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts
by: Xiao, Yijia, et al.
Published: (2024)
ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction
by: Zou, Henry Peng, et al.
Published: (2024)
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study
by: Wang, Chenguang, et al.
Published: (2024)
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
by: Cai, Mu, et al.
Published: (2024)
EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data
by: Lin, Dongyan, et al.
Published: (2026)
Salsa as a Nonverbal Embodied Language -- The CoMPAS3D Dataset and Benchmarks
by: Burkanova, Bermet, et al.
Published: (2025)
Explingo: Explaining AI Predictions using Large Language Models
by: Zytek, Alexandra, et al.
Published: (2024)
MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations
by: Dongre, Vardhan, et al.
Published: (2025)
MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models
by: Zhang, Yichi, et al.
Published: (2024)
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning
by: Cai, Zikui, et al.
Published: (2025)
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
by: Huang, Brandon, et al.
Published: (2024)
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
by: Zeng, Yu, et al.
Published: (2026)
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences
by: Wang, Xiyao, et al.
Published: (2024)
Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark
by: Huybrechts, Goeric, et al.
Published: (2025)
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
by: Luo, Yaxin, et al.
Published: (2025)
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
by: Chandu, Khyathi Raghavi, et al.
Published: (2024)
Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
by: Wang, Shuai, et al.
Published: (2025)
Learning to Steer: Input-dependent Steering for Multimodal LLMs
by: Parekh, Jayneel, et al.
Published: (2025)
Many-Shot In-Context Learning in Multimodal Foundation Models
by: Jiang, Yixing, et al.
Published: (2024)
ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems
by: Wang, Chenxi, et al.
Published: (2025)
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
by: Chen, Yi, et al.
Published: (2025)
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
by: Wei, Lai, et al.
Published: (2025)
A Concept-based Interpretable Model for the Diagnosis of Choroid Neoplasias using Multimodal Data
by: Wu, Yifan, et al.
Published: (2024)
Judge Model for Large-scale Multimodality Benchmarks
by: Shih, Min-Han, et al.
Published: (2026)
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset
by: Wang, Ke, et al.
Published: (2024)
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
by: Yoon, Hyungjun, et al.
Published: (2024)
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
by: Chen, Shuang, et al.
Published: (2025)
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
by: Xu, Nan, et al.
Published: (2024)
Benchmarking Multimodal Large Language Models for Face Recognition
by: Shahreza, Hatef Otroshi, et al.
Published: (2025)
R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model?
by: Zhang, Jingyi, et al.
Published: (2026)
Set-CLIP: Exploring Aligned Semantic From Low-Alignment Multimodal Data Through A Distribution View
by: Song, Zijia, et al.
Published: (2024)
Integrating Medical Imaging and Clinical Reports Using Multimodal Deep Learning for Advanced Disease Analysis
by: Yao, Ziyan, et al.
Published: (2024)
Metis-SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start
by: Chen, Kun, et al.
Published: (2025)
Exploring Curriculum Learning for Vision-Language Tasks: A Study on Small-Scale Multimodal Training
by: Saha, Rohan, et al.
Published: (2024)