| Main Authors: | Gröger, Fabian; Wen, Shuo; Le, Huyen; Brbić, Maria |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.16895 |
Similar Items
Revisiting the Platonic Representation Hypothesis: An Aristotelian View
by: Gröger, Fabian, et al.
Published: (2026)
Is Hyperbolic Space All You Need for Medical Anomaly Detection?
by: Gonzalez-Jimenez, Alvaro, et al.
Published: (2025)
Learning Relative Representations for Fine-Grained Multimodal Alignment with Limited Data
by: Kim, Shiwon, et al.
Published: (2026)
Clinical Uncertainty Impacts Machine Learning Evaluations
by: Lionetti, Simone, et al.
Published: (2025)
Gramian Multimodal Representation Learning and Alignment
by: Cicchetti, Giordano, et al.
Published: (2024)
Alignment-Guided Score Matching for Text-to-Image Alignment in Diffusion Models
by: Lee, Jaa-Yeon, et al.
Published: (2026)
Reconstruction Alignment Improves Unified Multimodal Models
by: Xie, Ji, et al.
Published: (2025)
CleanPatrick: A Benchmark for Image Data Cleaning
by: Gröger, Fabian, et al.
Published: (2025)
MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning
by: Jiang, Yulun, et al.
Published: (2025)
MEGL: Multimodal Explanation-Guided Learning
by: Zhang, Yifei, et al.
Published: (2024)
Exploring Perceptual Limitation of Multimodal Large Language Models
by: Zhang, Jiarui, et al.
Published: (2024)
A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity
by: Cicchetti, Giordano, et al.
Published: (2025)
Learning Hyperspectral Images with Curated Text Prompts for Efficient Multimodal Alignment
by: Chatterjee, Abhiroop, et al.
Published: (2025)
Align Your Query: Representation Alignment for Multimodality Medical Object Detection
by: Seo, Ara, et al.
Published: (2025)
GLAM: Geometry-Guided Local Alignment for Multi-View VLP in Mammography
by: Du, Yuexi, et al.
Published: (2025)
Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models
by: Guo, Xuyang, et al.
Published: (2025)
Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
by: Chen, Shuo, et al.
Published: (2023)
Learning from Limited and Imperfect Data
by: Rangwani, Harsh
Published: (2025)
Visual Planning: Let's Think Only with Images
by: Xu, Yi, et al.
Published: (2025)
Free$^2$Guide: Training-Free Text-to-Video Alignment using Image LVLM
by: Kim, Jaemin, et al.
Published: (2024)
MTA: Multimodal Task Alignment for BEV Perception and Captioning
by: Ma, Yunsheng, et al.
Published: (2024)
Reasoning Limitations of Multimodal Large Language Models. A Case Study of Bongard Problems
by: Małkiński, Mikołaj, et al.
Published: (2024)
Decoupling Semantic Similarity from Spatial Alignment for Neural Networks
by: Wald, Tassilo, et al.
Published: (2024)
Set-CLIP: Exploring Aligned Semantic From Low-Alignment Multimodal Data Through A Distribution View
by: Song, Zijia, et al.
Published: (2024)
Invariant Representation Guided Multimodal Sentiment Decoding with Sequential Variation Regularization
by: Xu, Guoyang, et al.
Published: (2024)
Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models
by: Miao, Yanting, et al.
Published: (2026)
Towards Achieving Perfect Multimodal Alignment
by: Kamboj, Abhi, et al.
Published: (2025)
What You See is What You Classify: Black Box Attributions
by: Stalder, Steven, et al.
Published: (2022)
Ovis: Structural Embedding Alignment for Multimodal Large Language Model
by: Lu, Shiyin, et al.
Published: (2024)
EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation
by: Wang, Yongxin, et al.
Published: (2024)
QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
by: Kao, Kuei-Chun, et al.
Published: (2025)
VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety
by: Palaskar, Shruti, et al.
Published: (2025)
Multi-Surrogate-Teacher Assistance for Representation Alignment in Fingerprint-based Indoor Localization
by: Nguyen, Son Minh, et al.
Published: (2024)
Fine-grained Classes and How to Find Them
by: Grcić, Matej, et al.
Published: (2024)
VOILA: Value-of-Information Guided Fidelity Selection for Cost-Aware Multimodal Question Answering
by: Bhope, Rahul Atul, et al.
Published: (2026)
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
by: Liu, Jihao, et al.
Published: (2024)
Motivation is Something You Need
by: Acheli, Mehdi, et al.
Published: (2026)
Data-Efficient Multimodal Fusion on a Single GPU
by: Vouitsis, Noël, et al.
Published: (2023)
Local Lesion Generation is Effective for Capsule Endoscopy Image Data Augmentation in a Limited Data Setting
by: Chłopowiec, Adrian B., et al.
Published: (2024)
Reasoning-Guided Grounding: Elevating Video Anomaly Detection through Multimodal Large Language Models
by: Agarwal, Sakshi, et al.
Published: (2026)