:: Library Catalog

Bibliographic Details
Main Authors:	Gröger, Fabian; Wen, Shuo; Le, Huyen; Brbić, Maria
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2506.16895

Similar Items

Revisiting the Platonic Representation Hypothesis: An Aristotelian View
by: Gröger, Fabian, et al.
Published: (2026)

Is Hyperbolic Space All You Need for Medical Anomaly Detection?
by: Gonzalez-Jimenez, Alvaro, et al.
Published: (2025)

Learning Relative Representations for Fine-Grained Multimodal Alignment with Limited Data
by: Kim, Shiwon, et al.
Published: (2026)

Clinical Uncertainty Impacts Machine Learning Evaluations
by: Lionetti, Simone, et al.
Published: (2025)

Gramian Multimodal Representation Learning and Alignment
by: Cicchetti, Giordano, et al.
Published: (2024)

Alignment-Guided Score Matching for Text-to-Image Alignment in Diffusion Models
by: Lee, Jaa-Yeon, et al.
Published: (2026)

Reconstruction Alignment Improves Unified Multimodal Models
by: Xie, Ji, et al.
Published: (2025)

CleanPatrick: A Benchmark for Image Data Cleaning
by: Gröger, Fabian, et al.
Published: (2025)

MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning
by: Jiang, Yulun, et al.
Published: (2025)

MEGL: Multimodal Explanation-Guided Learning
by: Zhang, Yifei, et al.
Published: (2024)

Exploring Perceptual Limitation of Multimodal Large Language Models
by: Zhang, Jiarui, et al.
Published: (2024)

A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity
by: Cicchetti, Giordano, et al.
Published: (2025)

Learning Hyperspectral Images with Curated Text Prompts for Efficient Multimodal Alignment
by: Chatterjee, Abhiroop, et al.
Published: (2025)

Align Your Query: Representation Alignment for Multimodality Medical Object Detection
by: Seo, Ara, et al.
Published: (2025)

GLAM: Geometry-Guided Local Alignment for Multi-View VLP in Mammography
by: Du, Yuexi, et al.
Published: (2025)

Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models
by: Guo, Xuyang, et al.
Published: (2025)

Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning?
by: Chen, Shuo, et al.
Published: (2023)

Learning from Limited and Imperfect Data
by: Rangwani, Harsh
Published: (2025)

Visual Planning: Let's Think Only with Images
by: Xu, Yi, et al.
Published: (2025)

Free$^2$Guide: Training-Free Text-to-Video Alignment using Image LVLM
by: Kim, Jaemin, et al.
Published: (2024)

MTA: Multimodal Task Alignment for BEV Perception and Captioning
by: Ma, Yunsheng, et al.
Published: (2024)

Reasoning Limitations of Multimodal Large Language Models. A Case Study of Bongard Problems
by: Małkiński, Mikołaj, et al.
Published: (2024)

Decoupling Semantic Similarity from Spatial Alignment for Neural Networks
by: Wald, Tassilo, et al.
Published: (2024)

Set-CLIP: Exploring Aligned Semantic From Low-Alignment Multimodal Data Through A Distribution View
by: Song, Zijia, et al.
Published: (2024)

Invariant Representation Guided Multimodal Sentiment Decoding with Sequential Variation Regularization
by: Xu, Guoyang, et al.
Published: (2024)

Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models
by: Miao, Yanting, et al.
Published: (2026)

Towards Achieving Perfect Multimodal Alignment
by: Kamboj, Abhi, et al.
Published: (2025)

What You See is What You Classify: Black Box Attributions
by: Stalder, Steven, et al.
Published: (2022)

Ovis: Structural Embedding Alignment for Multimodal Large Language Model
by: Lu, Shiyin, et al.
Published: (2024)

EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation
by: Wang, Yongxin, et al.
Published: (2024)

QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
by: Kao, Kuei-Chun, et al.
Published: (2025)

VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety
by: Palaskar, Shruti, et al.
Published: (2025)

Multi-Surrogate-Teacher Assistance for Representation Alignment in Fingerprint-based Indoor Localization
by: Nguyen, Son Minh, et al.
Published: (2024)

Fine-grained Classes and How to Find Them
by: Grcić, Matej, et al.
Published: (2024)

VOILA: Value-of-Information Guided Fidelity Selection for Cost-Aware Multimodal Question Answering
by: Bhope, Rahul Atul, et al.
Published: (2026)

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
by: Liu, Jihao, et al.
Published: (2024)

Motivation is Something You Need
by: Acheli, Mehdi, et al.
Published: (2026)

Data-Efficient Multimodal Fusion on a Single GPU
by: Vouitsis, Noël, et al.
Published: (2023)

Local Lesion Generation is Effective for Capsule Endoscopy Image Data Augmentation in a Limited Data Setting
by: Chłopowiec, Adrian B., et al.
Published: (2024)

Reasoning-Guided Grounding: Elevating Video Anomaly Detection through Multimodal Large Language Models
by: Agarwal, Sakshi, et al.
Published: (2026)