| Main Authors: | Yoon, Lauren Hyoseo, Yue, Yisong, Kim, Been |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.01201 |
Similar Items
Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale
by: Koepke, A. Sophia, et al.
Published: (2026)
Self-Evolving Visual Concept Library using Vision-Language Critics
by: Sehgal, Atharva, et al.
Published: (2025)
Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces
by: Hadgi, Souhail, et al.
Published: (2025)
Beyond Pairwise Preferences: Listwise Reward-Aware Alignment for Diffusion Models
by: Wang, Austin, et al.
Published: (2026)
Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations?
by: Yang, Yiwei, et al.
Published: (2025)
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
by: Xing, Shuo, et al.
Published: (2025)
High-Fidelity Text-to-Image Generation from Pre-Trained Vision-Language Models via Distribution-Conditioned Diffusion Decoding
by: Hong, Ji Woo, et al.
Published: (2026)
Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training
by: Jeong, Wooseong, et al.
Published: (2025)
Test-Time Training for Visual Foresight Vision-Language-Action Models
by: Park, Sangwu, et al.
Published: (2026)
Unsupervised Representation Learning from Sparse Transformation Analysis
by: Song, Yue, et al.
Published: (2024)
Align Your Tangent: Training Better Consistency Models via Manifold-Aligned Tangents
by: Kim, Beomsu, et al.
Published: (2025)
Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models
by: Park, NaHyeon, et al.
Published: (2025)
Interactive Post-Training for Vision-Language-Action Models
by: Tan, Shuhan, et al.
Published: (2025)
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
by: Ding, Yi, et al.
Published: (2024)
Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models
by: Liu, Xinyang, et al.
Published: (2023)
Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks
by: Shandilya, Utkarsh, et al.
Published: (2025)
Split Gibbs Discrete Diffusion Posterior Sampling
by: Chu, Wenda, et al.
Published: (2025)
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
by: Zhou, Yiyang, et al.
Published: (2024)
Continual Learning in Vision-Language Models via Aligned Model Merging
by: Sokar, Ghada, et al.
Published: (2025)
Vision Language Models are Biased
by: Vo, An, et al.
Published: (2025)
Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models
by: Hoang, Dung Anh, et al.
Published: (2026)
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
by: Srivastava, Divyansh, et al.
Published: (2024)
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
by: Chen, Yangyi, et al.
Published: (2023)
Reinforcement Learning Friendly Vision-Language Model for Minecraft
by: Jiang, Haobin, et al.
Published: (2023)
Kuramoto Orientation Diffusion Models
by: Song, Yue, et al.
Published: (2025)
Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach
by: Wang, Haoxuan, et al.
Published: (2020)
Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models
by: Zheng, Shunjie-Fabian, et al.
Published: (2025)
RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Chest X-ray with Zero-Shot Multi-Task Capability
by: Park, Jonggwon, et al.
Published: (2025)
S-GRPO: Unified Post-Training for Large Vision-Language Models
by: Yan, Yuming, et al.
Published: (2026)
Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training
by: Zhang, Wenyu, et al.
Published: (2024)
Training Feature Attribution for Vision Models
by: Bacha, Aziz, et al.
Published: (2025)
On the Use of Anchoring for Training Vision Models
by: Narayanaswamy, Vivek, et al.
Published: (2024)
Information Router for Mitigating Modality Dominance in Vision-Language Models
by: Kim, Seulgi, et al.
Published: (2026)
Rethinking Fine-Tuning: Unlocking Hidden Capabilities in Vision-Language Models
by: Zhang, Mingyuan, et al.
Published: (2025)
Training-Free Test-Time Adaptation with Brownian Distance Covariance in Vision-Language Models
by: Zhang, Yi, et al.
Published: (2026)
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
by: Zhou, Yiyang, et al.
Published: (2023)
Voila-A: Aligning Vision-Language Models with User's Gaze Attention
by: Yan, Kun, et al.
Published: (2023)
TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks
by: Hu, Yuanze, et al.
Published: (2025)
CAPA: Contribution-Aware Pruning and FFN Approximation for Efficient Large Vision-Language Models
by: Jha, Samyak, et al.
Published: (2026)
Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression
by: Sarkar, Sreetama, et al.
Published: (2025)