| Main Authors: | Agrawal, Aviral; Lezcano, Carlos Mateo Samudio; Heredia-Marin, Iqui Balam; Sethi, Prabhdeep Singh |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2404.13530 |
Similar Items
No Training Wheels: Steering Vectors for Bias Correction at Inference Time
by: Gupta, Aviral, et al.
Published: (2025)
StyleSplat: 3D Object Style Transfer with Gaussian Splatting
by: Jain, Sahil, et al.
Published: (2024)
VidLA: Video-Language Alignment at Scale
by: Rizve, Mamshad Nayeem, et al.
Published: (2024)
Cross-modal Causal Relation Alignment for Video Question Grounding
by: Chen, Weixing, et al.
Published: (2025)
Seeing Syntax: Uncovering Syntactic Learning Limitations in Vision-Language Models
by: Dumpala, Sri Harsha, et al.
Published: (2024)
RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Chest X-ray with Zero-Shot Multi-Task Capability
by: Park, Jonggwon, et al.
Published: (2025)
Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models
by: Góral, Gracjan, et al.
Published: (2024)
VisMin: Visual Minimal-Change Understanding
by: Awal, Rabiul, et al.
Published: (2024)
Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
by: Xiao, Xin, et al.
Published: (2024)
Reinforced Attention Learning
by: Li, Bangzheng, et al.
Published: (2026)
RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language
by: Biswas, Subrata, et al.
Published: (2025)
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
by: Ahmadi, Saba, et al.
Published: (2023)
Unifying Specialized Visual Encoders for Video Language Models
by: Chung, Jihoon, et al.
Published: (2025)
Can Visual Encoder Learn to See Arrows?
by: Terashita, Naoyuki, et al.
Published: (2025)
Text-centric Alignment for Multi-Modality Learning
by: Tsai, Yun-Da, et al.
Published: (2024)
Phrase-Instance Alignment for Generalized Referring Segmentation
by: Nguyen, E-Ro, et al.
Published: (2024)
Seeing No Evil: Blinding Large Vision-Language Models to Safety Instructions via Adversarial Attention Hijacking
by: Li, Jingru, et al.
Published: (2026)
EMMA: Efficient Visual Alignment in Multi-Modal LLMs
by: Ghazanfari, Sara, et al.
Published: (2024)
Linear Alignment of Vision-language Models for Image Captioning
by: Paischer, Fabian, et al.
Published: (2023)
Attribute Diversity Determines the Systematicity Gap in VQA
by: Berlot-Attwell, Ian, et al.
Published: (2023)
Transformer with Controlled Attention for Synchronous Motion Captioning
by: Radouane, Karim, et al.
Published: (2024)
X-VILA: Cross-Modality Alignment for Large Language Model
by: Ye, Hanrong, et al.
Published: (2024)
Data Alignment for Zero-Shot Concept Generation in Dermatology AI
by: Gadgil, Soham, et al.
Published: (2024)
Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
by: Shukor, Mustafa, et al.
Published: (2024)
Omnimodal Dataset Distillation via High-order Proxy Alignment
by: Gao, Yuxuan, et al.
Published: (2026)
Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models
by: Khorrami, Khazar, et al.
Published: (2021)
Improving Automatic VQA Evaluation Using Large Language Models
by: Mañas, Oscar, et al.
Published: (2023)
Resolving Spatio-Temporal Entanglement in Video Prediction via Multi-Modal Attention
by: Gupta, Shreyam, et al.
Published: (2025)
Head Pursuit: Probing Attention Specialization in Multimodal Transformers
by: Basile, Lorenzo, et al.
Published: (2025)
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
by: Yamada, Yoshihiro
Published: (2025)
Fine-Grained Alignment in Vision-and-Language Navigation through Bayesian Optimization
by: Song, Yuhang, et al.
Published: (2024)
Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning
by: Piergiovanni, AJ, et al.
Published: (2024)
The ART of Composition: Attention-Regularized Training for Compositional Visual Grounding
by: Luo, Jiayun, et al.
Published: (2024)
Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models
by: Le, Quang-Hung, et al.
Published: (2024)
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding
by: Wu, Haoning, et al.
Published: (2024)
Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models
by: Yoon, Eunseop, et al.
Published: (2025)
DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning
by: Du, Mengfei, et al.
Published: (2024)
VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization
by: Chen, Menglan, et al.
Published: (2025)
Distributionally Robust Alignment for Medical Federated Vision-Language Pre-training Under Data Heterogeneity
by: Shuai, Zitao, et al.
Published: (2024)
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
by: Fei, Hao, et al.
Published: (2024)