| Main Authors: | Shi, Dachuan, Tao, Chaofan, Rao, Anyi, Yang, Zhendong, Yuan, Chun, Wang, Jiaqi |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2305.17455 |
Similar Items
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
by: Huang, Qidong, et al.
Published: (2024)
Acceleration Multiple Heads Decoding for LLM via Dynamic Tree Attention
by: Zhang, Zhendong
Published: (2025)
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
by: Cao, Jianjian, et al.
Published: (2024)
Cross-Modal Adapter for Vision-Language Retrieval
by: Jiang, Haojun, et al.
Published: (2022)
Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
by: Jiang, Lei, et al.
Published: (2025)
CAST: Cross-modal Alignment Similarity Test for Vision Language Models
by: Dagan, Gautier, et al.
Published: (2024)
Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models
by: Zhu, Tinghui, et al.
Published: (2024)
Are Vision Language Models Cross-Cultural Theory of Mind Reasoners?
by: Nazi, Zabir Al, et al.
Published: (2025)
Cross-modal Associations in Vision and Language Models: Revisiting the Bouba-Kiki Effect
by: Kouwenhoven, Tom, et al.
Published: (2025)
No Tokens Wasted: Leveraging Long Context in Biomedical Vision-Language Models
by: Sun, Min Woo, et al.
Published: (2025)
On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models
by: Seo, Hoigi, et al.
Published: (2025)
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
by: Wang, Enguang, et al.
Published: (2024)
Dynamic Token Reweighting for Robust Vision-Language Models
by: Jiang, Tanqiu, et al.
Published: (2025)
Vision-centric Token Compression in Large Language Model
by: Xing, Ling, et al.
Published: (2025)
Vision-Language Models Create Cross-Modal Task Representations
by: Luo, Grace, et al.
Published: (2024)
Cross-Cultural Value Awareness in Large Vision-Language Models
by: Howard, Phillip, et al.
Published: (2026)
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
by: Xing, Long, et al.
Published: (2024)
From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models
by: Shang, Yuying, et al.
Published: (2024)
Anatomical Structure-Guided Medical Vision-Language Pre-training
by: Li, Qingqiu, et al.
Published: (2024)
Growing a Multi-head Twig via Distillation and Reinforcement Learning to Accelerate Large Vision-Language Models
by: Shao, Zhenwei, et al.
Published: (2025)
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
by: Xu, Shicheng, et al.
Published: (2024)
Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models
by: Lu, Jiaying, et al.
Published: (2023)
CL-HOI: Cross-Level Human-Object Interaction Distillation from Vision Large Language Models
by: Gao, Jianjun, et al.
Published: (2024)
LinguDistill: Recovering Linguistic Ability in Vision-Language Models via Selective Cross-Modal Distillation
by: Irawan, Patrick Amadeus, et al.
Published: (2026)
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
by: Kang, Jialiang, et al.
Published: (2025)
MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling
by: Xu, Jiaqi, et al.
Published: (2023)
Beyond Translation: Cross-Cultural Meme Transcreation with Vision-Language Models
by: Zhao, Yuming, et al.
Published: (2026)
Cultural Awareness in Vision-Language Models: A Cross-Country Exploration
by: Madasu, Avinash, et al.
Published: (2025)
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
by: Chen, Junzhe, et al.
Published: (2024)
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
by: Zhang, Ce, et al.
Published: (2025)
ViCA: Efficient Multimodal LLMs with Vision-Only Cross-Attention
by: Liu, Wenjie, et al.
Published: (2026)
Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models
by: Li, Changqun, et al.
Published: (2024)
Toward Interactive Regional Understanding in Vision-Large Language Models
by: Lee, Jungbeom, et al.
Published: (2024)
Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
by: Zhao, Dachuan, et al.
Published: (2025)
Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models
by: Zhou, Yucheng, et al.
Published: (2024)
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
by: Chen, Liang, et al.
Published: (2024)
LoMo: Local Modality Substitution for Deeper Vision-Language Fusion
by: Han, Feng, et al.
Published: (2026)
DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning
by: Du, Mengfei, et al.
Published: (2024)
CLAIM: Mitigating Multilingual Object Hallucination in Large Vision-Language Models with Cross-Lingual Attention Intervention
by: Ye, Zekai, et al.
Published: (2025)
Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models
by: Zhang, Yikai, et al.
Published: (2024)