| Main Author: | Zhang, Zhendong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.05947 |
Similar Items
Streaming-dLLM: Accelerating Diffusion LLMs via Suffix Pruning and Dynamic Decoding
by: Xiao, Zhongyu, et al.
Published: (2026)
Multi-Head Attention Driven Dynamic Visual-Semantic Embedding for Enhanced Image-Text Matching
by: Chen, Wenjing
Published: (2024)
Debiasing CLIP: Interpreting and Correcting Bias in Attention Heads
by: Yeo, Wei Jie, et al.
Published: (2025)
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
by: Shi, Dachuan, et al.
Published: (2023)
Mixture of Decoding: An Attention-Inspired Adaptive Decoding Strategy to Mitigate Hallucinations in Large Vision-Language Models
by: Chen, Xinlong, et al.
Published: (2025)
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
by: Kang, Jialiang, et al.
Published: (2025)
Head Pursuit: Probing Attention Specialization in Multimodal Transformers
by: Basile, Lorenzo, et al.
Published: (2025)
MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping
by: Huang, Yushi, et al.
Published: (2025)
Flash Window Attention: speedup the attention computation for Swin Transformer
by: Zhang, Zhendong
Published: (2025)
Dual-branch Prompting for Multimodal Machine Translation
by: Wang, Jie, et al.
Published: (2025)
Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation
by: Jia, Sihang, et al.
Published: (2026)
ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM
by: Wang, Yujun, et al.
Published: (2025)
Modality Bias in LVLMs: Analyzing and Mitigating Object Hallucination via Attention Lens
by: Zheng, Haohan, et al.
Published: (2025)
APB-V: Accelerating Long-Video Understanding via Sequence-Parallelism-aware Approximate Attention
by: Huang, Yuxiang, et al.
Published: (2026)
Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models
by: Zhang, Huatian, et al.
Published: (2026)
$\mathcal{V}isi\mathcal{P}runer$: Decoding Discontinuous Cross-Modal Dynamics for Efficient Multimodal LLMs
by: Fan, Yingqi, et al.
Published: (2025)
MaskCD: Mitigating LVLM Hallucinations by Image Head Masked Contrastive Decoding
by: Deng, Jingyuan, et al.
Published: (2025)
Mitigating Object Hallucination via Concentric Causal Attention
by: Xing, Yun, et al.
Published: (2024)
Growing a Multi-head Twig via Distillation and Reinforcement Learning to Accelerate Large Vision-Language Models
by: Shao, Zhenwei, et al.
Published: (2025)
Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head
by: Zhao, Tiancheng, et al.
Published: (2024)
Enhancing Geo-localization for Crowdsourced Flood Imagery via LLM-Guided Attention
by: Xu, Fengyi, et al.
Published: (2025)
KBE-DME: Dynamic Multimodal Evaluation via Knowledge Enhanced Benchmark Evolution
by: Zhang, Junzhe, et al.
Published: (2025)
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
by: Xing, Long, et al.
Published: (2024)
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
by: Huang, Haoyu, et al.
Published: (2026)
Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance
by: Zhao, Haozhe, et al.
Published: (2024)
Vision-Language Models Mistake Head Orientation for Gaze Direction: Nonverbal Conversation Cues
by: Zhang, Zory, et al.
Published: (2025)
Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification
by: Constantinou, Christos, et al.
Published: (2024)
InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion
by: Yan, Zhaoyi, et al.
Published: (2025)
Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence
by: He, Jinghan, et al.
Published: (2024)
CPJ: Explainable Agricultural Pest Diagnosis via Caption-Prompt-Judge with LLM-Judged Refinement
by: Zhang, Wentao, et al.
Published: (2025)
Faithful-MR1: Faithful Multimodal Reasoning via Anchoring and Reinforcing Visual Attention
by: Tian, Changyuan, et al.
Published: (2026)
Watch Closely: Mitigating Object Hallucinations in Large Vision-Language Models with Disentangled Decoding
by: Ma, Ruiqi, et al.
Published: (2025)
Modality-Agnostic fMRI Decoding of Vision and Language
by: Nikolaus, Mitja, et al.
Published: (2024)
Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs
by: Zhang, Xuan, et al.
Published: (2025)
Sparser Block-Sparse Attention via Token Permutation
by: Wang, Xinghao, et al.
Published: (2025)
Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
by: Zhang, Ce, et al.
Published: (2025)
VERA: Identifying and Leveraging Visual Evidence Retrieval Heads in Long-Context Understanding
by: Pei, Rongcan, et al.
Published: (2026)
Seeing No Evil: Blinding Large Vision-Language Models to Safety Instructions via Adversarial Attention Hijacking
by: Li, Jingru, et al.
Published: (2026)
D$^{3}$ToM: Decider-Guided Dynamic Token Merging for Accelerating Diffusion MLLMs
by: Chang, Shuochen, et al.
Published: (2025)
Dynamic Relation Inference via Verb Embeddings
by: Suissa, Omri, et al.
Published: (2025)