Library Catalog

Bibliographic Details
Main Author:	Zhang, Zhendong
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Computation and Language
Online Access:	https://arxiv.org/abs/2502.05947

Similar Items

Streaming-dLLM: Accelerating Diffusion LLMs via Suffix Pruning and Dynamic Decoding
by: Xiao, Zhongyu, et al.
Published: (2026)

Multi-Head Attention Driven Dynamic Visual-Semantic Embedding for Enhanced Image-Text Matching
by: Chen, Wenjing
Published: (2024)

Debiasing CLIP: Interpreting and Correcting Bias in Attention Heads
by: Yeo, Wei Jie, et al.
Published: (2025)

CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
by: Shi, Dachuan, et al.
Published: (2023)

Mixture of Decoding: An Attention-Inspired Adaptive Decoding Strategy to Mitigate Hallucinations in Large Vision-Language Models
by: Chen, Xinlong, et al.
Published: (2025)

ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
by: Kang, Jialiang, et al.
Published: (2025)

Head Pursuit: Probing Attention Specialization in Multimodal Transformers
by: Basile, Lorenzo, et al.
Published: (2025)

MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping
by: Huang, Yushi, et al.
Published: (2025)

Flash Window Attention: speedup the attention computation for Swin Transformer
by: Zhang, Zhendong
Published: (2025)

Dual-branch Prompting for Multimodal Machine Translation
by: Wang, Jie, et al.
Published: (2025)

Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation
by: Jia, Sihang, et al.
Published: (2026)

ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM
by: Wang, Yujun, et al.
Published: (2025)

Modality Bias in LVLMs: Analyzing and Mitigating Object Hallucination via Attention Lens
by: Zheng, Haohan, et al.
Published: (2025)

APB-V: Accelerating Long-Video Understanding via Sequence-Parallelism-aware Approximate Attention
by: Huang, Yuxiang, et al.
Published: (2026)

Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models
by: Zhang, Huatian, et al.
Published: (2026)

$\mathcal{V}isi\mathcal{P}runer$: Decoding Discontinuous Cross-Modal Dynamics for Efficient Multimodal LLMs
by: Fan, Yingqi, et al.
Published: (2025)

MaskCD: Mitigating LVLM Hallucinations by Image Head Masked Contrastive Decoding
by: Deng, Jingyuan, et al.
Published: (2025)

Mitigating Object Hallucination via Concentric Causal Attention
by: Xing, Yun, et al.
Published: (2024)

Growing a Multi-head Twig via Distillation and Reinforcement Learning to Accelerate Large Vision-Language Models
by: Shao, Zhenwei, et al.
Published: (2025)

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head
by: Zhao, Tiancheng, et al.
Published: (2024)

Enhancing Geo-localization for Crowdsourced Flood Imagery via LLM-Guided Attention
by: Xu, Fengyi, et al.
Published: (2025)

KBE-DME: Dynamic Multimodal Evaluation via Knowledge Enhanced Benchmark Evolution
by: Zhang, Junzhe, et al.
Published: (2025)

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
by: Xing, Long, et al.
Published: (2024)

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
by: Huang, Haoyu, et al.
Published: (2026)

Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance
by: Zhao, Haozhe, et al.
Published: (2024)

Vision-Language Models Mistake Head Orientation for Gaze Direction: Nonverbal Conversation Cues
by: Zhang, Zory, et al.
Published: (2025)

Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification
by: Constantinou, Christos, et al.
Published: (2024)

InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion
by: Yan, Zhaoyi, et al.
Published: (2025)

Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence
by: He, Jinghan, et al.
Published: (2024)

CPJ: Explainable Agricultural Pest Diagnosis via Caption-Prompt-Judge with LLM-Judged Refinement
by: Zhang, Wentao, et al.
Published: (2025)

Faithful-MR1: Faithful Multimodal Reasoning via Anchoring and Reinforcing Visual Attention
by: Tian, Changyuan, et al.
Published: (2026)

Watch Closely: Mitigating Object Hallucinations in Large Vision-Language Models with Disentangled Decoding
by: Ma, Ruiqi, et al.
Published: (2025)

Modality-Agnostic fMRI Decoding of Vision and Language
by: Nikolaus, Mitja, et al.
Published: (2024)

Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs
by: Zhang, Xuan, et al.
Published: (2025)

Sparser Block-Sparse Attention via Token Permutation
by: Wang, Xinghao, et al.
Published: (2025)

Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
by: Zhang, Ce, et al.
Published: (2025)

VERA: Identifying and Leveraging Visual Evidence Retrieval Heads in Long-Context Understanding
by: Pei, Rongcan, et al.
Published: (2026)

Seeing No Evil: Blinding Large Vision-Language Models to Safety Instructions via Adversarial Attention Hijacking
by: Li, Jingru, et al.
Published: (2026)

D$^{3}$ToM: Decider-Guided Dynamic Token Merging for Accelerating Diffusion MLLMs
by: Chang, Shuochen, et al.
Published: (2025)

Dynamic Relation Inference via Verb Embeddings
by: Suissa, Omri, et al.
Published: (2025)