:: Library Catalog

Bibliographic Details
Main Authors:	Lei, Jingyu, Wang, Gaoang, Lee, Der-Horng
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.14072

Similar Items

Ego3DT: Tracking Every 3D Object in Ego-centric Videos
by: Hao, Shengyu, et al.
Published: (2024)

SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion
by: Yang, Zhiwen, et al.
Published: (2025)

Recent Advances in Embedding Methods for Multi-Object Tracking: A Survey
by: Wang, Gaoang, et al.
Published: (2022)

Rethinking Visual Token Reduction in LVLMs Under Cross-Modal Misalignment
by: Xu, Rui, et al.
Published: (2025)

MergeTok: Unified Continuous and Discrete Visual Tokenization via Token Merging
by: Zhang, Luyuan, et al.
Published: (2026)

Video Token Merging for Long-form Video Understanding
by: Lee, Seon-Ho, et al.
Published: (2024)

Hallucinatory Image Tokens: A Training-free EAZY Approach on Detecting and Mitigating Object Hallucinations in LVLMs
by: Che, Liwei, et al.
Published: (2025)

Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens
by: Shen, Meng, et al.
Published: (2026)

HIME: Mitigating Object Hallucinations in LVLMs via Hallucination Insensitivity Model Editing
by: Akl, Ahmed, et al.
Published: (2026)

IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token Pruning
by: Sun, Zhichao, et al.
Published: (2026)

Vision-centric Token Compression in Large Language Model
by: Xing, Ling, et al.
Published: (2025)

Self-Improving Small Object Grounding in LVLMs
by: Yang, Tianze, et al.
Published: (2026)

That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation
by: Pramatarov, Georgi, et al.
Published: (2024)

Pointmap Association and Piecewise-Plane Constraint for Consistent and Compact 3D Gaussian Segmentation Field
by: Hu, Wenhao, et al.
Published: (2025)

Mitigating Object Hallucinations in LVLMs via Attention Imbalance Rectification
by: Sun, Han, et al.
Published: (2026)

Lossless Token Merging Even Without Fine-Tuning in Vision Transformers
by: Lee, Jaeyeon, et al.
Published: (2025)

Disjoint Contrastive Regression Learning for Multi-Sourced Annotations
by: Ruan, Xiaoqian, et al.
Published: (2021)

CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs
by: Kan, Zhehan, et al.
Published: (2024)

DynaHOI: Benchmarking Hand-Object Interaction for Dynamic Target
by: Hu, BoCheng, et al.
Published: (2026)

Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination
by: Chen, Yangneng, et al.
Published: (2026)

MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
by: Li, Siyuan, et al.
Published: (2025)

Boosting Visual Knowledge-Intensive Training for LVLMs Through Causality-Driven Visual Object Completion
by: Hu, Qingguo, et al.
Published: (2025)

Local Representative Token Guided Merging for Text-to-Image Generation
by: Lee, Min-Jeong, et al.
Published: (2025)

Object-centric Video Question Answering with Visual Grounding and Referring
by: Wang, Haochen, et al.
Published: (2025)

R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs
by: Xie, Jiahao, et al.
Published: (2026)

Rethinking Image-to-Video Adaptation: An Object-centric Perspective
by: Qian, Rui, et al.
Published: (2024)

ToSA: Token Merging with Spatial Awareness
by: Huang, Hsiang-Wei, et al.
Published: (2025)

Video, How Do Your Tokens Merge?
by: Pollard, Sam, et al.
Published: (2025)

Sequential Token Merging: Revisiting Hidden States
by: Wen, Yan, et al.
Published: (2025)

Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs
by: Zhang, Jie, et al.
Published: (2024)

DSG-World: Learning a 3D Gaussian World Model from Dual State Videos
by: Hu, Wenhao, et al.
Published: (2025)

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement
by: Liu, Yun, et al.
Published: (2024)

Modality Bias in LVLMs: Analyzing and Mitigating Object Hallucination via Attention Lens
by: Zheng, Haohan, et al.
Published: (2025)

MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
by: Jin, Xin, et al.
Published: (2025)

Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs
by: Yu, Liu, et al.
Published: (2025)

Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration
by: Fang, Haipeng, et al.
Published: (2025)

HTTM: Head-wise Temporal Token Merging for Faster VGGT
by: Wang, Weitian, et al.
Published: (2025)

Informative Object-centric Next Best View for Object-aware 3D Gaussian Splatting in Cluttered Scenes
by: Jeong, Seunghoon, et al.
Published: (2026)

Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection
by: Yang, Longrong, et al.
Published: (2023)

Efficient Visual Transformer by Learnable Token Merging
by: Wang, Yancheng, et al.
Published: (2024)