| Main Authors: | Lei, Jingyu, Wang, Gaoang, Lee, Der-Horng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.14072 |
Similar Items
Ego3DT: Tracking Every 3D Object in Ego-centric Videos
by: Hao, Shengyu, et al.
Published: (2024)
SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion
by: Yang, Zhiwen, et al.
Published: (2025)
Recent Advances in Embedding Methods for Multi-Object Tracking: A Survey
by: Wang, Gaoang, et al.
Published: (2022)
Rethinking Visual Token Reduction in LVLMs Under Cross-Modal Misalignment
by: Xu, Rui, et al.
Published: (2025)
MergeTok: Unified Continuous and Discrete Visual Tokenization via Token Merging
by: Zhang, Luyuan, et al.
Published: (2026)
Video Token Merging for Long-form Video Understanding
by: Lee, Seon-Ho, et al.
Published: (2024)
Hallucinatory Image Tokens: A Training-free EAZY Approach on Detecting and Mitigating Object Hallucinations in LVLMs
by: Che, Liwei, et al.
Published: (2025)
Reducing Object Hallucination in LVLMs via Emphasizing Image-negative Tokens
by: Shen, Meng, et al.
Published: (2026)
HIME: Mitigating Object Hallucinations in LVLMs via Hallucination Insensitivity Model Editing
by: Akl, Ahmed, et al.
Published: (2026)
IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token Pruning
by: Sun, Zhichao, et al.
Published: (2026)
Vision-centric Token Compression in Large Language Model
by: Xing, Ling, et al.
Published: (2025)
Self-Improving Small Object Grounding in LVLMs
by: Yang, Tianze, et al.
Published: (2026)
That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation
by: Pramatarov, Georgi, et al.
Published: (2024)
Pointmap Association and Piecewise-Plane Constraint for Consistent and Compact 3D Gaussian Segmentation Field
by: Hu, Wenhao, et al.
Published: (2025)
Mitigating Object Hallucinations in LVLMs via Attention Imbalance Rectification
by: Sun, Han, et al.
Published: (2026)
Lossless Token Merging Even Without Fine-Tuning in Vision Transformers
by: Lee, Jaeyeon, et al.
Published: (2025)
Disjoint Contrastive Regression Learning for Multi-Sourced Annotations
by: Ruan, Xiaoqian, et al.
Published: (2021)
CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs
by: Kan, Zhehan, et al.
Published: (2024)
DynaHOI: Benchmarking Hand-Object Interaction for Dynamic Target
by: Hu, BoCheng, et al.
Published: (2026)
Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination
by: Chen, Yangneng, et al.
Published: (2026)
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
by: Li, Siyuan, et al.
Published: (2025)
Boosting Visual Knowledge-Intensive Training for LVLMs Through Causality-Driven Visual Object Completion
by: Hu, Qingguo, et al.
Published: (2025)
Local Representative Token Guided Merging for Text-to-Image Generation
by: Lee, Min-Jeong, et al.
Published: (2025)
Object-centric Video Question Answering with Visual Grounding and Referring
by: Wang, Haochen, et al.
Published: (2025)
R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs
by: Xie, Jiahao, et al.
Published: (2026)
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
by: Qian, Rui, et al.
Published: (2024)
ToSA: Token Merging with Spatial Awareness
by: Huang, Hsiang-Wei, et al.
Published: (2025)
Video, How Do Your Tokens Merge?
by: Pollard, Sam, et al.
Published: (2025)
Sequential Token Merging: Revisiting Hidden States
by: Wen, Yan, et al.
Published: (2025)
Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs
by: Zhang, Jie, et al.
Published: (2024)
DSG-World: Learning a 3D Gaussian World Model from Dual State Videos
by: Hu, Wenhao, et al.
Published: (2025)
CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement
by: Liu, Yun, et al.
Published: (2024)
Modality Bias in LVLMs: Analyzing and Mitigating Object Hallucination via Attention Lens
by: Zheng, Haohan, et al.
Published: (2025)
MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
by: Jin, Xin, et al.
Published: (2025)
Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs
by: Yu, Liu, et al.
Published: (2025)
Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration
by: Fang, Haipeng, et al.
Published: (2025)
HTTM: Head-wise Temporal Token Merging for Faster VGGT
by: Wang, Weitian, et al.
Published: (2025)
Informative Object-centric Next Best View for Object-aware 3D Gaussian Splatting in Cluttered Scenes
by: Jeong, Seunghoon, et al.
Published: (2026)
Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection
by: Yang, Longrong, et al.
Published: (2023)
Efficient Visual Transformer by Learnable Token Merging
by: Wang, Yancheng, et al.
Published: (2024)