:: Library Catalog

Bibliographic Details
Main Authors:	Yu, Tianyu; Fang, Kechen; Wan, Zihao; Zhang, Kaidong; Zhang, Yicheng; Song, Jun; Zheng, Bo; Yao, Yuan
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.15300

Similar Items

LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?
by: Fang, Kechen, et al.
Published: (2026)

Dataset Ownership Verification for Pre-trained Masked Models
by: Xie, Yuechen, et al.
Published: (2025)

Should VLMs be Pre-trained with Image Data?
by: Keh, Sedrick, et al.
Published: (2025)

Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity
by: Fang, Zhengyao, et al.
Published: (2026)

Are VLMs Ready for Lane Topology Awareness in Autonomous Driving?
by: Chen, Xin, et al.
Published: (2025)

CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs
by: Fang, Zhengru, et al.
Published: (2026)

ApET: Approximation-Error Guided Token Compression for Efficient VLMs
by: Ma, Qiankun, et al.
Published: (2026)

Measuring Image-Relation Alignment: Reference-Free Evaluation of VLMs and Synthetic Pre-training for Open-Vocabulary Scene Graph Generation
by: Neau, Maëlic, et al.
Published: (2025)

SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing
by: Zhang, Xinyao, et al.
Published: (2026)

Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge
by: Lu, Shuai, et al.
Published: (2026)

FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models
by: Xu, Wan, et al.
Published: (2023)

UHR-Micro: Diagnosing and Mitigating the Resolution Illusion in Earth Observation VLMs
by: Ni, Shuo, et al.
Published: (2026)

AttAnchor: Guiding Cross-Modal Token Alignment in VLMs with Attention Anchors
by: Zhang, Junyang, et al.
Published: (2025)

Drantal-NeRF: Diffusion-Based Restoration for Anti-aliasing Neural Radiance Field
by: Yang, Ganlin, et al.
Published: (2024)

Exploiting Optical Flow Guidance for Transformer-Based Video Inpainting
by: Zhang, Kaidong, et al.
Published: (2023)

Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs
by: Zhang, Xintong, et al.
Published: (2025)

Mask Consistency Regularization in Object Removal
by: Yuan, Hua, et al.
Published: (2025)

Beyond GSD-as-Token: Continuous Scale Conditioning for Remote Sensing VLMs
by: Zhang, Song, et al.
Published: (2026)

Enhancing Subsequent Video Retrieval via Vision-Language Models (VLMs)
by: Duan, Yicheng, et al.
Published: (2025)

Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-training via Differentiable Rendering of Line Segments
by: Takimoto, Yusuke, et al.
Published: (2024)

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
by: Yu, Tianyu, et al.
Published: (2023)

PF3Det: A Prompted Foundation Feature Assisted Visual LiDAR 3D Detector
by: Li, Kaidong, et al.
Published: (2025)

REFINE-CONTROL: A Semi-supervised Distillation Method For Conditional Image Generation
by: Jiang, Yicheng, et al.
Published: (2025)

Clapper: Compact Learning and Video Representation in VLMs
by: Kong, Lingyu, et al.
Published: (2025)

ProSR: Process-Shaped Spatial Reasoning for Reliable Chain-of-Thought in VLMs
by: Li, Jiangyang, et al.
Published: (2026)

Hierarchical Granularity Alignment and State Space Modeling for Robust Multimodal AU Detection in the Wild
by: Yu, Jun, et al.
Published: (2026)

DarkHash: A Data-Free Backdoor Attack Against Deep Hashing
by: Zhou, Ziqi, et al.
Published: (2025)

SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images
by: Song, Yicheng, et al.
Published: (2025)

PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors
by: Yuan, Tianyuan, et al.
Published: (2024)

HeightFormer: A Semantic Alignment Monocular 3D Object Detection Method from Roadside Perspective
by: Liu, Pei, et al.
Published: (2024)

Semantic One-Dimensional Tokenizer for Image Reconstruction and Generation
by: Qu, Yunpeng, et al.
Published: (2026)

Towards Interactive Image Inpainting via Sketch Refinement
by: Liu, Chang, et al.
Published: (2023)

StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
by: Liu, Chang, et al.
Published: (2024)

LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis
by: Liu, Chang, et al.
Published: (2023)

Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples
by: Zhou, Ziqi, et al.
Published: (2024)

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
by: Qiao, Yuxuan, et al.
Published: (2024)

LVC: A Lightweight Compression Framework for Enhancing VLMs in Long Video Understanding
by: Wang, Ziyi, et al.
Published: (2025)

ViT$^3$: Unlocking Test-Time Training in Vision
by: Han, Dongchen, et al.
Published: (2025)

AICA-Bench: Holistically Examining the Capabilities of VLMs in Affective Image Content Analysis
by: She, Dong, et al.
Published: (2026)

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
by: Jiang, Bo, et al.
Published: (2025)