| Main Authors: | Yu, Tianyu; Fang, Kechen; Wan, Zihao; Zhang, Kaidong; Zhang, Yicheng; Song, Jun; Zheng, Bo; Yao, Yuan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.15300 |
Similar Items
LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?
by: Fang, Kechen, et al.
Published: (2026)
Dataset Ownership Verification for Pre-trained Masked Models
by: Xie, Yuechen, et al.
Published: (2025)
Should VLMs be Pre-trained with Image Data?
by: Keh, Sedrick, et al.
Published: (2025)
Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity
by: Fang, Zhengyao, et al.
Published: (2026)
Are VLMs Ready for Lane Topology Awareness in Autonomous Driving?
by: Chen, Xin, et al.
Published: (2025)
CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs
by: Fang, Zhengru, et al.
Published: (2026)
ApET: Approximation-Error Guided Token Compression for Efficient VLMs
by: Ma, Qiankun, et al.
Published: (2026)
Measuring Image-Relation Alignment: Reference-Free Evaluation of VLMs and Synthetic Pre-training for Open-Vocabulary Scene Graph Generation
by: Neau, Maëlic, et al.
Published: (2025)
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing
by: Zhang, Xinyao, et al.
Published: (2026)
Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge
by: Lu, Shuai, et al.
Published: (2026)
FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models
by: Xu, Wan, et al.
Published: (2023)
UHR-Micro: Diagnosing and Mitigating the Resolution Illusion in Earth Observation VLMs
by: Ni, Shuo, et al.
Published: (2026)
AttAnchor: Guiding Cross-Modal Token Alignment in VLMs with Attention Anchors
by: Zhang, Junyang, et al.
Published: (2025)
Drantal-NeRF: Diffusion-Based Restoration for Anti-aliasing Neural Radiance Field
by: Yang, Ganlin, et al.
Published: (2024)
Exploiting Optical Flow Guidance for Transformer-Based Video Inpainting
by: Zhang, Kaidong, et al.
Published: (2023)
Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs
by: Zhang, Xintong, et al.
Published: (2025)
Mask Consistency Regularization in Object Removal
by: Yuan, Hua, et al.
Published: (2025)
Beyond GSD-as-Token: Continuous Scale Conditioning for Remote Sensing VLMs
by: Zhang, Song, et al.
Published: (2026)
Enhancing Subsequent Video Retrieval via Vision-Language Models (VLMs)
by: Duan, Yicheng, et al.
Published: (2025)
Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-training via Differentiable Rendering of Line Segments
by: Takimoto, Yusuke, et al.
Published: (2024)
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
by: Yu, Tianyu, et al.
Published: (2023)
PF3Det: A Prompted Foundation Feature Assisted Visual LiDAR 3D Detector
by: Li, Kaidong, et al.
Published: (2025)
REFINE-CONTROL: A Semi-supervised Distillation Method For Conditional Image Generation
by: Jiang, Yicheng, et al.
Published: (2025)
Clapper: Compact Learning and Video Representation in VLMs
by: Kong, Lingyu, et al.
Published: (2025)
ProSR: Process-Shaped Spatial Reasoning for Reliable Chain-of-Thought in VLMs
by: Li, Jiangyang, et al.
Published: (2026)
Hierarchical Granularity Alignment and State Space Modeling for Robust Multimodal AU Detection in the Wild
by: Yu, Jun, et al.
Published: (2026)
DarkHash: A Data-Free Backdoor Attack Against Deep Hashing
by: Zhou, Ziqi, et al.
Published: (2025)
SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images
by: Song, Yicheng, et al.
Published: (2025)
PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors
by: Yuan, Tianyuan, et al.
Published: (2024)
HeightFormer: A Semantic Alignment Monocular 3D Object Detection Method from Roadside Perspective
by: Liu, Pei, et al.
Published: (2024)
Semantic One-Dimensional Tokenizer for Image Reconstruction and Generation
by: Qu, Yunpeng, et al.
Published: (2026)
Towards Interactive Image Inpainting via Sketch Refinement
by: Liu, Chang, et al.
Published: (2023)
StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
by: Liu, Chang, et al.
Published: (2024)
LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis
by: Liu, Chang, et al.
Published: (2023)
Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples
by: Zhou, Ziqi, et al.
Published: (2024)
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
by: Qiao, Yuxuan, et al.
Published: (2024)
LVC: A Lightweight Compression Framework for Enhancing VLMs in Long Video Understanding
by: Wang, Ziyi, et al.
Published: (2025)
ViT$^3$: Unlocking Test-Time Training in Vision
by: Han, Dongchen, et al.
Published: (2025)
AICA-Bench: Holistically Examining the Capabilities of VLMs in Affective Image Content Analysis
by: She, Dong, et al.
Published: (2026)
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
by: Jiang, Bo, et al.
Published: (2025)