Library Catalog

Bibliographic Details
Main Authors:	Chen, Lin; Ni, Bolin; Yang, Qi; Wang, Zili; Ding, Kun; Wang, Ying; Peng, Houwen; Xiang, Shiming
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.10863

Similar Items

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
by: Yang, Qi, et al.
Published: (2025)

Continuous Speculative Decoding for Autoregressive Image Generation
by: Wang, Zili, et al.
Published: (2024)

Taming Modality Entanglement in Continual Audio-Visual Segmentation
by: Hong, Yuyang, et al.
Published: (2025)

EvoVLMA: Evolutionary Vision-Language Model Adaptation
by: Ding, Kun, et al.
Published: (2025)

A Survey of Low-shot Vision-Language Model Adaptation via Representer Theorem
by: Ding, Kun, et al.
Published: (2024)

Beyond Next-Token Alignment: Distilling Multimodal Large Language Models via Token Interactions
by: Chen, Lin, et al.
Published: (2026)

SAM-MI: A Mask-Injected Framework for Enhancing Open-Vocabulary Semantic Segmentation with SAM
by: Chen, Lin, et al.
Published: (2025)

AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation
by: Wang, Zili, et al.
Published: (2024)

Defying Imbalanced Forgetting in Class Incremental Learning
by: Xu, Shixiong, et al.
Published: (2024)

Compositional Kronecker Context Optimization for Vision-Language Models
by: Ding, Kun, et al.
Published: (2024)

Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning
by: Ding, Kun, et al.
Published: (2024)

Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking
by: Chen, Xin, et al.
Published: (2023)

Enhancing Visual Continual Learning with Language-Guided Supervision
by: Ni, Bolin, et al.
Published: (2024)

IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting
by: Zhang, Tao, et al.
Published: (2025)

WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering
by: Zhu, Yingjian, et al.
Published: (2026)

Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation
by: Chen, Lin, et al.
Published: (2025)

Beyond Perceptual Distances: Rethinking Disparity Assessment for Out-of-Distribution Detection with Diffusion Models
by: Fang, Kun, et al.
Published: (2024)

CC-VQA: Conflict- and Correlation-Aware Method for Mitigating Knowledge Conflict in Knowledge-Based Visual Question Answering
by: Hong, Yuyang, et al.
Published: (2026)

Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
by: Hong, Yuyang, et al.
Published: (2025)

Hyperbolic Chamfer Distance for Point Cloud Completion and Beyond
by: Lin, Fangzhou, et al.
Published: (2024)

DSFC-Net: A Dual-Encoder Spatial and Frequency Co-Awareness Network for Rural Road Extraction
by: Zhang, Zhengbo, et al.
Published: (2026)

SeaVIS: Sound-Enhanced Association for Online Audio-Visual Instance Segmentation
by: Zhu, Yingjian, et al.
Published: (2026)

Prompt Tuning with Soft Context Sharing for Vision-Language Models
by: Ding, Kun, et al.
Published: (2022)

Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation
by: Ding, Kun, et al.
Published: (2024)

Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
by: Yang, Qi, et al.
Published: (2024)

Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger
by: Yang, Qi, et al.
Published: (2025)

UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
by: Zhang, Tao, et al.
Published: (2025)

MINIMA: Modality Invariant Image Matching
by: Ren, Jiangwei, et al.
Published: (2024)

Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes
by: Lu, Yujie, et al.
Published: (2024)

Rethinking Comprehensive Benchmark for Chart Understanding: A Perspective from Scientific Literature
by: Shen, Lingdong, et al.
Published: (2024)

Multi-view Normal and Distance Guidance Gaussian Splatting for Surface Reconstruction
by: Jia, Bo, et al.
Published: (2025)

Diffusion-based Radiotherapy Dose Prediction Guided by Inter-slice Aware Structure Encoding
by: Feng, Zhenghao, et al.
Published: (2023)

GMM-Based Comprehensive Feature Extraction and Relative Distance Preservation For Few-Shot Cross-Modal Retrieval
by: Sun, Chengsong, et al.
Published: (2025)

SeqPE: Transformer with Sequential Position Encoding
by: Li, Huayang, et al.
Published: (2025)

Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos
by: Liu, Jiahe, et al.
Published: (2024)

Robust Zero Level-Set Extraction from Unsigned Distance Fields Based on Double Covering
by: Hou, Fei, et al.
Published: (2023)

Beyond Chamfer Distance: Granular Order-aware Evaluation Metric For Online Mapping
by: Lehocine, Chouaib Bencheikh, et al.
Published: (2026)

Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
by: Zhang, Tao, et al.
Published: (2025)

Gradient Distance Function
by: Le, Hieu, et al.
Published: (2024)

Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation
by: Deng, Wenxiao, et al.
Published: (2024)