| Main Authors: | Chen, Lin; Ni, Bolin; Yang, Qi; Wang, Zili; Ding, Kun; Wang, Ying; Peng, Houwen; Xiang, Shiming |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.10863 |
Similar Items
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
by: Yang, Qi, et al.
Published: (2025)
Continuous Speculative Decoding for Autoregressive Image Generation
by: Wang, Zili, et al.
Published: (2024)
Taming Modality Entanglement in Continual Audio-Visual Segmentation
by: Hong, Yuyang, et al.
Published: (2025)
EvoVLMA: Evolutionary Vision-Language Model Adaptation
by: Ding, Kun, et al.
Published: (2025)
A Survey of Low-shot Vision-Language Model Adaptation via Representer Theorem
by: Ding, Kun, et al.
Published: (2024)
Beyond Next-Token Alignment: Distilling Multimodal Large Language Models via Token Interactions
by: Chen, Lin, et al.
Published: (2026)
SAM-MI: A Mask-Injected Framework for Enhancing Open-Vocabulary Semantic Segmentation with SAM
by: Chen, Lin, et al.
Published: (2025)
AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation
by: Wang, Zili, et al.
Published: (2024)
Defying Imbalanced Forgetting in Class Incremental Learning
by: Xu, Shixiong, et al.
Published: (2024)
Compositional Kronecker Context Optimization for Vision-Language Models
by: Ding, Kun, et al.
Published: (2024)
Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning
by: Ding, Kun, et al.
Published: (2024)
Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking
by: Chen, Xin, et al.
Published: (2023)
Enhancing Visual Continual Learning with Language-Guided Supervision
by: Ni, Bolin, et al.
Published: (2024)
IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting
by: Zhang, Tao, et al.
Published: (2025)
WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering
by: Zhu, Yingjian, et al.
Published: (2026)
Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation
by: Chen, Lin, et al.
Published: (2025)
Beyond Perceptual Distances: Rethinking Disparity Assessment for Out-of-Distribution Detection with Diffusion Models
by: Fang, Kun, et al.
Published: (2024)
CC-VQA: Conflict- and Correlation-Aware Method for Mitigating Knowledge Conflict in Knowledge-Based Visual Question Answering
by: Hong, Yuyang, et al.
Published: (2026)
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
by: Hong, Yuyang, et al.
Published: (2025)
Hyperbolic Chamfer Distance for Point Cloud Completion and Beyond
by: Lin, Fangzhou, et al.
Published: (2024)
DSFC-Net: A Dual-Encoder Spatial and Frequency Co-Awareness Network for Rural Road Extraction
by: Zhang, Zhengbo, et al.
Published: (2026)
SeaVIS: Sound-Enhanced Association for Online Audio-Visual Instance Segmentation
by: Zhu, Yingjian, et al.
Published: (2026)
Prompt Tuning with Soft Context Sharing for Vision-Language Models
by: Ding, Kun, et al.
Published: (2022)
Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation
by: Ding, Kun, et al.
Published: (2024)
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
by: Yang, Qi, et al.
Published: (2024)
Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger
by: Yang, Qi, et al.
Published: (2025)
UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
by: Zhang, Tao, et al.
Published: (2025)
MINIMA: Modality Invariant Image Matching
by: Ren, Jiangwei, et al.
Published: (2024)
Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes
by: Lu, Yujie, et al.
Published: (2024)
Rethinking Comprehensive Benchmark for Chart Understanding: A Perspective from Scientific Literature
by: Shen, Lingdong, et al.
Published: (2024)
Multi-view Normal and Distance Guidance Gaussian Splatting for Surface Reconstruction
by: Jia, Bo, et al.
Published: (2025)
Diffusion-based Radiotherapy Dose Prediction Guided by Inter-slice Aware Structure Encoding
by: Feng, Zhenghao, et al.
Published: (2023)
GMM-Based Comprehensive Feature Extraction and Relative Distance Preservation For Few-Shot Cross-Modal Retrieval
by: Sun, Chengsong, et al.
Published: (2025)
SeqPE: Transformer with Sequential Position Encoding
by: Li, Huayang, et al.
Published: (2025)
Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos
by: Liu, Jiahe, et al.
Published: (2024)
Robust Zero Level-Set Extraction from Unsigned Distance Fields Based on Double Covering
by: Hou, Fei, et al.
Published: (2023)
Beyond Chamfer Distance: Granular Order-aware Evaluation Metric For Online Mapping
by: Lehocine, Chouaib Bencheikh, et al.
Published: (2026)
Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
by: Zhang, Tao, et al.
Published: (2025)
Gradient Distance Function
by: Le, Hieu, et al.
Published: (2024)
Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation
by: Deng, Wenxiao, et al.
Published: (2024)