| Main Authors: | Wang, Chun, Ye, Xiaojun, Pan, Xiaoran, Pan, Zihao, Wang, Haofan, Song, Yiren |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.18700 |
Similar Items
Unlocking the Latent Canvas: Eliciting and Benchmarking Symbolic Visual Expression in LLMs
by: Zheng, Yiren, et al.
Published: (2026)
SIGMA: Selective-Interleaved Generation with Multi-Attribute Tokens
by: Zhang, Xiaoyan, et al.
Published: (2026)
VISTA: Triplet-Supervised Video Style Transfer with Diffusion Transformers
by: Song, Yiren, et al.
Published: (2026)
OmniPSD: Layered PSD Generation with Diffusion Transformer
by: Liu, Cheng, et al.
Published: (2025)
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models
by: Tan, Huajie, et al.
Published: (2025)
Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration
by: Song, Yiren, et al.
Published: (2026)
EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering
by: Lu, Runnan, et al.
Published: (2025)
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer
by: Zhang, Yuxuan, et al.
Published: (2025)
FocusedAD: Character-centric Movie Audio Description
by: Ye, Xiaojun, et al.
Published: (2025)
Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
by: Chen, Honghao, et al.
Published: (2025)
HMVLM: Multistage Reasoning-Enhanced Vision-Language Model for Long-Tailed Driving Scenarios
by: Wang, Daming, et al.
Published: (2025)
FTibSuite: A Comprehensive Resource Suite for Tibetan Vision-Language Modeling
by: Xu, Guixian, et al.
Published: (2026)
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
by: Li, Ling, et al.
Published: (2024)
Fairness-Aware Fine-Tuning of Vision-Language Models for Medical Glaucoma Diagnosis
by: Gu, Zijian, et al.
Published: (2025)
MERGETUNE: Continued Fine-Tuning of Vision-Language Models
by: Wang, Wenqing, et al.
Published: (2026)
Fine-Tuning Vision-Language Models for Visual Navigation Assistance
by: Li, Xiao, et al.
Published: (2025)
Efficient Vision-Language Pre-training by Cluster Masking
by: Wei, Zihao, et al.
Published: (2024)
Mitigating the Reasoning Tax in Vision-Language Fine-Tuning with Input-Adaptive Depth Aggregation
by: Ren, Yiming, et al.
Published: (2026)
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
by: Zhai, Yuexiang, et al.
Published: (2024)
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
by: Ye, Angen, et al.
Published: (2025)
Prompt Tuning with Soft Context Sharing for Vision-Language Models
by: Ding, Kun, et al.
Published: (2022)
Towards Calibrated Robust Fine-Tuning of Vision-Language Models
by: Oh, Changdae, et al.
Published: (2023)
VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models
by: Ye, Muchao, et al.
Published: (2024)
VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning
by: Qi, Zhangyang, et al.
Published: (2025)
Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference
by: Liu, Ting, et al.
Published: (2024)
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
by: Wang, Qi, et al.
Published: (2025)
Preserving Domain Generalization in Fine-Tuning via Joint Parameter Selection
by: Pan, Bin, et al.
Published: (2025)
Rethinking Fine-Tuning: Unlocking Hidden Capabilities in Vision-Language Models
by: Zhang, Mingyuan, et al.
Published: (2025)
EdgeFM: Efficient Edge Inference for Vision-Language Models
by: Deng, Mengling, et al.
Published: (2026)
Semantic Hierarchical Prompt Tuning for Parameter-Efficient Fine-Tuning
by: Zhu, Haowei, et al.
Published: (2024)
FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models
by: Zheng, Weiying, et al.
Published: (2025)
FRISM: Fine-Grained Reasoning Injection via Subspace-Level Model Merging for Vision-Language Models
by: Huang, Chenyu, et al.
Published: (2026)
Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification
by: Lan, Long, et al.
Published: (2024)
GeoVLMath: Enhancing Geometry Reasoning in Vision-Language Models via Cross-Modal Reward for Auxiliary Line Creation
by: Guo, Shasha, et al.
Published: (2025)
Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction
by: Khan, Muhammad Tayyab, et al.
Published: (2024)
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
by: Wang, Yibin, et al.
Published: (2025)
Hierarchy-Aware Fine-Tuning of Vision-Language Models
by: Li, Jiayu, et al.
Published: (2025)
Image Watermarks are Removable Using Controllable Regeneration from Clean Noise
by: Liu, Yepeng, et al.
Published: (2024)
FreeFix: Boosting 3D Gaussian Splatting via Fine-Tuning-Free Diffusion Models
by: Zhou, Hongyu, et al.
Published: (2026)
Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning
by: Ding, Kun, et al.
Published: (2024)