| Main Authors: | Li, Bangzheng, Ni, Jianmo, Qu, Chen, Miao, Ian, Yang, Liu, Fu, Xingyu, Chen, Muhao, Cheng, Derek Zhiyuan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.04884 |
Similar Items
Semantic-Clipping: Efficient Vision-Language Modeling with Semantic-Guided Visual Selection
by: Li, Bangzheng, et al.
Published: (2025)
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models
by: Cai, Rui, et al.
Published: (2025)
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
by: Xu, Nan, et al.
Published: (2024)
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
by: Chen, Shuang, et al.
Published: (2025)
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
by: Chen, Yi, et al.
Published: (2025)
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
by: Huang, Wei, et al.
Published: (2025)
MoPD: Mixture-of-Prompts Distillation for Vision-Language Models
by: Chen, Yang, et al.
Published: (2024)
Latent Visual Reasoning
by: Li, Bangzheng, et al.
Published: (2025)
Calibrated Self-Rewarding Vision Language Models
by: Zhou, Yiyang, et al.
Published: (2024)
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
by: Wang, Fei, et al.
Published: (2024)
Verbalized Representation Learning for Interpretable Few-Shot Generalization
by: Yang, Cheng-Fu, et al.
Published: (2024)
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning
by: Duan, Chengqi, et al.
Published: (2025)
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
by: Yang, Rui, et al.
Published: (2026)
Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
by: Huang, Yixu, et al.
Published: (2026)
MOFI: Learning Image Representations from Noisy Entity Annotated Images
by: Wu, Wentao, et al.
Published: (2023)
REBEL: Reinforcement Learning via Regressing Relative Rewards
by: Gao, Zhaolin, et al.
Published: (2024)
URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering
by: Teng, Ge, et al.
Published: (2024)
MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
by: Liang, Yiqing, et al.
Published: (2025)
Generalization in Online Reinforcement Learning for Mobile Agents
by: Gu, Li, et al.
Published: (2026)
Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
by: Luo, Jun, et al.
Published: (2024)
Cross-modal Causal Relation Alignment for Video Question Grounding
by: Chen, Weixing, et al.
Published: (2025)
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1
by: Chen, Yi, et al.
Published: (2025)
LatentLLM: Attention-Aware Joint Tensor Compression
by: Koike-Akino, Toshiaki, et al.
Published: (2025)
PromptTA: Prompt-driven Text Adapter for Source-free Domain Generalization
by: Zhang, Haoran, et al.
Published: (2024)
Annotation-Free Reinforcement Learning Query Rewriting via Verifiable Search Reward
by: Cha, Sungguk, et al.
Published: (2025)
Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression
by: Du, Yao, et al.
Published: (2026)
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
by: Wu, Zijian, et al.
Published: (2025)
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
by: Wei, Lai, et al.
Published: (2025)
Reinforcement Learning with Generative Models for Compact Support Sets
by: Schiavone, Nico, et al.
Published: (2024)
Monkey Jump: MoE-Style PEFT for Efficient Multi-Task Learning
by: Prottasha, Nusrat Jahan, et al.
Published: (2026)
OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning
by: Zhu, Boyu, et al.
Published: (2025)
Listen Then See: Video Alignment with Speaker Attention
by: Agrawal, Aviral, et al.
Published: (2024)
Transformer with Controlled Attention for Synchronous Motion Captioning
by: Radouane, Karim, et al.
Published: (2024)
LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning
by: Kowsher, Md, et al.
Published: (2026)
Classifier-guided Gradient Modulation for Enhanced Multimodal Learning
by: Guo, Zirun, et al.
Published: (2024)
Head Pursuit: Probing Attention Specialization in Multimodal Transformers
by: Basile, Lorenzo, et al.
Published: (2025)
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
by: Yamada, Yoshihiro
Published: (2025)
Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation
by: Yin, Shukang, et al.
Published: (2024)
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
by: Wang, Baode, et al.
Published: (2025)
Prompt as Free Lunch: Enhancing Diversity in Source-Free Cross-domain Few-shot Learning through Semantic-Guided Prompting
by: Zhuo, Linhai, et al.
Published: (2024)