:: Library Catalog

Bibliographic Details
Main Authors:	Li, Bangzheng; Ni, Jianmo; Qu, Chen; Miao, Ian; Yang, Liu; Fu, Xingyu; Chen, Muhao; Cheng, Derek Zhiyuan
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2602.04884

Similar Items

Semantic-Clipping: Efficient Vision-Language Modeling with Semantic-Guided Visual Selection
by: Li, Bangzheng, et al.
Published: (2025)

Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models
by: Cai, Rui, et al.
Published: (2025)

From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
by: Xu, Nan, et al.
Published: (2024)

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
by: Chen, Shuang, et al.
Published: (2025)

GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
by: Chen, Yi, et al.
Published: (2025)

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
by: Huang, Wei, et al.
Published: (2025)

MoPD: Mixture-of-Prompts Distillation for Vision-Language Models
by: Chen, Yang, et al.
Published: (2024)

Latent Visual Reasoning
by: Li, Bangzheng, et al.
Published: (2025)

Calibrated Self-Rewarding Vision Language Models
by: Zhou, Yiyang, et al.
Published: (2024)

mDPO: Conditional Preference Optimization for Multimodal Large Language Models
by: Wang, Fei, et al.
Published: (2024)

Verbalized Representation Learning for Interpretable Few-Shot Generalization
by: Yang, Cheng-Fu, et al.
Published: (2024)

GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning
by: Duan, Chengqi, et al.
Published: (2025)

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
by: Yang, Rui, et al.
Published: (2026)

Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
by: Huang, Yixu, et al.
Published: (2026)

MOFI: Learning Image Representations from Noisy Entity Annotated Images
by: Wu, Wentao, et al.
Published: (2023)

REBEL: Reinforcement Learning via Regressing Relative Rewards
by: Gao, Zhaolin, et al.
Published: (2024)

URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering
by: Teng, Ge, et al.
Published: (2024)

MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
by: Liang, Yiqing, et al.
Published: (2025)

Generalization in Online Reinforcement Learning for Mobile Agents
by: Gu, Li, et al.
Published: (2026)

Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
by: Luo, Jun, et al.
Published: (2024)

Cross-modal Causal Relation Alignment for Video Question Grounding
by: Chen, Weixing, et al.
Published: (2025)

Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1
by: Chen, Yi, et al.
Published: (2025)

LatentLLM: Attention-Aware Joint Tensor Compression
by: Koike-Akino, Toshiaki, et al.
Published: (2025)

PromptTA: Prompt-driven Text Adapter for Source-free Domain Generalization
by: Zhang, Haoran, et al.
Published: (2024)

Annotation-Free Reinforcement Learning Query Rewriting via Verifiable Search Reward
by: Cha, Sungguk, et al.
Published: (2025)

Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression
by: Du, Yao, et al.
Published: (2026)

SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
by: Wu, Zijian, et al.
Published: (2025)

Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
by: Wei, Lai, et al.
Published: (2025)

Reinforcement Learning with Generative Models for Compact Support Sets
by: Schiavone, Nico, et al.
Published: (2024)

Monkey Jump: MoE-Style PEFT for Efficient Multi-Task Learning
by: Prottasha, Nusrat Jahan, et al.
Published: (2026)

OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning
by: Zhu, Boyu, et al.
Published: (2025)

Listen Then See: Video Alignment with Speaker Attention
by: Agrawal, Aviral, et al.
Published: (2024)

Transformer with Controlled Attention for Synchronous Motion Captioning
by: Radouane, Karim, et al.
Published: (2024)

LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning
by: Kowsher, Md, et al.
Published: (2026)

Classifier-guided Gradient Modulation for Enhanced Multimodal Learning
by: Guo, Zirun, et al.
Published: (2024)

Head Pursuit: Probing Attention Specialization in Multimodal Transformers
by: Basile, Lorenzo, et al.
Published: (2025)

CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
by: Yamada, Yoshihiro
Published: (2025)

Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation
by: Yin, Shukang, et al.
Published: (2024)

Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
by: Wang, Baode, et al.
Published: (2025)

Prompt as Free Lunch: Enhancing Diversity in Source-Free Cross-domain Few-shot Learning through Semantic-Guided Prompting
by: Zhuo, Linhai, et al.
Published: (2024)