| Main Authors: | Fang, Yiyang, Huang, Wenke, Fu, Pei, Yang, Yihao, Su, Kehua, Luo, Zhenbo, Luan, Jian, Ye, Mang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.23802 |
Similar Items
LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models
by: Liang, Jian, et al.
Published: (2025)
Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning
by: Huang, Wenke, et al.
Published: (2024)
Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity
by: Chen, Yuhang, et al.
Published: (2024)
EmoLLM: Multimodal Emotional Understanding Meets Large Language Models
by: Yang, Qu, et al.
Published: (2024)
PatchCue: Enhancing Vision-Language Model Reasoning with Patch-Based Visual Cues
by: Qi, Yukun, et al.
Published: (2026)
Federated Graph Semantic and Structural Learning
by: Huang, Wenke, et al.
Published: (2024)
FedSSP: Federated Graph Learning with Spectral Knowledge and Personalized Preference
by: Tan, Zihan, et al.
Published: (2024)
EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems
by: Liu, Jingwen, et al.
Published: (2025)
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
by: Li, Jiaze, et al.
Published: (2025)
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model
by: Huang, Wenke, et al.
Published: (2025)
A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations
by: Ye, Mang, et al.
Published: (2025)
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle
by: Zhu, Linghao, et al.
Published: (2025)
Enhancing Trustworthy GUI Grounding via Self-Critiqued Reinforcement Learning
by: Zhang, Shaojie, et al.
Published: (2025)
EMO-KNOW: A Large Scale Dataset on Emotion and Emotion-cause
by: Nguyen, Mia Huong, et al.
Published: (2024)
EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition
by: Li, Pengcheng, et al.
Published: (2025)
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization
by: Rong, Xuankun, et al.
Published: (2025)
AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMS with Audio-visual Cues
by: Zhou, Dingkun, et al.
Published: (2025)
Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
by: Zhang, Yiru, et al.
Published: (2025)
S2FGL: Spatial Spectral Federated Graph Learning
by: Tan, Zihan, et al.
Published: (2025)
ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation
by: Liang, Jian, et al.
Published: (2025)
EMO-TTA: Improving Test-Time Adaptation of Audio-Language Models for Speech Emotion Recognition
by: Shi, Jiacheng, et al.
Published: (2025)
EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
by: Wu, Haibin, et al.
Published: (2024)
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
by: Zhang, Shaojie, et al.
Published: (2025)
nEMO: Dataset of Emotional Speech in Polish
by: Christop, Iwona
Published: (2024)
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
by: Tan, Wenhui, et al.
Published: (2025)
Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation
by: Li, Jiaze, et al.
Published: (2026)
EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning
by: Xing, Bohao, et al.
Published: (2024)
ChronoForge-RL: Chronological Forging through Reinforcement Learning for Enhanced Video Understanding
by: Chen, Kehua
Published: (2025)
An Empirical Study of Federated Prompt Learning for Vision Language Model
by: Wang, Zhihao, et al.
Published: (2025)
R^3: Replay, Reflection, and Ranking Rewards for LLM Reinforcement Learning
by: Jiang, Zhizheng, et al.
Published: (2026)
Revisiting Entropy in Reinforcement Learning for Large Reasoning Models
by: Jin, Renren, et al.
Published: (2025)
Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models
by: Tan, Wenhui, et al.
Published: (2026)
Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models
by: Xu, Longwei, et al.
Published: (2026)
VOW: Verifiable and Oblivious Watermark Detection for Large Language Models
by: Luan, Xiaokun, et al.
Published: (2026)
Generalizable Geometric Prior and Recurrent Spiking Feature Learning for Humanoid Robot Manipulation
by: Li, Xuetao, et al.
Published: (2026)
R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning
by: Zhang, Zirui, et al.
Published: (2026)
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
by: Zhang, Shaojie, et al.
Published: (2025)
MECap-R1: Emotion-aware Policy with Reinforcement Learning for Multimodal Emotion Captioning
by: Sun, Haoqin, et al.
Published: (2025)
Doc-V*:Coarse-to-Fine Interactive Visual Reasoning for Multi-Page Document VQA
by: Zheng, Yuanlei, et al.
Published: (2026)
Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
by: Wang, Ye, et al.
Published: (2025)