| Main Authors: | Li, Yunxin, Liu, Zhenyu, Li, Zitao, Zhang, Xuanyu, Xu, Zhenran, Chen, Xinyu, Shi, Haoyuan, Jiang, Shenyuan, Wang, Xintong, Wang, Jifang, Huang, Shouzheng, Zhao, Xinping, Jiang, Borui, Hong, Lanqing, Wang, Longyue, Tian, Zhuotao, Huai, Baoxing, Luo, Wenhan, Luo, Weihua, Zhang, Zheng, Hu, Baotian, Zhang, Min |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.04921 |
Similar Items
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization
by: Li, Yunxin, et al.
Published: (2025)
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
by: Li, Yunxin, et al.
Published: (2024)
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
by: Xu, Zhenran, et al.
Published: (2025)
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
by: Li, Yunxin, et al.
Published: (2024)
A Unified Agentic Framework for Evaluating Conditional Image Generation
by: Wang, Jifang, et al.
Published: (2025)
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
by: Li, Yunxin, et al.
Published: (2025)
MSVBench: Towards Human-Level Evaluation of Multi-Shot Video Generation
by: Shi, Haoyuan, et al.
Published: (2026)
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
by: Li, Yunxin, et al.
Published: (2024)
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents
by: Liu, Zhenyu, et al.
Published: (2025)
AniMaker: Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
by: Shi, Haoyuan, et al.
Published: (2025)
VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
by: Chen, Xinyu, et al.
Published: (2025)
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development
by: Xu, Zhenran, et al.
Published: (2025)
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces
by: Xu, Zhenran, et al.
Published: (2025)
CaptchaMind: Training CAPTCHA Solvers via Reinforcement Learning with Explicit Reasoning Supervision
by: Wang, Pengcheng, et al.
Published: (2026)
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
by: Li, Yunxin, et al.
Published: (2024)
WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments
by: Li, Jinchao, et al.
Published: (2026)
A State-Transition Framework for Efficient LLM Reasoning
by: Zhang, Liang, et al.
Published: (2026)
On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
by: Zhang, Meishan, et al.
Published: (2025)
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
by: Liu, Zhenyu, et al.
Published: (2025)
A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation
by: Li, Yunxin, et al.
Published: (2024)
Medico: Towards Hallucination Detection and Correction with Multi-source Evidence Fusion
by: Zhao, Xinping, et al.
Published: (2024)
New Trends for Modern Machine Translation with Large Reasoning Models
by: Liu, Sinuo, et al.
Published: (2025)
VIDA: A dataset for Visually Dependent Ambiguity in Multimodal Machine Translation
by: Pan, Jingheng, et al.
Published: (2026)
Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
by: Li, Yunxin, et al.
Published: (2024)
Generative Multimodal Entity Linking
by: Shi, Senbao, et al.
Published: (2023)
RaSeRec: Retrieval-Augmented Sequential Recommendation
by: Zhao, Xinping, et al.
Published: (2024)
Learning to Extract Rational Evidence via Reinforcement Learning for Retrieval-Augmented Generation
by: Zhao, Xinping, et al.
Published: (2025)
ToolOmni: Enabling Open-World Tool Use via Agentic learning with Proactive Retrieval and Grounded Execution
by: Huang, Shouzheng, et al.
Published: (2026)
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering
by: Li, Yunxin, et al.
Published: (2023)
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
by: Zhao, Yu, et al.
Published: (2024)
LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy
by: Ruan, Zhiwen, et al.
Published: (2025)
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
by: Wu, Minghao, et al.
Published: (2025)
Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs
by: Li, Yunxin, et al.
Published: (2023)
Multimodal Latent Reasoning via Hierarchical Visual Cues Injection
by: Zhang, Yiming, et al.
Published: (2026)
Less Languages, Less Tokens: An Efficient Unified Logic Cross-lingual Chain-of-Thought Reasoning Framework
by: Zhang, Chenyuan, et al.
Published: (2026)
Challenging Multilingual LLMs: A New Taxonomy and Benchmark for Unraveling Hallucination in Translation
by: Wu, Xinwei, et al.
Published: (2025)
Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models
by: Yin, Huifeng, et al.
Published: (2025)
ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox
by: Li, Yuanyang, et al.
Published: (2026)
Difficulty-Estimated Policy Optimization
by: Zhao, Yu, et al.
Published: (2026)
Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling
by: Jiang, Fan, et al.
Published: (2026)