| Main Authors: | Hu, Pengfei, Zhang, Zhenrong, Chang, Qikai, Liu, Shuhang, Ma, Jiefeng, Du, Jun, Zhang, Jianshu, Liu, Quan, Gao, Jianqing, Ma, Feng, Liu, Qingfeng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2504.10222 |
Similar Items
MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique
by: Liu, Shuhang, et al.
Published: (2025)
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
by: Chang, Qikai, et al.
Published: (2025)
DocMamba: Efficient Document Pre-training with State Space Model
by: Hu, Pengfei, et al.
Published: (2024)
Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration
by: Pan, Yicheng, et al.
Published: (2025)
UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition
by: Zhang, Zhenrong, et al.
Published: (2024)
See then Tell: Enhancing Key Information Extraction with Vision Grounding
by: Liu, Shuhang, et al.
Published: (2024)
Step Potential Advantage Estimation: Harnessing Intermediate Confidence and Correctness for Efficient Mathematical Reasoning
by: Wu, Fei, et al.
Published: (2026)
Skeleton and Font Generation Network for Zero-shot Chinese Character Generation
by: Xue, Mobai, et al.
Published: (2025)
SCI-Reason: A Dataset with Chain-of-Thought Rationales for Complex Multimodal Reasoning in Academic Areas
by: Ma, Chenghao, et al.
Published: (2025)
Latent Swap Joint Diffusion for 2D Long-Form Latent Generation
by: Dai, Yusheng, et al.
Published: (2025)
XEmoGPT: An Explainable Multimodal Emotion Recognition Framework with Cue-Level Perception and Reasoning
by: Zhang, Hanwen, et al.
Published: (2026)
Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models
by: Shi, Xiang, et al.
Published: (2024)
Frozen LVLMs for Micro-Video Recommendation: A Systematic Study of Feature Extraction and Fusion
by: Sun, Huatuan, et al.
Published: (2025)
HDA-SELD: Hierarchical Cross-Modal Distillation with Multi-Level Data Augmentation for Low-Resource Audio-Visual Sound Event Localization and Detection
by: Wang, Qing, et al.
Published: (2025)
Deconfounded Reasoning for Multimodal Fake News Detection via Causal Intervention
by: Liu, Moyang, et al.
Published: (2025)
FineBadminton: A Multi-Level Dataset for Fine-Grained Badminton Video Understanding
by: He, Xusheng, et al.
Published: (2025)
StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation
by: Yang, An, et al.
Published: (2025)
Music Grounding by Short Video
by: Xin, Zijie, et al.
Published: (2024)
MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks
by: Zhang, Lei, et al.
Published: (2025)
Dark Side of Modalities: Reinforced Multimodal Distillation for Multimodal Knowledge Graph Reasoning
by: Zhao, Yu, et al.
Published: (2025)
Images Inpainting Quality Evaluation Using Structural Features and Visual Saliency
by: Shuang Ma, et al.
Published: (2024)
Reasoning Like Experts: Leveraging Multimodal Large Language Models for Drawing-based Psychoanalysis
by: Ma, Xueqi, et al.
Published: (2025)
Exploring the Role of Audio in Multimodal Misinformation Detection
by: Liu, Moyang, et al.
Published: (2024)
Interpretable Multimodal Misinformation Detection with Logic Reasoning
by: Liu, Hui, et al.
Published: (2023)
FinCall-Surprise: A Large Scale Multi-modal Benchmark for Earning Surprise Prediction
by: Shu, Dong, et al.
Published: (2025)
AcoustEmo: Open-Vocabulary Emotion Reasoning via Utterance-Aware Acoustic Q-Former
by: Zhang, Liyun, et al.
Published: (2026)
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
by: Du, Lingxiao, et al.
Published: (2025)
Dynamic Interaction-Aware and Causality-Disentangled Framework for Multimodal Sentiment Analysis
by: Dong, Guangyuan, et al.
Published: (2026)
Evolutionary Multimodal Reasoning via Hierarchical Semantic Representation for Intent Recognition
by: Zhou, Qianrui, et al.
Published: (2026)
EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot
by: Fei, Hao, et al.
Published: (2024)
Multimodal Emotion Recognition with Large Language Models
by: Zhang, Hongrui, et al.
Published: (2026)
MM-InstructEval: Zero-Shot Evaluation of (Multimodal) Large Language Models on Multimodal Reasoning Tasks
by: Yang, Xiaocui, et al.
Published: (2024)
Harnessing Multimodal Large Language Models for Personalized Product Search with Query-aware Refinement
by: Zhang, Beibei, et al.
Published: (2025)
PTA: Enhancing Multimodal Sentiment Analysis through Pipelined Prediction and Translation-based Alignment
by: Song, Shezheng, et al.
Published: (2024)
PEARL: Training Socratic Tutors with Pedagogically Aligned Reinforcement Learning
by: Chang, Qikai, et al.
Published: (2026)
RoboTron-Mani: All-in-One Multimodal Large Model for Robotic Manipulation
by: Yan, Feng, et al.
Published: (2024)
MAGNeT: Multimodal Adaptive Gaussian Networks for Intent Inference in Moving Target Selection across Complex Scenarios
by: Li, Xiangxian, et al.
Published: (2025)
Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles
by: Wang, Zihan, et al.
Published: (2024)
TMDC: A Two-Stage Modality Denoising and Complementation Framework for Multimodal Sentiment Analysis with Missing and Noisy Modalities
by: Zhuang, Yan, et al.
Published: (2025)
Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing
by: Zhang, Juan, et al.
Published: (2024)