| Main Authors: | Chen, Andong, Zhu, Wenxin, Ding, Qiuyu, Song, Yuchen, Yang, Muyun, Zhao, Tiejun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.02453 |
Similar Items
Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness
by: Song, Yuchen, et al.
Published: (2025)
Evaluating o1-Like LLMs: Unlocking Reasoning for Translation through Comprehensive Analysis
by: Chen, Andong, et al.
Published: (2025)
DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms
by: Chen, Andong, et al.
Published: (2024)
Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling
by: Zhang, Deyue, et al.
Published: (2025)
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
by: Zhu, Wenxin, et al.
Published: (2025)
Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation
by: Chen, Andong, et al.
Published: (2024)
LLM-based Translation Inference with Iterative Bilingual Understanding
by: Chen, Andong, et al.
Published: (2024)
Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving
by: Chen, Andong, et al.
Published: (2024)
Beyond Global Emotion: Fine-Grained Emotional Speech Synthesis with Dynamic Word-Level Modulation
by: Wang, Sirui, et al.
Published: (2025)
Look Before You Leap: Enhancing Attention and Vigilance Regarding Harmful Content with GuidelineLLM
by: Zhang, Shaoqing, et al.
Published: (2024)
Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
by: Zhan, Yufei, et al.
Published: (2025)
Enhancing Spatial Reasoning through Visual and Textual Thinking
by: Liang, Xun, et al.
Published: (2025)
Dynamic Planning for LLM-based Graphical User Interface Automation
by: Zhang, Shaoqing, et al.
Published: (2024)
MuSC: Improving Complex Instruction Following with Multi-granularity Self-Contrastive Training
by: Huang, Hui, et al.
Published: (2025)
Collaborative Comic Generation: Integrating Visual Narrative Theories with AI Models for Enhanced Creativity
by: Chen, Yi-Chun, et al.
Published: (2024)
A Customizable Generator for Comic-Style Visual Narrative
by: Chen, Yi-Chun, et al.
Published: (2023)
StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues
by: Ruan, Zanxi, et al.
Published: (2026)
AI Readiness in Healthcare through Storytelling XAI
by: Dubey, Akshat, et al.
Published: (2024)
REVISION: Reflective Intent Mining and Online Reasoning Auxiliary for E-commerce Visual Search System Optimization
by: Tang, Yiwen, et al.
Published: (2025)
Towards Faithful Reasoning in Comics for Small MLLMs
by: Feng, Chengcheng, et al.
Published: (2026)
Thinking Isn't an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations
by: Song, Zhao, et al.
Published: (2025)
Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination
by: Zheng, Haojie, et al.
Published: (2024)
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
by: Deng, Andong, et al.
Published: (2025)
Visual Attention Reasoning via Hierarchical Search and Self-Verification
by: Cai, Wei, et al.
Published: (2025)
ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization
by: Li, Sunzhu, et al.
Published: (2025)
Multimodal Large Language Models for Bioimage Analysis
by: Zhang, Shanghang, et al.
Published: (2024)
The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents
by: Sun, Yuwei, et al.
Published: (2026)
When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
by: Zhang, Xiaoyun, et al.
Published: (2025)
Scaling Spatial Reasoning in MLLMs through Programmatic Data Synthesis
by: Helu, Zhi, et al.
Published: (2025)
Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning
by: Yang, Dongchao, et al.
Published: (2025)
Look on Demand: A Cognitive Scheduling Framework for Visual Evidence Acquisition in Multimodal Reasoning
by: Zhang, Yang, et al.
Published: (2026)
AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning
by: Xiang, Kun, et al.
Published: (2024)
DeepThink3D: Enhancing Large Language Models with Programmatic Reasoning in Complex 3D Situated Reasoning Tasks
by: Song, Jiayi, et al.
Published: (2025)
VTPerception-R1: Enhancing Multimodal Reasoning via Explicit Visual and Textual Perceptual Grounding
by: Ding, Yizhuo, et al.
Published: (2025)
Think Twice to See More: Iterative Visual Reasoning in Medical VLMs
by: Chen, Kaitao, et al.
Published: (2025)
Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics
by: Ryan, Yuriel, et al.
Published: (2025)
Reasoning Before Diagnosis: Physician-Inspired Structured Thinking for ECG Classification
by: Wu, Yang, et al.
Published: (2026)
An Embodied Companion for Visual Storytelling
by: Tresset, Patrick, et al.
Published: (2026)
Thinking Diffusion: Penalize and Guide Visual-Grounded Reasoning in Diffusion Multimodal Language Models
by: Kim, Keuntae, et al.
Published: (2026)
Think in Blocks: Adaptive Reasoning from Direct Response to Deep Reasoning
by: Zhu, Yekun, et al.
Published: (2025)