| Main Authors: | Xu, Chuou, Ji, Liya, Chen, Qifeng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.19567 |
Similar Items
Instruction-based Image Editing with Planning, Reasoning, and Generation
by: Ji, Liya, et al.
Published: (2026)
Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising
by: Wang, Yifan, et al.
Published: (2025)
Tokenization Constraints in LLMs: A Study of Symbolic and Arithmetic Reasoning Limits
by: Zhang, Xiang, et al.
Published: (2025)
Multi-modal Knowledge Graph Generation with Semantics-enriched Prompts
by: Xu, Yajing, et al.
Published: (2025)
Probing Cross-modal Information Hubs in Audio-Visual LLMs
by: Jung, Jihoo, et al.
Published: (2026)
RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought
by: Lu, Yi, et al.
Published: (2025)
Multi-modal Generative AI: Multi-modal LLMs, Diffusions, and the Unification
by: Wang, Xin, et al.
Published: (2024)
How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
by: Feng, Guhao, et al.
Published: (2024)
Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning
by: He, Hulingxiao, et al.
Published: (2026)
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning
by: Wu, Linquan, et al.
Published: (2026)
MMLU-Reason: Benchmarking Multi-Task Multi-modal Language Understanding and Reasoning
by: Tie, Guiyao, et al.
Published: (2025)
Towards Learning to Reason: Comparing LLMs with Neuro-Symbolic on Arithmetic Relations in Abstract Reasoning
by: Hersche, Michael, et al.
Published: (2024)
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs
by: Hua, Pu, et al.
Published: (2024)
Arithmetic Reasoning with LLM: Prolog Generation & Permutation
by: Yang, Xiaocheng, et al.
Published: (2024)
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities
by: Amara, Kenza, et al.
Published: (2024)
Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
by: Tang, Jiaqi, et al.
Published: (2025)
Social Debiasing for Fair Multi-modal LLMs
by: Cheng, Harry, et al.
Published: (2024)
Can Multi-modal (reasoning) LLMs work as deepfake detectors?
by: Ren, Simiao, et al.
Published: (2025)
ORIGAMISPACE: Benchmarking Multimodal LLMs in Multi-Step Spatial Reasoning with Mathematical Constraints
by: Xu, Rui, et al.
Published: (2025)
Diffusion-Based Visual Art Creation: A Survey and New Perspectives
by: Wang, Bingyuan, et al.
Published: (2024)
UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in Omni Models
by: Chen, Chen, et al.
Published: (2025)
SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems
by: Guo, Ziyu, et al.
Published: (2025)
ModelGrow: Continual Text-to-Video Pre-training with Model Expansion and Language Understanding Enhancement
by: Rao, Zhefan, et al.
Published: (2024)
VisScience: An Extensive Benchmark for Evaluating K12 Educational Multi-modal Scientific Reasoning
by: Jiang, Zhihuan, et al.
Published: (2024)
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
by: Sun, Hai-Long, et al.
Published: (2025)
Principled Understanding of Generalization for Generative Transformer Models in Arithmetic Reasoning Tasks
by: Xu, Xingcheng, et al.
Published: (2024)
MIND: Multi-rationale INtegrated Discriminative Reasoning Framework for Multi-modal Large Models
by: Yu, Chuang, et al.
Published: (2025)
Error-Driven Prompt Optimization for Arithmetic Reasoning
by: Pándy, Árpád, et al.
Published: (2025)
Self-training Language Models for Arithmetic Reasoning
by: Kadlčík, Marek, et al.
Published: (2024)
Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing
by: Tian, Zeyue, et al.
Published: (2026)
MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning
by: Kumar, Somnath, et al.
Published: (2024)
BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs
by: Wang, Ben, et al.
Published: (2026)
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
by: Gupta, Himanshu, et al.
Published: (2024)
An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs
by: Rai, Daking, et al.
Published: (2024)
Iterative Semantic Reasoning from Individual to Group Interests for Generative Recommendation with LLMs
by: Zhu, Xiaofei, et al.
Published: (2026)
CrashAgent: Crash Scenario Generation via Multi-modal Reasoning
by: Li, Miao, et al.
Published: (2025)
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
by: Sun, Yu, et al.
Published: (2025)
FedMPQ: Secure and Communication-Efficient Federated Learning with Multi-codebook Product Quantization
by: Yang, Xu, et al.
Published: (2024)
Look on Demand: A Cognitive Scheduling Framework for Visual Evidence Acquisition in Multimodal Reasoning
by: Zhang, Yang, et al.
Published: (2026)
Diagnosing and Resolving Cloud Platform Instability with Multi-modal RAG LLMs
by: Wang, Yifan, et al.
Published: (2025)