| Main Authors: | Yan, Dawei, Li, Yang, Chen, Qing-Guo, Luo, Weihua, Wang, Peng, Zhang, Haokui, Shen, Chunhua |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.18533 |
Similar Items
TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings
by: Yan, Dawei, et al.
Published: (2024)
Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
by: Tao, Lijie, et al.
Published: (2024)
Breaking Contextual Inertia: Reinforcement Learning with Single-Turn Anchors for Stable Multi-Turn Interaction
by: Chen, Xingwu, et al.
Published: (2026)
MTMCS-Bench: Evaluating Contextual Safety of Multimodal Large Language Models in Multi-Turn Dialogues
by: Liu, Zheyuan, et al.
Published: (2026)
Ovis: Structural Embedding Alignment for Multimodal Large Language Model
by: Lu, Shiyin, et al.
Published: (2024)
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
by: Chen, Sijia, et al.
Published: (2024)
TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering
by: Kwok, Tung Sum Thomas, et al.
Published: (2026)
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
by: Bigverdi, Mahtab, et al.
Published: (2024)
Multimodal Tabular Reasoning with Privileged Structured Information
by: Jiang, Jun-Peng, et al.
Published: (2025)
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
by: Yan, Yibo, et al.
Published: (2025)
Contextualization Distillation from Large Language Model for Knowledge Graph Completion
by: Li, Dawei, et al.
Published: (2024)
FunReason-MT Technical Report: Advanced Data Synthesis Solution for Real-world Multi-Turn Tool-use
by: Xu, Zengzhuang, et al.
Published: (2025)
METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models
by: Li, Pengfeng, et al.
Published: (2026)
MMIF-AMIN: Adaptive Loss-Driven Multi-Scale Invertible Dense Network for Multimodal Medical Image Fusion
by: Luo, Tao, et al.
Published: (2025)
Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models
by: Li, Yubo, et al.
Published: (2025)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy
by: Yang, Yiting, et al.
Published: (2025)
MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models
by: Yan, Siyu, et al.
Published: (2025)
Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
by: Zhan, Yufei, et al.
Published: (2025)
ReviewInstruct: A Review-Driven Multi-Turn Conversations Generation Method for Large Language Models
by: Wu, Jiangxu, et al.
Published: (2025)
Visual Funnel: Resolving Contextual Blindness in Multimodal Large Language Models
by: Jung, Woojun, et al.
Published: (2025)
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
by: Chen, Cong, et al.
Published: (2025)
Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models
by: Ying, Zonghao, et al.
Published: (2025)
LTNER: Large Language Model Tagging for Named Entity Recognition with Contextualized Entity Marking
by: Yan, Faren, et al.
Published: (2024)
VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
by: Wang, Kangrui, et al.
Published: (2025)
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks
by: Hu, Wenbo, et al.
Published: (2026)
Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages
by: Chen, Zui, et al.
Published: (2025)
OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models
by: Coscia, Adam, et al.
Published: (2025)
What Makes Reasoning Invalid: Echo Reflection Mitigation for Large Language Models
by: He, Chen, et al.
Published: (2025)
Optimizing Temperature for Language Models with Multi-Sample Inference
by: Du, Weihua, et al.
Published: (2025)
DeepVIS: Bridging Natural Language and Data Visualization Through Step-wise Reasoning
by: Shuai, Zhihao, et al.
Published: (2025)
From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models
by: Zhou, Chenyue, et al.
Published: (2025)
Cognitive Pivot Points and Visual Anchoring: Unveiling and Rectifying Hallucinations in Multimodal Reasoning Models
by: Qian, Zhe, et al.
Published: (2026)
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models
by: Peng, Shuai, et al.
Published: (2024)
DLLMQuant: Quantizing Diffusion-based Large Language Models
by: Xu, Chen, et al.
Published: (2025)
Enhancing Advanced Visual Reasoning Ability of Large Language Models
by: Li, Zhiyuan, et al.
Published: (2024)
MMCOMET: A Large-Scale Multimodal Commonsense Knowledge Graph for Contextual Reasoning
by: Wang, Eileen, et al.
Published: (2026)
VIDA: A dataset for Visually Dependent Ambiguity in Multimodal Machine Translation
by: Pan, Jingheng, et al.
Published: (2026)
Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models
by: Wang, Lu, et al.
Published: (2026)
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
by: Wang, Peijie, et al.
Published: (2025)
Contextual Object Detection with Multimodal Large Language Models
by: Zang, Yuhang, et al.
Published: (2023)