| Main Authors: | Zhang, Shimin, Chen, Xianwei, Shen, Yufan, Ye, Ziyuan, Wu, Jibin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.07558 |
Similar Items
Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger
by: Yang, Qi, et al.
Published: (2025)
Look-Back: Implicit Visual Re-focusing in MLLM Reasoning
by: Yang, Shuo, et al.
Published: (2025)
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
by: Li, Ling, et al.
Published: (2024)
LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model
by: Jin, Jiachun, et al.
Published: (2026)
DeepLatent: Think with Images via Parallel Latent Visual Reasoning
by: Lu, Dongchen, et al.
Published: (2026)
LanteRn: Latent Visual Structured Reasoning
by: Viveiros, André G., et al.
Published: (2026)
FRISM: Fine-Grained Reasoning Injection via Subspace-Level Model Merging for Vision-Language Models
by: Huang, Chenyu, et al.
Published: (2026)
ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models
by: Tian, Jiaxu, et al.
Published: (2025)
Text-to-Scene with Large Reasoning Models
by: Berdoz, Frédéric, et al.
Published: (2025)
LURE: Latent Space Unblocking for Multi-Concept Reawakening in Diffusion Models
by: Sun, Mengyu, et al.
Published: (2026)
Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information
by: Chu, Xu, et al.
Published: (2025)
Reason, Retrieve, Re-rank: A Zero-Shot Reasoning-Aware Framework for Composed Video Retrieval
by: Alavi, Ali
Published: (2026)
VisReason: A Large-Scale Dataset for Visual Chain-of-Thought Reasoning
by: Li, Lingxiao, et al.
Published: (2025)
LaVR: Scene Latent Conditioned Generative Video Trajectory Re-Rendering using Large 4D Reconstruction Models
by: Xie, Mingyang, et al.
Published: (2026)
LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multimodal Large Language Models
by: Zhu, Mengdan, et al.
Published: (2024)
CoRe3D: Collaborative Reasoning as a Foundation for 3D Intelligence
by: Yu, Tianjiao, et al.
Published: (2025)
Towards Large-scale Chemical Reaction Image Parsing via a Multimodal Large Language Model
by: Chen, Yufan, et al.
Published: (2025)
Towards Sparse Video Understanding and Reasoning
by: Xu, Chenwei, et al.
Published: (2026)
CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
by: Zheng, Xiaoji, et al.
Published: (2025)
ShaLa: Multimodal Shared Latent Space Modelling
by: Cui, Jiali, et al.
Published: (2025)
LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
by: Wang, Junchi, et al.
Published: (2024)
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
by: Min, Juhong, et al.
Published: (2024)
DragGANSpace: Latent Space Exploration and Control for GANs
by: Odendaal, Kirsten, et al.
Published: (2025)
Spatial Reasoning with Denoising Models
by: Wewer, Christopher, et al.
Published: (2025)
Sum-of-Checks: Structured Reasoning for Surgical Safety with Large Vision-Language Models
by: You, Weiqiu, et al.
Published: (2026)
What's Holding Back Latent Visual Reasoning?
by: Viveiros, André G., et al.
Published: (2026)
MedCRP-CL: Continual Medical Image Segmentation via Bayesian Nonparametric Semantic Modality Discovery
by: Gao, Ziyuan
Published: (2026)
ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment
by: Wang, Xinyi, et al.
Published: (2024)
Gradient-Guided Exploration of Generative Model's Latent Space for Controlled Iris Image Augmentations
by: Mitcheff, Mahsa, et al.
Published: (2025)
LaRe: Latent Refocusing for Multimodal Reasoning
by: Ma, Jizheng, et al.
Published: (2025)
Tuning Just Enough: Lightweight Backdoor Attacks on Multi-Encoder Diffusion Models
by: Chen, Ziyuan, et al.
Published: (2026)
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
by: Huang, Chi-Pin, et al.
Published: (2025)
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-Thinking Reasoning
by: Shen, Junhao, et al.
Published: (2025)
Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
by: Su, Hung-Ting, et al.
Published: (2024)
Scaling Supervised Local Learning with Augmented Auxiliary Networks
by: Ma, Chenxiang, et al.
Published: (2024)
DC-Merge: Improving Model Merging with Directional Consistency
by: Zhang, Han-Chen, et al.
Published: (2026)
Interleaving Reasoning for Better Text-to-Image Generation
by: Huang, Wenxuan, et al.
Published: (2025)
NEO: No-Optimization Test-Time Adaptation through Latent Re-Centering
by: Murphy, Alexander, et al.
Published: (2025)
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
by: Bigverdi, Mahtab, et al.
Published: (2024)
InverseCrafter: Efficient Video ReCapture as a Latent Domain Inverse Problem
by: Hong, Yeobin, et al.
Published: (2025)