| Main Authors: | Yuan, Jiangye, Kumar, Gowri, Wang, Baoyuan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.08592 |
Similar Items
Enhancing MLLM Spatial Understanding via Active 3D Scene Exploration for Multi-Perspective Reasoning
by: Chen, Jiahua, et al.
Published: (2026)
GeoAlign: Geometric Feature Realignment for MLLM Spatial Reasoning
by: Liu, Zhaochen, et al.
Published: (2026)
S$^2$-MLLM: Boosting Spatial Reasoning Capability of MLLMs for 3D Visual Grounding with Structural Guidance
by: Xu, Beining, et al.
Published: (2025)
Vision to Geometry: 3D Spatial Memory for Sequential Embodied MLLM Reasoning and Exploration
by: Cai, Zhongyi, et al.
Published: (2025)
AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs
by: Chang, Boyu, et al.
Published: (2026)
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
by: Huang, Qihan, et al.
Published: (2025)
What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models
by: Deng, Tianchen, et al.
Published: (2025)
Efficient Physics Simulation for 3D Scenes via MLLM-Guided Gaussian Splatting
by: Zhao, Haoyu, et al.
Published: (2024)
RefAny3D: 3D Asset-Referenced Diffusion Models for Image Generation
by: Huang, Hanzhuo, et al.
Published: (2026)
FauForensics: Boosting Audio-Visual Deepfake Detection with Facial Action Units
by: Wang, Jian, et al.
Published: (2025)
SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning
by: Jeon, Byungwoo, et al.
Published: (2026)
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
by: Wu, Diankun, et al.
Published: (2025)
Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
by: Deng, Yu, et al.
Published: (2024)
Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
by: Huang, Shaofei, et al.
Published: (2024)
SURPRISE3D: A Dataset for Spatial Understanding and Reasoning in Complex 3D Scenes
by: Huang, Jiaxin, et al.
Published: (2025)
SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding
by: Zheng, Hongpei, et al.
Published: (2025)
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
by: Huang, Jiaxin, et al.
Published: (2025)
DropMAE: Learning Representations via Masked Autoencoders with Spatial-Attention Dropout for Temporal Matching Tasks
by: Wu, Qiangqiang, et al.
Published: (2023)
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation
by: Wen, Haiquan, et al.
Published: (2025)
MLLM-4D: Towards Visual-based Spatial-Temporal Intelligence
by: Yin, Xingyilang, et al.
Published: (2026)
GeoSceneGraph: Geometric Scene Graph Diffusion Model for Text-guided 3D Indoor Scene Synthesis
by: Ruiz, Antonio, et al.
Published: (2025)
SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes
by: Liu, Tianhui, et al.
Published: (2026)
Dual-Pathway Geometry-Aware MLLM for Spatial Intelligence
by: Zheng, Yufei, et al.
Published: (2026)
SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning
by: Ma, Wufei, et al.
Published: (2025)
Neural Functional Alignment Space: Brain-Referenced Representation of Artificial Neural Networks
by: Yan, Ruiyu, et al.
Published: (2026)
Internally Referenced Low-Light Enhancement
by: He, Peiyuan, et al.
Published: (2026)
Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation
by: Wang, Yanbo, et al.
Published: (2025)
Scene-R1: Video-Grounded Large Language Models for 3D Scene Reasoning without 3D Annotations
by: Yuan, Zhihao, et al.
Published: (2025)
3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling
by: Jiang, Chaokang, et al.
Published: (2024)
CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation
by: Huang, Kaiyi, et al.
Published: (2026)
TextBoost: Boosting Scene Text Fidelity in Ultra-low Bitrate Image Compression
by: Wang, Bingxin, et al.
Published: (2026)
Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation
by: Yarram, Sudhir, et al.
Published: (2024)
HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes
by: Li, Keliang, et al.
Published: (2025)
Unleashing Semantic and Geometric Priors for 3D Scene Completion
by: Chen, Shiyuan, et al.
Published: (2025)
ReasonX: MLLM-Guided Intrinsic Image Decomposition
by: Dirik, Alara, et al.
Published: (2025)
Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval
by: Liu, Chunxu, et al.
Published: (2025)
One2Scene: Geometric Consistent Explorable 3D Scene Generation from a Single Image
by: Wang, Pengfei, et al.
Published: (2026)
R2G: Reasoning to Ground in 3D Scenes
by: Li, Yixuan, et al.
Published: (2024)
SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion
by: Guo, Jiajie, et al.
Published: (2025)
LLaVA$^3$: Representing 3D Scenes like a Cubist Painter to Boost 3D Scene Understanding of VLMs
by: Petit, Doriand, et al.
Published: (2025)