Bibliographic Details
Main Authors:	Yuan, Jiangye, Kumar, Gowri, Wang, Baoyuan
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.08592

Similar Items

Enhancing MLLM Spatial Understanding via Active 3D Scene Exploration for Multi-Perspective Reasoning
by: Chen, Jiahua, et al.
Published: (2026)

GeoAlign: Geometric Feature Realignment for MLLM Spatial Reasoning
by: Liu, Zhaochen, et al.
Published: (2026)

S$^2$-MLLM: Boosting Spatial Reasoning Capability of MLLMs for 3D Visual Grounding with Structural Guidance
by: Xu, Beining, et al.
Published: (2025)

Vision to Geometry: 3D Spatial Memory for Sequential Embodied MLLM Reasoning and Exploration
by: Cai, Zhongyi, et al.
Published: (2025)

AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs
by: Chang, Boyu, et al.
Published: (2026)

Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
by: Huang, Qihan, et al.
Published: (2025)

What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models
by: Deng, Tianchen, et al.
Published: (2025)

Efficient Physics Simulation for 3D Scenes via MLLM-Guided Gaussian Splatting
by: Zhao, Haoyu, et al.
Published: (2024)

RefAny3D: 3D Asset-Referenced Diffusion Models for Image Generation
by: Huang, Hanzhuo, et al.
Published: (2026)

FauForensics: Boosting Audio-Visual Deepfake Detection with Facial Action Units
by: Wang, Jian, et al.
Published: (2025)

SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning
by: Jeon, Byungwoo, et al.
Published: (2026)

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
by: Wu, Diankun, et al.
Published: (2025)

Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
by: Deng, Yu, et al.
Published: (2024)

Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
by: Huang, Shaofei, et al.
Published: (2024)

SURPRISE3D: A Dataset for Spatial Understanding and Reasoning in Complex 3D Scenes
by: Huang, Jiaxin, et al.
Published: (2025)

SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding
by: Zheng, Hongpei, et al.
Published: (2025)

MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
by: Huang, Jiaxin, et al.
Published: (2025)

DropMAE: Learning Representations via Masked Autoencoders with Spatial-Attention Dropout for Temporal Matching Tasks
by: Wu, Qiangqiang, et al.
Published: (2023)

BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation
by: Wen, Haiquan, et al.
Published: (2025)

MLLM-4D: Towards Visual-based Spatial-Temporal Intelligence
by: Yin, Xingyilang, et al.
Published: (2026)

GeoSceneGraph: Geometric Scene Graph Diffusion Model for Text-guided 3D Indoor Scene Synthesis
by: Ruiz, Antonio, et al.
Published: (2025)

SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes
by: Liu, Tianhui, et al.
Published: (2026)

Dual-Pathway Geometry-Aware MLLM for Spatial Intelligence
by: Zheng, Yufei, et al.
Published: (2026)

SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning
by: Ma, Wufei, et al.
Published: (2025)

Neural Functional Alignment Space: Brain-Referenced Representation of Artificial Neural Networks
by: Yan, Ruiyu, et al.
Published: (2026)

Internally Referenced Low-Light Enhancement
by: He, Peiyuan, et al.
Published: (2026)

Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation
by: Wang, Yanbo, et al.
Published: (2025)

Scene-R1: Video-Grounded Large Language Models for 3D Scene Reasoning without 3D Annotations
by: Yuan, Zhihao, et al.
Published: (2025)

3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling
by: Jiang, Chaokang, et al.
Published: (2024)

CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation
by: Huang, Kaiyi, et al.
Published: (2026)

TextBoost: Boosting Scene Text Fidelity in Ultra-low Bitrate Image Compression
by: Wang, Bingxin, et al.
Published: (2026)

Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation
by: Yarram, Sudhir, et al.
Published: (2024)

HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes
by: Li, Keliang, et al.
Published: (2025)

Unleashing Semantic and Geometric Priors for 3D Scene Completion
by: Chen, Shiyuan, et al.
Published: (2025)

ReasonX: MLLM-Guided Intrinsic Image Decomposition
by: Dirik, Alara, et al.
Published: (2025)

Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval
by: Liu, Chunxu, et al.
Published: (2025)

One2Scene: Geometric Consistent Explorable 3D Scene Generation from a Single Image
by: Wang, Pengfei, et al.
Published: (2026)

R2G: Reasoning to Ground in 3D Scenes
by: Li, Yixuan, et al.
Published: (2024)

SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion
by: Guo, Jiajie, et al.
Published: (2025)

LLaVA$^3$: Representing 3D Scenes like a Cubist Painter to Boost 3D Scene Understanding of VLMs
by: Petit, Doriand, et al.
Published: (2025)