:: Library Catalog

Bibliographic Details
Main Authors:	Lian, Shijie; Wu, Changti; Yang, Laurence Tianruo; Yuan, Hang; Yu, Bin; Zhang, Lei; Chen, Kai
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2509.24473

Similar Items

DynaSolidGeo: A Dynamic Benchmark for Genuine Spatial Mathematical Reasoning of VLMs in Solid Geometry
by: Wu, Changti, et al.
Published: (2025)

IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation
by: Lian, Shijie, et al.
Published: (2026)

TrajSelector: Harnessing Latent Representations for Efficient and Effective Best-of-N in Large Reasoning Model
by: Yu, Bin, et al.
Published: (2025)

TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
by: Yu, Bin, et al.
Published: (2026)

ScalSelect: Scalable Training-Free Multimodal Data Selection for Efficient Visual Instruction Tuning
by: Wu, Changti, et al.
Published: (2026)

TUGS: Physics-based Compact Representation of Underwater Scenes by Tensorized Gaussian
by: Lian, Shijie, et al.
Published: (2025)

PhysBrain 1.0 Technical Report
by: Lian, Shijie, et al.
Published: (2026)

Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset
by: Lian, Shijie, et al.
Published: (2024)

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
by: Liu, Yang, et al.
Published: (2025)

On Asymmetric Optimization of Reasoning and Perception in Vision-Language Model Post-Training
by: Wu, Xueqing, et al.
Published: (2026)

DepthVLA: Enhancing Vision-Language-Action Models with Depth-Aware Spatial Reasoning
by: Yuan, Tianyuan, et al.
Published: (2025)

3D-Mix for VLA: A Plug-and-Play Module for Integrating VGGT-based 3D Information into Vision-Language-Action Models
by: Yu, Bin, et al.
Published: (2026)

Spatial-VLN: Zero-Shot Vision-and-Language Navigation With Explicit Spatial Perception and Exploration
by: Yue, Lu, et al.
Published: (2026)

SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
by: Li, Hongxing, et al.
Published: (2025)

Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning
by: Wu, Aodi, et al.
Published: (2025)

Self-Evolving Spatial Reasoning in Vision Language Models via Geometric Logic Consistency
by: Liu, Junming, et al.
Published: (2026)

SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding
by: Zheng, Hongpei, et al.
Published: (2025)

Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation
by: Li, Yuchen, et al.
Published: (2026)

Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization
by: Diao, Xingjian, et al.
Published: (2026)

SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors
by: Ma, Chenyang, et al.
Published: (2024)

Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models
by: Liao, Yuan-Hong, et al.
Published: (2024)

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
by: Wang, Peng, et al.
Published: (2024)

PatchCue: Enhancing Vision-Language Model Reasoning with Patch-Based Visual Cues
by: Qi, Yukun, et al.
Published: (2026)

OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
by: Jia, Mengdi, et al.
Published: (2025)

MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow
by: Li, Zhe, et al.
Published: (2024)

Enhancing Spatial Reasoning in Vision-Language Models via Chain-of-Thought Prompting and Reinforcement Learning
by: Ji, Binbin, et al.
Published: (2025)

Ascending the Infinite Ladder: Benchmarking Spatial Deformation Reasoning in Vision-Language Models
by: Zhang, Jiahuan, et al.
Published: (2025)

CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning
by: Wu, Hang, et al.
Published: (2026)

EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models
by: Du, Mengfei, et al.
Published: (2024)

Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning
by: Yeh, Chun-Hsiao, et al.
Published: (2026)

SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning
by: Zhang, Jian, et al.
Published: (2026)

UMIT: Unifying Medical Imaging Tasks via Vision-Language Models
by: Yu, Haiyang, et al.
Published: (2025)

Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
by: Zeng, Yu, et al.
Published: (2025)

Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning
by: Li, Rongjie, et al.
Published: (2024)

Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models
by: Yu, Bin, et al.
Published: (2025)

Fast Rerandomization for Balancing Covariates in Randomized Experiments: A Metropolis-Hastings Framework
by: Lu, Jiuyao, et al.
Published: (2026)

Vision-Language Memory for Spatial Reasoning
by: Liu, Zuntao, et al.
Published: (2025)

Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Spatial Reasoning
by: Tang, Yihong, et al.
Published: (2024)

A Large Vision-Language Model based Environment Perception System for Visually Impaired People
by: Chen, Zezhou, et al.
Published: (2025)

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
by: Chen, Boyuan, et al.
Published: (2024)