| Main authors: | Lian, Shijie; Wu, Changti; Yang, Laurence Tianruo; Yuan, Hang; Yu, Bin; Zhang, Lei; Chen, Kai |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2509.24473 |
Similar items
DynaSolidGeo: A Dynamic Benchmark for Genuine Spatial Mathematical Reasoning of VLMs in Solid Geometry
by: Wu, Changti, et al.
Published: (2025)
IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation
by: Lian, Shijie, et al.
Published: (2026)
TrajSelector: Harnessing Latent Representations for Efficient and Effective Best-of-N in Large Reasoning Model
by: Yu, Bin, et al.
Published: (2025)
TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
by: Yu, Bin, et al.
Published: (2026)
ScalSelect: Scalable Training-Free Multimodal Data Selection for Efficient Visual Instruction Tuning
by: Wu, Changti, et al.
Published: (2026)
TUGS: Physics-based Compact Representation of Underwater Scenes by Tensorized Gaussian
by: Lian, Shijie, et al.
Published: (2025)
PhysBrain 1.0 Technical Report
by: Lian, Shijie, et al.
Published: (2026)
Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset
by: Lian, Shijie, et al.
Published: (2024)
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
by: Liu, Yang, et al.
Published: (2025)
On Asymmetric Optimization of Reasoning and Perception in Vision-Language Model Post-Training
by: Wu, Xueqing, et al.
Published: (2026)
DepthVLA: Enhancing Vision-Language-Action Models with Depth-Aware Spatial Reasoning
by: Yuan, Tianyuan, et al.
Published: (2025)
3D-Mix for VLA: A Plug-and-Play Module for Integrating VGGT-based 3D Information into Vision-Language-Action Models
by: Yu, Bin, et al.
Published: (2026)
Spatial-VLN: Zero-Shot Vision-and-Language Navigation With Explicit Spatial Perception and Exploration
by: Yue, Lu, et al.
Published: (2026)
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
by: Li, Hongxing, et al.
Published: (2025)
Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning
by: Wu, Aodi, et al.
Published: (2025)
Self-Evolving Spatial Reasoning in Vision Language Models via Geometric Logic Consistency
by: Liu, Junming, et al.
Published: (2026)
SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding
by: Zheng, Hongpei, et al.
Published: (2025)
Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation
by: Li, Yuchen, et al.
Published: (2026)
Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization
by: Diao, Xingjian, et al.
Published: (2026)
SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors
by: Ma, Chenyang, et al.
Published: (2024)
Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models
by: Liao, Yuan-Hong, et al.
Published: (2024)
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
by: Wang, Peng, et al.
Published: (2024)
PatchCue: Enhancing Vision-Language Model Reasoning with Patch-Based Visual Cues
by: Qi, Yukun, et al.
Published: (2026)
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
by: Jia, Mengdi, et al.
Published: (2025)
MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow
by: Li, Zhe, et al.
Published: (2024)
Enhancing Spatial Reasoning in Vision-Language Models via Chain-of-Thought Prompting and Reinforcement Learning
by: Ji, Binbin, et al.
Published: (2025)
Ascending the Infinite Ladder: Benchmarking Spatial Deformation Reasoning in Vision-Language Models
by: Zhang, Jiahuan, et al.
Published: (2025)
CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning
by: Wu, Hang, et al.
Published: (2026)
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models
by: Du, Mengfei, et al.
Published: (2024)
Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning
by: Yeh, Chun-Hsiao, et al.
Published: (2026)
SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning
by: Zhang, Jian, et al.
Published: (2026)
UMIT: Unifying Medical Imaging Tasks via Vision-Language Models
by: Yu, Haiyang, et al.
Published: (2025)
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
by: Zeng, Yu, et al.
Published: (2025)
Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning
by: Li, Rongjie, et al.
Published: (2024)
Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models
by: Yu, Bin, et al.
Published: (2025)
Fast Rerandomization for Balancing Covariates in Randomized Experiments: A Metropolis-Hastings Framework
by: Lu, Jiuyao, et al.
Published: (2026)
Vision-Language Memory for Spatial Reasoning
by: Liu, Zuntao, et al.
Published: (2025)
Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Spatial Reasoning
by: Tang, Yihong, et al.
Published: (2024)
A Large Vision-Language Model based Environment Perception System for Visually Impaired People
by: Chen, Zezhou, et al.
Published: (2025)
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
by: Chen, Boyuan, et al.
Published: (2024)