Library Catalog

Bibliographic Details
Main Authors:	Xiao, Jiasong; She, Yutao; Li, Kai; Sha, Yuyang; Cheng, Ziang; Tong, Ziang
Format:	Preprint
Published:	2026
Subjects:	Robotics; Computer Vision and Pattern Recognition; I.2.9; I.2.10; I.4.8; J.2
Online Access:	https://arxiv.org/abs/2602.23721

Similar Items

CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)

Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention
by: Durrani, Hamza Ahmed, et al.
Published: (2026)

T-Rex: Task-Adaptive Spatial Representation Extraction for Robotic Manipulation with Vision-Language Models
by: Chen, Yiteng, et al.
Published: (2025)

AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making
by: Li, Wenbo, et al.
Published: (2025)

Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy
by: Mehta, Vinit, et al.
Published: (2025)

A Recipe for Geometry-Aware 3D Mesh Transformers
by: Farazi, Mohammad, et al.
Published: (2024)

EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis
by: Guo, Yijie, et al.
Published: (2025)

Beyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Images
by: Chen, Yuangong, et al.
Published: (2026)

OmniAcc: Personalized Accessibility Assistant Using Generative AI
by: Karki, Siddhant, et al.
Published: (2025)

VLM-VPI: A Vision-Language Reasoning Framework for Improving Automated Vehicle-Pedestrian Interactions
by: Pu, Qingwen, et al.
Published: (2026)

Temporally Consistent Object 6D Pose Estimation for Robot Control
by: Zorina, Kateryna, et al.
Published: (2026)

Decoupling Vision and Language: Codebook Anchored Visual Adaptation
by: Wu, Jason, et al.
Published: (2026)

SERA-H: Beyond Native Sentinel Spatial Limits for High-Resolution Canopy Height Mapping
by: Boudras, Thomas, et al.
Published: (2025)

From eye to AI: studying rodent social behavior in the era of machine learning
by: Chindemi, Giuseppe, et al.
Published: (2025)

vS-Graphs: Tightly Coupling Visual SLAM and 3D Scene Graphs Exploiting Hierarchical Scene Understanding
by: Tourani, Ali, et al.
Published: (2025)

ExpReS-VLA: Specializing Vision-Language-Action Models Through Experience Replay and Retrieval
by: Syed, Shahram Najam, et al.
Published: (2025)

Leum-VL Technical Report
by: He, Yuxuan, et al.
Published: (2026)

Mask-Conditioned Voxel Diffusion for Joint Geometry and Color Inpainting
by: Sumuk, Aarya
Published: (2026)

Action Anticipation from SoccerNet Football Video Broadcasts
by: Dalal, Mohamad, et al.
Published: (2025)

Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images
by: Käs, Stephanie, et al.
Published: (2025)

WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents
by: Liu, Bingnan, et al.
Published: (2026)

Exploring Surround-View Fisheye Camera 3D Object Detection
by: Li, Changcai, et al.
Published: (2025)

NV3D: Leveraging Spatial Shape Through Normal Vector-based 3D Object Detection
by: Chaowakarn, Krittin, et al.
Published: (2025)

Semi supervised GAN for smart microscopy, fast and data efficient cell cycle classification
by: Manick, Rajeev, et al.
Published: (2026)

A Light Perspective for 3D Object Detection
by: Pederiva, Marcelo Eduardo, et al.
Published: (2025)

Light Future: Multimodal Action Frame Prediction via InstructPix2Pix
by: Zhong, Zesen, et al.
Published: (2025)

Enhancing Spatial Reasoning in Vision-Language Models via Chain-of-Thought Prompting and Reinforcement Learning
by: Ji, Binbin, et al.
Published: (2025)

Vi-SAFE: A Spatial-Temporal Framework for Efficient Violence Detection in Public Surveillance
by: Chang, Ligang, et al.
Published: (2025)

A Vision-Language Model for Focal Liver Lesion Classification
by: Jian, Song, et al.
Published: (2025)

SCA-Net: Spatial-Contextual Aggregation Network for Enhanced Small Building and Road Change Detection
by: Gholibeigi, Emad, et al.
Published: (2026)

Optimal Transport-Guided Source-Free Adaptation for Face Anti-Spoofing
by: Li, Zhuowei, et al.
Published: (2025)

Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation
by: Hou, Zhangcheng, et al.
Published: (2026)

MSPCaps: A Multi-Scale Patchify Capsule Network with Cross-Agreement Routing for Visual Recognition
by: Hu, Yudong, et al.
Published: (2025)

Single-Shot Metric Depth from Focused Plenoptic Cameras
by: Lasheras-Hernandez, Blanca, et al.
Published: (2024)

Smooth regularization for efficient video recognition
by: Goldman, Gil, et al.
Published: (2025)

Evaluating the Impact of Synthetic Data on Object Detection Tasks in Autonomous Driving
by: Özeren, Enes, et al.
Published: (2025)

Neuromorphic Monocular Depth Estimation with Uncertainty Modeling
by: Bergkvist, Viktor, et al.
Published: (2026)

Motion-Guided Semantic Alignment with Negative Prompts for Zero-Shot Video Action Recognition
by: Wang, Yiming, et al.
Published: (2026)

Implementing Adaptations for Vision AutoRegressive Model
by: Shaikh, Kaif, et al.
Published: (2025)

PhysicsNeRF: Physics-Guided 3D Reconstruction from Sparse Views
by: Barhdadi, Mohamed Rayan, et al.
Published: (2025)