| Main Authors: | Xiao, Jiasong; She, Yutao; Li, Kai; Sha, Yuyang; Cheng, Ziang; Tong, Ziang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.23721 |
Similar Items
CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)
Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention
by: Durrani, Hamza Ahmed, et al.
Published: (2026)
T-Rex: Task-Adaptive Spatial Representation Extraction for Robotic Manipulation with Vision-Language Models
by: Chen, Yiteng, et al.
Published: (2025)
AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making
by: Li, Wenbo, et al.
Published: (2025)
Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy
by: Mehta, Vinit, et al.
Published: (2025)
A Recipe for Geometry-Aware 3D Mesh Transformers
by: Farazi, Mohammad, et al.
Published: (2024)
EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis
by: Guo, Yijie, et al.
Published: (2025)
Beyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Images
by: Chen, Yuangong, et al.
Published: (2026)
OmniAcc: Personalized Accessibility Assistant Using Generative AI
by: Karki, Siddhant, et al.
Published: (2025)
VLM-VPI: A Vision-Language Reasoning Framework for Improving Automated Vehicle-Pedestrian Interactions
by: Pu, Qingwen, et al.
Published: (2026)
Temporally Consistent Object 6D Pose Estimation for Robot Control
by: Zorina, Kateryna, et al.
Published: (2026)
Decoupling Vision and Language: Codebook Anchored Visual Adaptation
by: Wu, Jason, et al.
Published: (2026)
SERA-H: Beyond Native Sentinel Spatial Limits for High-Resolution Canopy Height Mapping
by: Boudras, Thomas, et al.
Published: (2025)
From eye to AI: studying rodent social behavior in the era of machine Learning
by: Chindemi, Giuseppe, et al.
Published: (2025)
vS-Graphs: Tightly Coupling Visual SLAM and 3D Scene Graphs Exploiting Hierarchical Scene Understanding
by: Tourani, Ali, et al.
Published: (2025)
ExpReS-VLA: Specializing Vision-Language-Action Models Through Experience Replay and Retrieval
by: Syed, Shahram Najam, et al.
Published: (2025)
Leum-VL Technical Report
by: He, Yuxuan, et al.
Published: (2026)
Mask-Conditioned Voxel Diffusion for Joint Geometry and Color Inpainting
by: Sumuk, Aarya
Published: (2026)
Action Anticipation from SoccerNet Football Video Broadcasts
by: Dalal, Mohamad, et al.
Published: (2025)
Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images
by: Käs, Stephanie, et al.
Published: (2025)
WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents
by: Liu, Bingnan, et al.
Published: (2026)
Exploring Surround-View Fisheye Camera 3D Object Detection
by: Li, Changcai, et al.
Published: (2025)
NV3D: Leveraging Spatial Shape Through Normal Vector-based 3D Object Detection
by: Chaowakarn, Krittin, et al.
Published: (2025)
Semi supervised GAN for smart microscopy, fast and data efficient cell cycle classification
by: Manick, Rajeev, et al.
Published: (2026)
A Light Perspective for 3D Object Detection
by: Pederiva, Marcelo Eduardo, et al.
Published: (2025)
Light Future: Multimodal Action Frame Prediction via InstructPix2Pix
by: Zhong, Zesen, et al.
Published: (2025)
Enhancing Spatial Reasoning in Vision-Language Models via Chain-of-Thought Prompting and Reinforcement Learning
by: Ji, Binbin, et al.
Published: (2025)
Vi-SAFE: A Spatial-Temporal Framework for Efficient Violence Detection in Public Surveillance
by: Chang, Ligang, et al.
Published: (2025)
A Vision-Language Model for Focal Liver Lesion Classification
by: Jian, Song, et al.
Published: (2025)
SCA-Net: Spatial-Contextual Aggregation Network for Enhanced Small Building and Road Change Detection
by: Gholibeigi, Emad, et al.
Published: (2026)
Optimal Transport-Guided Source-Free Adaptation for Face Anti-Spoofing
by: Li, Zhuowei, et al.
Published: (2025)
Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation
by: Hou, Zhangcheng, et al.
Published: (2026)
MSPCaps: A Multi-Scale Patchify Capsule Network with Cross-Agreement Routing for Visual Recognition
by: Hu, Yudong, et al.
Published: (2025)
Single-Shot Metric Depth from Focused Plenoptic Cameras
by: Lasheras-Hernandez, Blanca, et al.
Published: (2024)
Smooth regularization for efficient video recognition
by: Goldman, Gil, et al.
Published: (2025)
Evaluating the Impact of Synthetic Data on Object Detection Tasks in Autonomous Driving
by: Özeren, Enes, et al.
Published: (2025)
Neuromorphic Monocular Depth Estimation with Uncertainty Modeling
by: Bergkvist, Viktor, et al.
Published: (2026)
Motion-Guided Semantic Alignment with Negative Prompts for Zero-Shot Video Action Recognition
by: Wang, Yiming, et al.
Published: (2026)
Implementing Adaptations for Vision AutoRegressive Model
by: Shaikh, Kaif, et al.
Published: (2025)
PhysicsNeRF: Physics-Guided 3D Reconstruction from Sparse Views
by: Barhdadi, Mohamed Rayan, et al.
Published: (2025)