:: Library Catalog

Bibliographic Details
Main Authors:	Ji, Yuyang; Shen, Yixuan; Zhu, Shengjie; Kong, Yu; Liu, Feng
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.26938

Similar Items

BioGait-VLM: A Tri-Modal Vision-Language-Biomechanics Framework for Interpretable Clinical Gait Assessment
by: Chen, Erdong, et al.
Published: (2026)

Non-Colliding Biometric Identities for Digital Entities: Geometry, Capacity, and Million-Scale Virtual Identity Provisioning
by: Ji, Yuyang, et al.
Published: (2026)

IDSelect: A RL-Based Cost-Aware Selection Agent for Video-based Multi-Modal Person Recognition
by: Ji, Yuyang, et al.
Published: (2026)

RePose: A Real-Time 3D Human Pose Estimation and Biomechanical Analysis Framework for Rehabilitation
by: Xue, Junxiao, et al.
Published: (2026)

Grounded 3D-Aware Spatial Vision-Language Modeling
by: Cheng, An-Chieh, et al.
Published: (2026)

VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting
by: Lee, Daeun, et al.
Published: (2026)

From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing
by: Sun, Xintian, et al.
Published: (2024)

Zero-Shot 3D Visual Grounding from Vision-Language Models
by: Li, Rong, et al.
Published: (2025)

AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
by: Zhu, Yuhan, et al.
Published: (2024)

From Panels to Prose: Generating Literary Narratives from Comics
by: Sachdeva, Ragav, et al.
Published: (2025)

GenCape: Structure-Inductive Generative Modeling for Category-Agnostic Pose Estimation
by: Rao, Jiyong, et al.
Published: (2026)

BioPose: Biomechanically-accurate 3D Pose Estimation from Monocular Videos
by: Koleini, Farnoosh, et al.
Published: (2025)

MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints
by: Xie, Pengfei, et al.
Published: (2024)

ChatPose: Chatting about 3D Human Pose
by: Feng, Yao, et al.
Published: (2023)

SKEL-CF: Coarse-to-Fine Biomechanical Skeleton and Surface Mesh Recovery
by: Li, Da, et al.
Published: (2025)

R2G: Reasoning to Ground in 3D Scenes
by: Li, Yixuan, et al.
Published: (2024)

CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge
by: Lin, Xiao, et al.
Published: (2024)

Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation
by: Zeng, Xianzhou, et al.
Published: (2024)

AirZoo: A Unified Large-Scale Dataset for Grounding Aerial Geometric 3D Vision
by: Cheng, Xiaoya, et al.
Published: (2026)

Rethinking Pose Refinement in 3D Gaussian Splatting under Pose Prior and Geometric Uncertainty
by: Kong, Mangyu, et al.
Published: (2026)

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
by: Wang, Yuxin, et al.
Published: (2025)

Grounding Foundational Vision Models with 3D Human Poses for Robust Action Recognition
by: Babey, Nicholas, et al.
Published: (2025)

DiffCLIP: Leveraging Stable Diffusion for Language Grounded 3D Classification
by: Shen, Sitian, et al.
Published: (2023)

Marginalized Bundle Adjustment: Multi-View Camera Pose from Monocular Depth Estimates
by: Zhu, Shengjie, et al.
Published: (2026)

From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models
by: Pulli, Tessa, et al.
Published: (2024)

From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach
by: Wang, Xilin, et al.
Published: (2024)

OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics
by: Gozlan, Yoni, et al.
Published: (2024)

From Skin to Skeleton: Towards Biomechanically Accurate 3D Digital Humans
by: Keller, Marilyn, et al.
Published: (2025)

HeRO: Hierarchical 3D Semantic Representation for Pose-aware Object Manipulation
by: Xu, Chongyang, et al.
Published: (2026)

HPE-CogVLM: Advancing Vision Language Models with a Head Pose Grounding Task
by: Tian, Yu, et al.
Published: (2024)

TechCoach: Towards Technical-Point-Aware Descriptive Action Coaching
by: Li, Yuan-Ming, et al.
Published: (2024)

PoseFix: Correcting 3D Human Poses with Natural Language
by: Delmas, Ginger, et al.
Published: (2023)

PoseScript: Linking 3D Human Poses and Natural Language
by: Delmas, Ginger, et al.
Published: (2022)

Learning Consistent Temporal Grounding between Related Tasks in Sports Coaching
by: Rai, Arushi, et al.
Published: (2026)

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding
by: Zheng, Henry, et al.
Published: (2025)

HiSpatial: Taming Hierarchical 3D Spatial Understanding in Vision-Language Models
by: Liang, Huizhi, et al.
Published: (2026)

CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting
by: Jiao, Siyu, et al.
Published: (2024)

See4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting
by: Lu, Dongyue, et al.
Published: (2025)

Revisit Self-supervised Depth Estimation with Local Structure-from-Motion
by: Zhu, Shengjie, et al.
Published: (2024)

From Pixels to Prose: A Large Dataset of Dense Image Captions
by: Singla, Vasu, et al.
Published: (2024)