Staff View: T2MBench: A Benchmark for Out-of-Distribution Text-to-Motion Generation :: Library Catalog

Bibliographic Details
Main Authors:	Yang, Bin; Ou, Rong; Xu, Weisheng; Xiong, Jiaqi; Li, Xintao; Wang, Taowen; Zhu, Luyu; Jiang, Xu; Tan, Jing; Xu, Renjing
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.13751

_version_	1866915798485827584
author	Yang, Bin; Ou, Rong; Xu, Weisheng; Xiong, Jiaqi; Li, Xintao; Wang, Taowen; Zhu, Luyu; Jiang, Xu; Tan, Jing; Xu, Renjing
author_facet	Yang, Bin; Ou, Rong; Xu, Weisheng; Xiong, Jiaqi; Li, Xintao; Wang, Taowen; Zhu, Luyu; Jiang, Xu; Tan, Jing; Xu, Renjing
contents	Most existing evaluations of text-to-motion generation focus on in-distribution textual inputs and a limited set of evaluation criteria, which restricts their ability to systematically assess model generalization and motion generation capabilities under complex out-of-distribution (OOD) textual conditions. To address this limitation, we propose a benchmark specifically designed for OOD text-to-motion evaluation, which includes a comprehensive analysis of 14 representative baseline models and the two datasets derived from evaluation results. Specifically, we construct an OOD prompt dataset consisting of 1,025 textual descriptions. Based on this prompt dataset, we introduce a unified evaluation framework that integrates LLM-based Evaluation, Multi-factor Motion Evaluation, and Fine-grained Accuracy Evaluation. Our experimental results reveal that while different baseline models demonstrate strengths in areas such as text-to-motion semantic alignment, motion generalizability, and physical quality, most models struggle to achieve strong performance with Fine-grained Accuracy Evaluation. These findings highlight the limitations of existing methods in OOD scenarios and offer practical guidance for the design and evaluation of future production-level text-to-motion models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_13751
institution	arXiv
publishDate	2026
record_format	arxiv
title	T2MBench: A Benchmark for Out-of-Distribution Text-to-Motion Generation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.13751

Similar Items