:: Library Catalog

Bibliographic Details
Main Authors:	Fan, Siyuan; Du, Bo; Cai, Xiantao; Peng, Bo; Sun, Longling
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2408.03302

Similar Items

3D Human Interaction Generation: A Survey
by: Fan, Siyuan, et al.
Published: (2025)

Controllable Text-to-Motion Generation via Modular Body-Part Phase Control
by: Dai, Minyue, et al.
Published: (2026)

ParCo: Part-Coordinating Text-to-Motion Synthesis
by: Zou, Qiran, et al.
Published: (2024)

SFA: Scan, Focus, and Amplify toward Guidance-aware Answering for Video TextVQA
by: He, Haibin, et al.
Published: (2025)

ParTY: Part-Guidance for Expressive Text-to-Motion Synthesis
by: Heo, KunHo, et al.
Published: (2026)

Autonomous Character-Scene Interaction Synthesis from Text Instruction
by: Jiang, Nan, et al.
Published: (2024)

I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions
by: Zhao, Chengfeng, et al.
Published: (2023)

Generating Human Interaction Motions in Scenes with Text Control
by: Yi, Hongwei, et al.
Published: (2024)

FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis
by: Fan, Ke, et al.
Published: (2024)

The Escalator Problem: Identifying Implicit Motion Blindness in AI for Accessibility
by: Zhang, Xiantao
Published: (2025)

Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues?
by: He, Haibin, et al.
Published: (2025)

T3M: Text Guided 3D Human Motion Synthesis from Speech
by: Peng, Wenshuo, et al.
Published: (2024)

InTeX: Interactive Text-to-texture Synthesis via Unified Depth-aware Inpainting
by: Tang, Jiaxiang, et al.
Published: (2024)

AnyText2: Visual Text Generation and Editing With Customizable Attributes
by: Tuo, Yuxiang, et al.
Published: (2024)

MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing
by: Zhou, Kangneng, et al.
Published: (2023)

TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
by: Ye, Xingsong, et al.
Published: (2024)

Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction
by: Cha, Junuk, et al.
Published: (2024)

Text Data-Centric Image Captioning with Interactive Prompts
by: Wang, Yiyu, et al.
Published: (2024)

Rethink Sparse Signals for Pose-guided Text-to-image Generation
by: Xuan, Wenjie, et al.
Published: (2025)

Articulate That Object Part (ATOP): 3D Part Articulation via Text and Motion Personalization
by: Vora, Aditya, et al.
Published: (2025)

Hear the Scene: Audio-Enhanced Text Spotting
by: Li, Jing, et al.
Published: (2024)

HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models
by: Peng, Xiaogang, et al.
Published: (2023)

Text to Blind Motion
by: Kim, Hee Jae, et al.
Published: (2024)

VTAgent: Agentic Keyframe Anchoring for Evidence-Aware Video TextVQA
by: He, Haibin, et al.
Published: (2026)

ET-SAM: Efficient Point Prompt Prediction in SAM for Unified Scene Text Detection and Layout Analysis
by: Zhang, Xike, et al.
Published: (2026)

Motion-aware Dynamic Graph Neural Network for Video Compressive Sensing
by: Lu, Ruiying, et al.
Published: (2022)

Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion
by: Miao, Honglei, et al.
Published: (2024)

GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation
by: Gao, Xuehao, et al.
Published: (2024)

Leveraging Text-to-Image Diffusion Models for Unsupervised Visual Object Tracking
by: Zhang, Zhengbo, et al.
Published: (2026)

SegVol: Universal and Interactive Volumetric Medical Image Segmentation
by: Du, Yuxin, et al.
Published: (2023)

Diffusion Implicit Policy for Unpaired Scene-aware Motion Synthesis
by: Gong, Jingyu, et al.
Published: (2024)

Towards Open Domain Text-Driven Synthesis of Multi-Person Motions
by: Shan, Mengyi, et al.
Published: (2024)

ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation
by: Peng, Bo, et al.
Published: (2023)

PALUM: Part-based Attention Learning for Unified Motion Retargeting
by: Liu, Siqi, et al.
Published: (2026)

Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
by: Jin, Peng, et al.
Published: (2024)

Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
by: Wang, Zixiao, et al.
Published: (2024)

Topology-Agnostic Animal Motion Generation from Text Prompt
by: Chen, Keyi, et al.
Published: (2025)

Text2Place: Affordance-aware Text Guided Human Placement
by: Parihar, Rishubh, et al.
Published: (2024)

Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization
by: Sun, Haoyuan, et al.
Published: (2024)

Text-Video Multi-Grained Integration for Video Moment Montage
by: Yin, Zhihui, et al.
Published: (2024)