| Main Authors: | Fan, Siyuan, Du, Bo, Cai, Xiantao, Peng, Bo, Sun, Longling |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.03302 |
Similar Items
3D Human Interaction Generation: A Survey
by: Fan, Siyuan, et al.
Published: (2025)
Controllable Text-to-Motion Generation via Modular Body-Part Phase Control
by: Dai, Minyue, et al.
Published: (2026)
ParCo: Part-Coordinating Text-to-Motion Synthesis
by: Zou, Qiran, et al.
Published: (2024)
SFA: Scan, Focus, and Amplify toward Guidance-aware Answering for Video TextVQA
by: He, Haibin, et al.
Published: (2025)
ParTY: Part-Guidance for Expressive Text-to-Motion Synthesis
by: Heo, KunHo, et al.
Published: (2026)
Autonomous Character-Scene Interaction Synthesis from Text Instruction
by: Jiang, Nan, et al.
Published: (2024)
I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions
by: Zhao, Chengfeng, et al.
Published: (2023)
Generating Human Interaction Motions in Scenes with Text Control
by: Yi, Hongwei, et al.
Published: (2024)
FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis
by: Fan, Ke, et al.
Published: (2024)
The Escalator Problem: Identifying Implicit Motion Blindness in AI for Accessibility
by: Zhang, Xiantao
Published: (2025)
Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues?
by: He, Haibin, et al.
Published: (2025)
T3M: Text Guided 3D Human Motion Synthesis from Speech
by: Peng, Wenshuo, et al.
Published: (2024)
InTeX: Interactive Text-to-texture Synthesis via Unified Depth-aware Inpainting
by: Tang, Jiaxiang, et al.
Published: (2024)
AnyText2: Visual Text Generation and Editing With Customizable Attributes
by: Tuo, Yuxiang, et al.
Published: (2024)
MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing
by: Zhou, Kangneng, et al.
Published: (2023)
TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
by: Ye, Xingsong, et al.
Published: (2024)
Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction
by: Cha, Junuk, et al.
Published: (2024)
Text Data-Centric Image Captioning with Interactive Prompts
by: Wang, Yiyu, et al.
Published: (2024)
Rethink Sparse Signals for Pose-guided Text-to-image Generation
by: Xuan, Wenjie, et al.
Published: (2025)
Articulate That Object Part (ATOP): 3D Part Articulation via Text and Motion Personalization
by: Vora, Aditya, et al.
Published: (2025)
Hear the Scene: Audio-Enhanced Text Spotting
by: Li, Jing, et al.
Published: (2024)
HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models
by: Peng, Xiaogang, et al.
Published: (2023)
Text to Blind Motion
by: Kim, Hee Jae, et al.
Published: (2024)
VTAgent: Agentic Keyframe Anchoring for Evidence-Aware Video TextVQA
by: He, Haibin, et al.
Published: (2026)
ET-SAM: Efficient Point Prompt Prediction in SAM for Unified Scene Text Detection and Layout Analysis
by: Zhang, Xike, et al.
Published: (2026)
Motion-aware Dynamic Graph Neural Network for Video Compressive Sensing
by: Lu, Ruiying, et al.
Published: (2022)
Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion
by: Miao, Honglei, et al.
Published: (2024)
GUESS:GradUally Enriching SyntheSis for Text-Driven Human Motion Generation
by: Gao, Xuehao, et al.
Published: (2024)
Leveraging Text-to-Image Diffusion Models for Unsupervised Visual Object Tracking
by: Zhang, Zhengbo, et al.
Published: (2026)
SegVol: Universal and Interactive Volumetric Medical Image Segmentation
by: Du, Yuxin, et al.
Published: (2023)
Diffusion Implicit Policy for Unpaired Scene-aware Motion Synthesis
by: Gong, Jingyu, et al.
Published: (2024)
Towards Open Domain Text-Driven Synthesis of Multi-Person Motions
by: Shan, Mengyi, et al.
Published: (2024)
ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation
by: Peng, Bo, et al.
Published: (2023)
PALUM: Part-based Attention Learning for Unified Motion Retargeting
by: Liu, Siqi, et al.
Published: (2026)
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
by: Jin, Peng, et al.
Published: (2024)
Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
by: Wang, Zixiao, et al.
Published: (2024)
Topology-Agnostic Animal Motion Generation from Text Prompt
by: Chen, Keyi, et al.
Published: (2025)
Text2Place: Affordance-aware Text Guided Human Placement
by: Parihar, Rishubh, et al.
Published: (2024)
Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization
by: Sun, Haoyuan, et al.
Published: (2024)
Text-Video Multi-Grained Integration for Video Moment Montage
by: Yin, Zhihui, et al.
Published: (2024)