| Main Authors: | Gan, Qijun, Ren, Yi, Zhang, Chen, Ye, Zhenhui, Xie, Pan, Yin, Xiang, Yuan, Zehuan, Peng, Bingyue, Zhu, Jianke |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.04847 |
Similar Items
InfinityHuman: Towards Long-Term Audio-Driven Human
by: Li, Xiaodi, et al.
Published: (2025)
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
by: Gan, Qijun, et al.
Published: (2024)
HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling
by: Chen, Junyi, et al.
Published: (2024)
ALIVE: Animate Your World with Lifelike Audio-Video Generation
by: Guo, Ying, et al.
Published: (2026)
Fine-Grained Multi-View Hand Reconstruction Using Inverse Rendering
by: Gan, Qijun, et al.
Published: (2024)
XHand: Real-time Expressive Hand Avatar
by: Gan, Qijun, et al.
Published: (2024)
HyperMotionX: The Dataset and Benchmark with DiT-Based Pose-Guided Human Image Animation of Complex Motions
by: Xu, Shuolin, et al.
Published: (2025)
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
by: Tian, Keyu, et al.
Published: (2024)
Generative Refinement Networks for Visual Synthesis
by: Han, Jian, et al.
Published: (2026)
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
by: Sun, Peize, et al.
Published: (2024)
OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation
by: Gan, Qijun, et al.
Published: (2025)
UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer
by: Wang, Xiang, et al.
Published: (2025)
Do As I Do: Pose Guided Human Motion Copy
by: Wu, Sifan, et al.
Published: (2024)
HLLM-Creator: Hierarchical LLM-based Personalized Creative Generation
by: Chen, Junyi, et al.
Published: (2025)
VC-LLM: Automated Advertisement Video Creation from Raw Footage using Multi-modal LLMs
by: Qian, Dongjun, et al.
Published: (2025)
Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer
by: Shao, Ruizhi, et al.
Published: (2024)
EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer
by: Yang, Yuxiao, et al.
Published: (2025)
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
by: Zhang, Shilong, et al.
Published: (2025)
Validation of Human Pose Estimation and Human Mesh Recovery for Extracting Clinically Relevant Motion Data from Videos
by: Armstrong, Kai, et al.
Published: (2025)
Language-Guided Transformer Tokenizer for Human Motion Generation
by: Yan, Sheng, et al.
Published: (2026)
Waver: Wave Your Way to Lifelike Video Generation
by: Zhang, Yifu, et al.
Published: (2025)
HyperDiff: Hypergraph Guided Diffusion Model for 3D Human Pose Estimation
by: Han, Bing, et al.
Published: (2025)
Rethinking Generative Human Video Coding with Implicit Motion Transformation
by: Chen, Bolin, et al.
Published: (2025)
VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers
by: Zheng, Jun, et al.
Published: (2024)
DiHuR: Diffusion-Guided Generalizable Human Reconstruction
by: Chen, Jinnan, et al.
Published: (2024)
Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
by: Qi, Tianhao, et al.
Published: (2025)
$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation
by: Wang, Weiquan, et al.
Published: (2024)
AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models
by: Huang, Zehuan, et al.
Published: (2025)
UniTok: A Unified Tokenizer for Visual Generation and Understanding
by: Ma, Chuofan, et al.
Published: (2025)
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
by: Han, Jian, et al.
Published: (2024)
HumanScore: Benchmarking Human Motions in Generated Videos
by: Fang, Yusu, et al.
Published: (2026)
Diffusion-based Pose Refinement and Muti-hypothesis Generation for 3D Human Pose Estimaiton
by: Kang, Hongbo, et al.
Published: (2024)
Target Pose Guided Whole-body Grasping Motion Generation for Digital Humans
by: Shao, Quanquan, et al.
Published: (2024)
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
by: Zhang, Yuang, et al.
Published: (2024)
FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers
by: He, Xuanhua, et al.
Published: (2025)
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers
by: Chen, Pengtao, et al.
Published: (2025)
PoseGen: In-Context LoRA Finetuning for Pose-Controllable Long Human Video Generation
by: He, Jingxuan, et al.
Published: (2025)
SMooDi: Stylized Motion Diffusion Model
by: Zhong, Lei, et al.
Published: (2024)
Kinematics Modeling Network for Video-based Human Pose Estimation
by: Dang, Yonghao, et al.
Published: (2022)
UnPose: Uncertainty-Guided Diffusion Priors for Zero-Shot Pose Estimation
by: Jiang, Zhaodong, et al.
Published: (2025)