Library Catalog

Bibliographic Details
Main Authors:	Gan, Qijun; Ren, Yi; Zhang, Chen; Ye, Zhenhui; Xie, Pan; Yin, Xiang; Yuan, Zehuan; Peng, Bingyue; Zhu, Jianke
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.04847

Similar Items

InfinityHuman: Towards Long-Term Audio-Driven Human
by: Li, Xiaodi, et al.
Published: (2025)

PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
by: Gan, Qijun, et al.
Published: (2024)

HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling
by: Chen, Junyi, et al.
Published: (2024)

ALIVE: Animate Your World with Lifelike Audio-Video Generation
by: Guo, Ying, et al.
Published: (2026)

Fine-Grained Multi-View Hand Reconstruction Using Inverse Rendering
by: Gan, Qijun, et al.
Published: (2024)

XHand: Real-time Expressive Hand Avatar
by: Gan, Qijun, et al.
Published: (2024)

HyperMotionX: The Dataset and Benchmark with DiT-Based Pose-Guided Human Image Animation of Complex Motions
by: Xu, Shuolin, et al.
Published: (2025)

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
by: Tian, Keyu, et al.
Published: (2024)

Generative Refinement Networks for Visual Synthesis
by: Han, Jian, et al.
Published: (2026)

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
by: Sun, Peize, et al.
Published: (2024)

OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation
by: Gan, Qijun, et al.
Published: (2025)

UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer
by: Wang, Xiang, et al.
Published: (2025)

Do As I Do: Pose Guided Human Motion Copy
by: Wu, Sifan, et al.
Published: (2024)

HLLM-Creator: Hierarchical LLM-based Personalized Creative Generation
by: Chen, Junyi, et al.
Published: (2025)

VC-LLM: Automated Advertisement Video Creation from Raw Footage using Multi-modal LLMs
by: Qian, Dongjun, et al.
Published: (2025)

Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer
by: Shao, Ruizhi, et al.
Published: (2024)

EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer
by: Yang, Yuxiao, et al.
Published: (2025)

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
by: Zhang, Shilong, et al.
Published: (2025)

Validation of Human Pose Estimation and Human Mesh Recovery for Extracting Clinically Relevant Motion Data from Videos
by: Armstrong, Kai, et al.
Published: (2025)

Language-Guided Transformer Tokenizer for Human Motion Generation
by: Yan, Sheng, et al.
Published: (2026)

Waver: Wave Your Way to Lifelike Video Generation
by: Zhang, Yifu, et al.
Published: (2025)

HyperDiff: Hypergraph Guided Diffusion Model for 3D Human Pose Estimation
by: Han, Bing, et al.
Published: (2025)

Rethinking Generative Human Video Coding with Implicit Motion Transformation
by: Chen, Bolin, et al.
Published: (2025)

VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers
by: Zheng, Jun, et al.
Published: (2024)

DiHuR: Diffusion-Guided Generalizable Human Reconstruction
by: Chen, Jinnan, et al.
Published: (2024)

Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
by: Qi, Tianhao, et al.
Published: (2025)

$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation
by: Wang, Weiquan, et al.
Published: (2024)

AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models
by: Huang, Zehuan, et al.
Published: (2025)

UniTok: A Unified Tokenizer for Visual Generation and Understanding
by: Ma, Chuofan, et al.
Published: (2025)

Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
by: Han, Jian, et al.
Published: (2024)

HumanScore: Benchmarking Human Motions in Generated Videos
by: Fang, Yusu, et al.
Published: (2026)

Diffusion-based Pose Refinement and Muti-hypothesis Generation for 3D Human Pose Estimaiton
by: Kang, Hongbo, et al.
Published: (2024)

Target Pose Guided Whole-body Grasping Motion Generation for Digital Humans
by: Shao, Quanquan, et al.
Published: (2024)

MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
by: Zhang, Yuang, et al.
Published: (2024)

FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers
by: He, Xuanhua, et al.
Published: (2025)

Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers
by: Chen, Pengtao, et al.
Published: (2025)

PoseGen: In-Context LoRA Finetuning for Pose-Controllable Long Human Video Generation
by: He, Jingxuan, et al.
Published: (2025)

SMooDi: Stylized Motion Diffusion Model
by: Zhong, Lei, et al.
Published: (2024)

Kinematics Modeling Network for Video-based Human Pose Estimation
by: Dang, Yonghao, et al.
Published: (2022)

UnPose: Uncertainty-Guided Diffusion Priors for Zero-Shot Pose Estimation
by: Jiang, Zhaodong, et al.
Published: (2025)