:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Mengge; Di, Yan; Wang, Gu; Qu, Yun; Zhu, Dekai; Li, Yanyan; Ji, Xiangyang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.20383

Similar Items

SeaLion: Semantic Part-Aware Latent Point Diffusion Models for 3D Generation
by: Zhu, Dekai, et al.
Published: (2025)

MoSa: Motion Generation with Scalable Autoregressive Modeling
by: Liu, Mengyuan, et al.
Published: (2025)

HINT: Learning Complete Human Neural Representations from Limited Viewpoints
by: Sanvito, Alessandro, et al.
Published: (2024)

Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
by: Jin, Peng, et al.
Published: (2024)

ExtraVAR: Stage-Aware RoPE Remapping for Resolution Extrapolation in Visual Autoregressive Models
by: Yan, Feihong, et al.
Published: (2026)

HINT: Composed Image Retrieval with Dual-path Compositional Contextualized Network
by: Zhang, Mingyu, et al.
Published: (2026)

OmniMotion: Multimodal Motion Generation with Continuous Masked Autoregression
by: Li, Zhe, et al.
Published: (2025)

Motion-Aware Caching for Efficient Autoregressive Video Generation
by: Xu, Jing, et al.
Published: (2026)

Causal Motion Diffusion Models for Autoregressive Motion Generation
by: Yu, Qing, et al.
Published: (2026)

Next-Scale Autoregressive Models for Text-to-Motion Generation
by: Zheng, Zhiwei, et al.
Published: (2026)

Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models
by: Ruiz-Ponce, Pablo, et al.
Published: (2025)

GPT-Connect: Interaction between Text-Driven Human Motion Generator and 3D Scenes in a Training-free Manner
by: Qu, Haoxuan, et al.
Published: (2024)

FilmWeaver: Weaving Consistent Multi-Shot Videos with Cache-Guided Autoregressive Diffusion
by: Luo, Xiangyang, et al.
Published: (2025)

SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis
by: He, Wenkun, et al.
Published: (2024)

ARIG: Autoregressive Interactive Head Generation for Real-time Conversations
by: Guo, Ying, et al.
Published: (2025)

ScaleMoGen: Autoregressive Next-Scale Prediction for Human Motion Generation
by: Hwang, Inwoo, et al.
Published: (2026)

ShapeMatcher: Self-Supervised Joint Shape Canonicalization, Segmentation, Retrieval and Deformation
by: Di, Yan, et al.
Published: (2023)

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives
by: Meng, Yihao, et al.
Published: (2026)

MotionMERGE: A Multi-granular Framework for Human Motion Editing, Reasoning, Generation, and Explanation
by: Wu, Bizhu, et al.
Published: (2026)

TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation
by: Wang, Yabiao, et al.
Published: (2024)

Generative Data Augmentation for Object Point Cloud Segmentation
by: Zhu, Dekai, et al.
Published: (2025)

Spatial-Temporal State Propagation Autoregressive Model for 4D Object Generation
by: Yang, Liying, et al.
Published: (2026)

Large Motion Model for Unified Multi-Modal Motion Generation
by: Zhang, Mingyuan, et al.
Published: (2024)

U-VLM: Hierarchical Vision Language Modeling for Report Generation
by: Shi, Pengcheng, et al.
Published: (2026)

Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation
by: Zhao, Min, et al.
Published: (2026)

DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes
by: Mei, Jianbiao, et al.
Published: (2024)

HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention
by: Chen, Shuang, et al.
Published: (2024)

Dynamic Worlds, Dynamic Humans: Generating Virtual Human-Scene Interaction Motion in Dynamic Scenes
by: Wang, Yin, et al.
Published: (2026)

SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding
by: Zhu, Dekai, et al.
Published: (2025)

Zero-Shot Chinese Character Recognition with Hierarchical Multi-Granularity Image-Text Aligning
by: Zhu, Yinglian, et al.
Published: (2025)

LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation
by: Zhang, Ruida, et al.
Published: (2024)

Human Motion Video Generation: A Survey
by: Xue, Haiwei, et al.
Published: (2025)

BAMM: Bidirectional Autoregressive Motion Model
by: Pinyoanuntapong, Ekkasit, et al.
Published: (2024)

MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning
by: Liu, Xiaoyang, et al.
Published: (2024)

GenM$^3$: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation
by: Shi, Junyu, et al.
Published: (2025)

VPG: Visual Prefix Guidance for Autoregressive Image and Video Generation
by: Liao, Xinyao, et al.
Published: (2026)

Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression
by: Meng, Zichong, et al.
Published: (2024)

Pressure2Motion: Hierarchical Human Motion Reconstruction from Ground Pressure with Text Guidance
by: Li, Zhengxuan, et al.
Published: (2025)

MotionCharacter: Fine-Grained Motion Controllable Human Video Generation
by: Fang, Haopeng, et al.
Published: (2024)

HUMOF: Human Motion Forecasting in Interactive Social Scenes
by: Sun, Caiyi, et al.
Published: (2025)