| Main Authors: | Yang, Fengyuan, Huang, Luying, Guan, Jiazhi, Yang, Quanwei, Pan, Dongwei, Fu, Jianglin, Feng, Haocheng, He, Wei, Wang, Kaisiyuan, Zhou, Hang, Yao, Angela |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.01043 |
Similar Items
DISPLAY: Directable Human-Object Interaction Video Generation via Sparse Motion Guidance and Multi-Task Auxiliary
by: Guan, Jiazhi, et al.
Published: (2026)
GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation
by: Yang, Quanwei, et al.
Published: (2025)
InterDyad: Interactive Dyadic Speech-to-Video Generation by Querying Intermediate Visual Guidance
by: Pan, Dongwei, et al.
Published: (2026)
Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers
by: Sun, Yasheng, et al.
Published: (2025)
TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model
by: Guan, Jiazhi, et al.
Published: (2024)
Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model
by: Fan, Yingying, et al.
Published: (2025)
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
by: Guan, Jiazhi, et al.
Published: (2025)
GenHOI: Towards Object-Consistent Hand-Object Interaction with Temporally Balanced and Spatially Selective Object Injection
by: Huang, Xuan, et al.
Published: (2026)
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
by: Guan, Jiazhi, et al.
Published: (2024)
DESIGN OF A NEW ONE-SHOT TOOL WITH DOUBLE MARGINS STRUCTURE IN DRILLING TITANIUM/COMPOSITE STACKS
by: Chong Zhang, et al.
Published: (2017)
iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer
by: Shen, Zhelun, et al.
Published: (2025)
Aggregating Nearest Sharp Features via Hybrid Transformers for Video Deblurring
by: Shang, Wei, et al.
Published: (2023)
Follow-Your-Motion: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning
by: Ma, Yue, et al.
Published: (2025)
Autoregressive Video Autoencoder with Decoupled Temporal and Spatial Context
by: Shen, Cuifeng, et al.
Published: (2025)
THGS: Lifelike Talking Human Avatar Synthesis From Monocular Video Via 3D Gaussian Splatting
by: Chuang Chen, et al.
Published: (2025)
THE SHOT HEARD ROUND THE WORLD
by: THOMAS, EVAN
Published: (2006)
KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation
by: Yang, Fengyuan, et al.
Published: (2024)
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
by: Shi, Fengyuan, et al.
Published: (2023)
Decoupling Reasoning and Knowledge Injection for In-Context Knowledge Editing
by: Wang, Changyue, et al.
Published: (2025)
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation
by: Wang, Boyuan, et al.
Published: (2025)
AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation
by: Sun, Yasheng, et al.
Published: (2024)
Humans as Checkerboards: Calibrating Camera Motion Scale for World-Coordinate Human Mesh Recovery
by: Yang, Fengyuan, et al.
Published: (2024)
Decoupled Diffusion Sparks Adaptive Scene Generation
by: Zhou, Yunsong, et al.
Published: (2025)
MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
by: Shi, Shuwei, et al.
Published: (2024)
EDITOR'S NOTE : ONE ON ONE
Published: (2007)
Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing
by: Zuo, Yi, et al.
Published: (2024)
Automatic Robot Task Planning by Integrating Large Language Model with Genetic Programming
by: Kobilov, Azizjon, et al.
Published: (2025)
HOI-Dyn: Learning Interaction Dynamics for Human-Object Motion Diffusion
by: Wu, Lin, et al.
Published: (2025)
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
by: Jin, Yang, et al.
Published: (2024)
DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency
by: Zhong, Xiaojing, et al.
Published: (2024)
Virial Theorem and Its Applications in Instability of Two-Phase Water-Wave
by: Yang, Haocheng
Published: (2024)
Microlocal Partition of Energy for Fractional-Type Dispersive Equations
by: Yang, Haocheng
Published: (2023)
Cauchy Problem for Cylinder-like Capillary Jets
by: Yang, Haocheng
Published: (2024)
End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context
by: Guan, Yiran, et al.
Published: (2023)
STACK TYPE DETECTION USING FEW-SHOT LEARNING
by: Lin, Henry
Published: (2022)
SHOT: A WEB SERVER FOR THE CONSTRUCTION OF GENOME PHYLOGENIES
by: Korbel, J.O., et al.
Published: (2002)
VTok: A Unified Video Tokenizer with Decoupled Spatial-Temporal Latents
by: Wang, Feng, et al.
Published: (2026)
MVHOI: Bridge Multi-view Condition to Complex Human-Object Interaction Video Reenactment via 3D Foundation Model
by: Tong, Jinguang, et al.
Published: (2026)
Efficient Satellite-Ground Interconnection Design for Low-orbit Mega-Constellation Topology
by: Liu, Wenhao, et al.
Published: (2024)
Rational Communication Shapes Morphological Composition
by: Yang, Fengyuan, et al.
Published: (2026)