:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Ming; Cui, Liyuan; Zhang, Wenyuan; Zhang, Haoxian; Zhou, Yan; Li, Xiaohan; Tang, Songlin; Liu, Jiwen; Liao, Borui; Chen, Hejia; Liu, Xiaoqiang; Wan, Pengfei
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2508.19320

Similar Items

PEAR: Pixel-aligned Expressive humAn mesh Recovery
by: Wu, Jiahao, et al.
Published: (2026)

Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
by: Chen, Hejia, et al.
Published: (2025)

From Inpainting to Editing: Unlocking Robust Mask-Free Visual Dubbing via Generative Bootstrapping
by: He, Xu, et al.
Published: (2025)

OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers
by: Peng, Ziqiao, et al.
Published: (2025)

Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis
by: Ding, Yikang, et al.
Published: (2025)

3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
by: Fang, Zhixue, et al.
Published: (2026)

Semantic-Aware Prefix Learning for Token-Efficient Image Generation
by: Li, Qingfeng, et al.
Published: (2026)

AvatarForcing: One-Step Streaming Talking Avatars via Local-Future Sliding-Window Denoising
by: Cui, Liyuan, et al.
Published: (2026)

GameFactory: Creating New Games with Generative Interactive Videos
by: Yu, Jiwen, et al.
Published: (2025)

A Survey of Interactive Generative Video
by: Yu, Jiwen, et al.
Published: (2025)

Position: Interactive Generative Video as Next-Generation Game Engine
by: Yu, Jiwen, et al.
Published: (2025)

Demand for catastrophe insurance under the path-dependent effects
by: Cui, Liyuan, et al.
Published: (2025)

Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval
by: Yu, Jiwen, et al.
Published: (2025)

Astra: General Interactive World Model with Autoregressive Denoising
by: Zhu, Yixuan, et al.
Published: (2025)

GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation
by: Hu, Wentao, et al.
Published: (2025)

VC-Agent: An Interactive Agent for Customized Video Dataset Collection
by: Zhang, Yidan, et al.
Published: (2025)

IM-Animation: An Implicit Motion Representation for Identity-decoupled Character Animation
by: Xu, Zhufeng, et al.
Published: (2026)

Kling-MotionControl Technical Report
by: Kling Team, et al.
Published: (2026)

The Application of Digital Life Stories in Elderly Care: Methodological Limitations and Future Directions
by: Zilin Zhao, et al.
Published: (2025)

ARIG: Autoregressive Interactive Head Generation for Real-time Conversations
by: Guo, Ying, et al.
Published: (2025)

Path Choice Matters for Clear Attribution in Path Methods
by: Zhang, Borui, et al.
Published: (2024)

Preventing Local Pitfalls in Vector Quantization via Optimal Transport
by: Zhang, Borui, et al.
Published: (2024)

Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
by: Chen, Kaijin, et al.
Published: (2026)

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions
by: Zhang, Yiyuan, et al.
Published: (2024)

SpriteHand: Real-Time Versatile Hand-Object Interaction with Autoregressive Video Generation
by: Li, Zisu, et al.
Published: (2025)

KlingAvatar 2.0 Technical Report
by: Kling Team, et al.
Published: (2025)

Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
by: Zhang, Shiyi, et al.
Published: (2024)

An approach to hummed-tune and song sequences matching
by: Pham, Loc Bao, et al.
Published: (2024)

FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems
by: Liao, Borui, et al.
Published: (2025)

SFTok: Bridging the Performance Gap in Discrete Tokenizers
by: Rao, Qihang, et al.
Published: (2025)

Quantize-then-Rectify: Efficient VQ-VAE Training
by: Zhang, Borui, et al.
Published: (2025)

Fast Shapley Value Estimation: A Unified Approach
by: Zhang, Borui, et al.
Published: (2023)

Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression
by: Wang, Lirui, et al.
Published: (2025)

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
by: Chern, Ethan, et al.
Published: (2025)

Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling
by: Wang, Yuan, et al.
Published: (2026)

MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding
by: Zhang, Zhicheng, et al.
Published: (2025)

MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs
by: Liu, Yilian, et al.
Published: (2026)

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
by: Guo, Jianzhu, et al.
Published: (2024)

Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation
by: Xiao, Steven, et al.
Published: (2025)

Threshold MIDAS Forecasting of Canadian Inflation Rate
by: Chaoyi Chen, et al.
Published: (2025)