| Main Authors: | Zhao, Jinjing, Wei, Fangyun, Liu, Zhening, Zhang, Hongyang, Xu, Chang, Lu, Yan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.15716 |
Similar Items
Animate Any Character in Any World
by: Wang, Yitong, et al.
Published: (2025)
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
by: Shen, Yichao, et al.
Published: (2025)
From Virtual Games to Real-World Play
by: Sun, Wenqiang, et al.
Published: (2025)
AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories
by: Wang, Zun, et al.
Published: (2026)
Pack and Force Your Memory: Long-form and Consistent Video Generation
by: Wu, Xiaofei, et al.
Published: (2025)
Yan: Foundational Interactive Video Generation
by: Ye, Deheng, et al.
Published: (2025)
SEDEG: Sequential Enhancement of Decoder and Encoder's Generality for Class Incremental Learning with Small Memory
by: Chen, Hongyang, et al.
Published: (2025)
SpatialMem: Metric-Aligned Long-Horizon Video Memory for Language Grounding and QA
by: Zheng, Xinyi, et al.
Published: (2026)
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models
by: Chen, Jierun, et al.
Published: (2024)
Mon3tr: Monocular 3D Telepresence with Pre-built Gaussian Avatars as Amortization
by: Lin, Fangyu, et al.
Published: (2026)
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
by: Zhang, Yiming, et al.
Published: (2023)
Source-Free Cross-Modal Knowledge Transfer by Unleashing the Potential of Task-Irrelevant Data
by: Zhu, Jinjing, et al.
Published: (2024)
Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation
by: Xiao, Zeqi, et al.
Published: (2025)
Sharp Eyes and Memory for VideoLLMs: Information-Aware Visual Token Pruning for Efficient and Reliable VideoLLM Reasoning
by: Qin, Jialong, et al.
Published: (2025)
Dynamics-Aware Gaussian Splatting Streaming Towards Fast On-the-Fly 4D Reconstruction
by: Liu, Zhening, et al.
Published: (2024)
AEMIM: Adversarial Examples Meet Masked Image Modeling
by: Xiang, Wenzhao, et al.
Published: (2024)
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
by: Zhang, Mengchen, et al.
Published: (2025)
SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation
by: Tan, Shanwen, et al.
Published: (2026)
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
by: Xiong, Haomiao, et al.
Published: (2025)
Draft-and-Target Sampling for Video Generation Policy
by: Zhang, Qikang, et al.
Published: (2026)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection
by: Zhao, Hongyang, et al.
Published: (2025)
Comp-Attn: Present-and-Align Attention for Compositional Video Generation
by: Zhang, Hongyu, et al.
Published: (2025)
Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling
by: Yan, Jiebin, et al.
Published: (2025)
RemedyGS: Defend 3D Gaussian Splatting against Computation Cost Attacks
by: Li, Yanping, et al.
Published: (2025)
SpaceMind: Camera-Guided Modality Fusion for Spatial Reasoning in Vision-Language Models
by: Zhao, Ruosen, et al.
Published: (2025)
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
by: Zhang, Runze, et al.
Published: (2025)
LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans
by: Huang, Zhening, et al.
Published: (2025)
Video-EM: Event-Centric Episodic Memory for Long-Form Video Understanding
by: Wang, Yun, et al.
Published: (2025)
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
by: Cheng, Dingxin, et al.
Published: (2024)
MambaOVSR: Multiscale Fusion with Global Motion Modeling for Chinese Opera Video Super-Resolution
by: Chang, Hua, et al.
Published: (2025)
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
by: Zhan, Zheng, et al.
Published: (2024)
Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model
by: Liu, Zhening, et al.
Published: (2024)
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
by: Shi, Fengyuan, et al.
Published: (2023)
MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
by: Zhao, Haoyu, et al.
Published: (2023)
Learning Plug-and-play Memory for Guiding Video Diffusion Models
by: Song, Selena, et al.
Published: (2025)
Enabling Versatile Controls for Video Diffusion Models
by: Zhang, Xu, et al.
Published: (2025)
TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models
by: Li, Pengxiang, et al.
Published: (2023)
ShoulderShot: Generating Over-the-Shoulder Dialogue Videos
by: Zhang, Yuang, et al.
Published: (2025)
Boximator: Generating Rich and Controllable Motions for Video Synthesis
by: Wang, Jiawei, et al.
Published: (2024)
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
by: Zhao, Yu, et al.
Published: (2024)