:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Jinjing, Wei, Fangyun, Liu, Zhening, Zhang, Hongyang, Xu, Chang, Lu, Yan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.15716

Similar Items

Animate Any Character in Any World
by: Wang, Yitong, et al.
Published: (2025)

VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
by: Shen, Yichao, et al.
Published: (2025)

From Virtual Games to Real-World Play
by: Sun, Wenqiang, et al.
Published: (2025)

AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories
by: Wang, Zun, et al.
Published: (2026)

Pack and Force Your Memory: Long-form and Consistent Video Generation
by: Wu, Xiaofei, et al.
Published: (2025)

Yan: Foundational Interactive Video Generation
by: Ye, Deheng, et al.
Published: (2025)

SEDEG: Sequential Enhancement of Decoder and Encoder's Generality for Class Incremental Learning with Small Memory
by: Chen, Hongyang, et al.
Published: (2025)

SpatialMem: Metric-Aligned Long-Horizon Video Memory for Language Grounding and QA
by: Zheng, Xinyi, et al.
Published: (2026)

Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models
by: Chen, Jierun, et al.
Published: (2024)

Mon3tr: Monocular 3D Telepresence with Pre-built Gaussian Avatars as Amortization
by: Lin, Fangyu, et al.
Published: (2026)

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
by: Zhang, Yiming, et al.
Published: (2023)

Source-Free Cross-Modal Knowledge Transfer by Unleashing the Potential of Task-Irrelevant Data
by: Zhu, Jinjing, et al.
Published: (2024)

Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation
by: Xiao, Zeqi, et al.
Published: (2025)

Sharp Eyes and Memory for VideoLLMs: Information-Aware Visual Token Pruning for Efficient and Reliable VideoLLM Reasoning
by: Qin, Jialong, et al.
Published: (2025)

Dynamics-Aware Gaussian Splatting Streaming Towards Fast On-the-Fly 4D Reconstruction
by: Liu, Zhening, et al.
Published: (2024)

AEMIM: Adversarial Examples Meet Masked Image Modeling
by: Xiang, Wenzhao, et al.
Published: (2024)

ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
by: Zhang, Mengchen, et al.
Published: (2025)

SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation
by: Tan, Shanwen, et al.
Published: (2026)

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
by: Xiong, Haomiao, et al.
Published: (2025)

Draft-and-Target Sampling for Video Generation Policy
by: Zhang, Qikang, et al.
Published: (2026)

Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection
by: Zhao, Hongyang, et al.
Published: (2025)

Comp-Attn: Present-and-Align Attention for Compositional Video Generation
by: Zhang, Hongyu, et al.
Published: (2025)

Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling
by: Yan, Jiebin, et al.
Published: (2025)

RemedyGS: Defend 3D Gaussian Splatting against Computation Cost Attacks
by: Li, Yanping, et al.
Published: (2025)

SpaceMind: Camera-Guided Modality Fusion for Spatial Reasoning in Vision-Language Models
by: Zhao, Ruosen, et al.
Published: (2025)

DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
by: Zhang, Runze, et al.
Published: (2025)

LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans
by: Huang, Zhening, et al.
Published: (2025)

Video-EM: Event-Centric Episodic Memory for Long-Form Video Understanding
by: Wang, Yun, et al.
Published: (2025)

Enhancing Long Video Understanding via Hierarchical Event-Based Memory
by: Cheng, Dingxin, et al.
Published: (2024)

MambaOVSR: Multiscale Fusion with Global Motion Modeling for Chinese Opera Video Super-Resolution
by: Chang, Hua, et al.
Published: (2025)

Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
by: Zhan, Zheng, et al.
Published: (2024)

Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model
by: Liu, Zhening, et al.
Published: (2024)

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
by: Shi, Fengyuan, et al.
Published: (2023)

MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
by: Zhao, Haoyu, et al.
Published: (2023)

Learning Plug-and-play Memory for Guiding Video Diffusion Models
by: Song, Selena, et al.
Published: (2025)

Enabling Versatile Controls for Video Diffusion Models
by: Zhang, Xu, et al.
Published: (2025)

TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models
by: Li, Pengxiang, et al.
Published: (2023)

ShoulderShot: Generating Over-the-Shoulder Dialogue Videos
by: Zhang, Yuang, et al.
Published: (2025)

Boximator: Generating Rich and Controllable Motions for Video Synthesis
by: Wang, Jiawei, et al.
Published: (2024)

Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
by: Zhao, Yu, et al.
Published: (2024)