Library Catalog

Bibliographic Details
Main Authors:	Wei, Zhengxuan; Guo, Xu; Li, Xinghui; Xiang, Xunzhi; Wei, Min; Zhu, Yiran; Wang, Qiulin; Wang, Xintao; Wan, Pengfei; Hou, Xiangwang; Fan, Qi
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2606.02436

Similar Items

UniVideo: Unified Understanding, Generation, and Editing for Videos
by: Wei, Cong, et al.
Published: (2025)

Make It Efficient: Dynamic Sparse Attention for Autoregressive Image Generation
by: Xiang, Xunzhi, et al.
Published: (2025)

Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval
by: Yu, Jiwen, et al.
Published: (2025)

FullDiT: Multi-Task Video Generative Foundation Model with Full Attention
by: Ju, Xuan, et al.
Published: (2025)

GameFactory: Creating New Games with Generative Interactive Videos
by: Yu, Jiwen, et al.
Published: (2025)

UniCustom: Unified Visual Conditioning for Multi-Reference Image Generation
by: Xu, Yiyan, et al.
Published: (2026)

DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
by: Guo, Xu, et al.
Published: (2026)

ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning
by: Huang, Yuzhou, et al.
Published: (2025)

VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning
by: Cai, Minghong, et al.
Published: (2025)

UNIC: Unified In-Context Video Editing
by: Ye, Zixuan, et al.
Published: (2025)

Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations
by: Zheng, Xunzhi, et al.
Published: (2025)

FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers
by: He, Xuanhua, et al.
Published: (2025)

Position: Interactive Generative Video as Next-Generation Game Engine
by: Yu, Jiwen, et al.
Published: (2025)

Geometry-Aware Rotary Position Embedding for Consistent Video World Model
by: Xiang, Chendong, et al.
Published: (2026)

CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation
by: Huang, Kaiyi, et al.
Published: (2026)

DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory
by: Yang, Zhenhao, et al.
Published: (2026)

FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
by: Dai, Yixiang, et al.
Published: (2025)

Selective, Regularized, and Calibrated: Harnessing Vision Foundation Models for Cross-Domain Few-Shot Semantic Segmentation
by: Ma, Junyuan, et al.
Published: (2026)

Pathwise Test-Time Correction for Autoregressive Long Video Generation
by: Xiang, Xunzhi, et al.
Published: (2026)

A Survey of Interactive Generative Video
by: Yu, Jiwen, et al.
Published: (2025)

Simulating the Visual World with Artificial Intelligence: A Roadmap
by: Yue, Jingtong, et al.
Published: (2025)

CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation
by: Wang, Qinghe, et al.
Published: (2025)

Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
by: Chen, Kaijin, et al.
Published: (2026)

Visual-Aware CoT: Achieving High-Fidelity Visual Consistency in Unified Models
by: Ye, Zixuan, et al.
Published: (2025)

StyleMaster: Stylize Your Video with Artistic Generation and Translation
by: Ye, Zixuan, et al.
Published: (2024)

Adaptive AUV Hunting Policy with Covert Communication via Diffusion Model
by: Guo, Xu, et al.
Published: (2025)

StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation
by: Xing, Ke, et al.
Published: (2025)

Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling
by: Wang, Yuan, et al.
Published: (2026)

3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
by: Fang, Zhixue, et al.
Published: (2026)

SketchVideo: Sketch-based Video Generation and Editing
by: Liu, Feng-Lin, et al.
Published: (2025)

SymphoMotion: Joint Control of Camera Motion and Object Dynamics for Coherent Video Generation
by: Zhang, Guiyu, et al.
Published: (2026)

A Semi-supervised Physics-Aware Triple-Stream Underwater Image Enhancement Network
by: Xu, Shixuan, et al.
Published: (2023)

SPDA-SAM: A Self-prompted Depth-Aware Segment Anything Model for Instance Segmentation
by: Shang, Yihan, et al.
Published: (2026)

Reduced-order modeling of Hamiltonian dynamics based on symplectic neural networks
by: Chen, Yongsheng, et al.
Published: (2025)

PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution
by: Du, Shian, et al.
Published: (2025)

Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
by: Fu, Xiao, et al.
Published: (2025)

SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model
by: Wu, Tao, et al.
Published: (2024)

Scaling Image and Video Generation via Test-Time Evolutionary Search
by: He, Haoran, et al.
Published: (2025)

PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion
by: Yin, Yuyang, et al.
Published: (2025)

Improving Video Generation with Human Feedback
by: Liu, Jie, et al.
Published: (2025)