| Main Authors: | Wei, Zhengxuan, Guo, Xu, Li, Xinghui, Xiang, Xunzhi, Wei, Min, Zhu, Yiran, Wang, Qiulin, Wang, Xintao, Wan, Pengfei, Hou, Xiangwang, Fan, Qi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2606.02436 |
Similar Items
UniVideo: Unified Understanding, Generation, and Editing for Videos
by: Wei, Cong, et al.
Published: (2025)
Make It Efficient: Dynamic Sparse Attention for Autoregressive Image Generation
by: Xiang, Xunzhi, et al.
Published: (2025)
Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval
by: Yu, Jiwen, et al.
Published: (2025)
FullDiT: Multi-Task Video Generative Foundation Model with Full Attention
by: Ju, Xuan, et al.
Published: (2025)
GameFactory: Creating New Games with Generative Interactive Videos
by: Yu, Jiwen, et al.
Published: (2025)
UniCustom: Unified Visual Conditioning for Multi-Reference Image Generation
by: Xu, Yiyan, et al.
Published: (2026)
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
by: Guo, Xu, et al.
Published: (2026)
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning
by: Huang, Yuzhou, et al.
Published: (2025)
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning
by: Cai, Minghong, et al.
Published: (2025)
UNIC: Unified In-Context Video Editing
by: Ye, Zixuan, et al.
Published: (2025)
Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations
by: Zheng, Xunzhi, et al.
Published: (2025)
FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers
by: He, Xuanhua, et al.
Published: (2025)
Position: Interactive Generative Video as Next-Generation Game Engine
by: Yu, Jiwen, et al.
Published: (2025)
Geometry-Aware Rotary Position Embedding for Consistent Video World Model
by: Xiang, Chendong, et al.
Published: (2026)
CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation
by: Huang, Kaiyi, et al.
Published: (2026)
DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory
by: Yang, Zhenhao, et al.
Published: (2026)
FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
by: Dai, Yixiang, et al.
Published: (2025)
Selective, Regularized, and Calibrated: Harnessing Vision Foundation Models for Cross-Domain Few-Shot Semantic Segmentation
by: Ma, Junyuan, et al.
Published: (2026)
Pathwise Test-Time Correction for Autoregressive Long Video Generation
by: Xiang, Xunzhi, et al.
Published: (2026)
A Survey of Interactive Generative Video
by: Yu, Jiwen, et al.
Published: (2025)
Simulating the Visual World with Artificial Intelligence: A Roadmap
by: Yue, Jingtong, et al.
Published: (2025)
CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation
by: Wang, Qinghe, et al.
Published: (2025)
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
by: Chen, Kaijin, et al.
Published: (2026)
Visual-Aware CoT: Achieving High-Fidelity Visual Consistency in Unified Models
by: Ye, Zixuan, et al.
Published: (2025)
StyleMaster: Stylize Your Video with Artistic Generation and Translation
by: Ye, Zixuan, et al.
Published: (2024)
Adaptive AUV Hunting Policy with Covert Communication via Diffusion Model
by: Guo, Xu, et al.
Published: (2025)
StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation
by: Xing, Ke, et al.
Published: (2025)
Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling
by: Wang, Yuan, et al.
Published: (2026)
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
by: Fang, Zhixue, et al.
Published: (2026)
SketchVideo: Sketch-based Video Generation and Editing
by: Liu, Feng-Lin, et al.
Published: (2025)
SymphoMotion: Joint Control of Camera Motion and Object Dynamics for Coherent Video Generation
by: Zhang, Guiyu, et al.
Published: (2026)
A Semi-supervised Physics-Aware Triple-Stream Underwater Image Enhancement Network
by: Xu, Shixuan, et al.
Published: (2023)
SPDA-SAM: A Self-prompted Depth-Aware Segment Anything Model for Instance Segmentation
by: Shang, Yihan, et al.
Published: (2026)
Reduced-order modeling of Hamiltonian dynamics based on symplectic neural networks
by: Chen, Yongsheng, et al.
Published: (2025)
PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution
by: Du, Shian, et al.
Published: (2025)
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
by: Fu, Xiao, et al.
Published: (2025)
SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model
by: Wu, Tao, et al.
Published: (2024)
Scaling Image and Video Generation via Test-Time Evolutionary Search
by: He, Haoran, et al.
Published: (2025)
PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion
by: Yin, Yuyang, et al.
Published: (2025)
Improving Video Generation with Human Feedback
by: Liu, Jie, et al.
Published: (2025)