| Main Authors: | Lin, Bin, Li, Zongjian, Cheng, Xinhua, Niu, Yuwei, Ye, Yang, He, Xianyi, Yuan, Shenghai, Yu, Wangbo, Wang, Shaodong, Ge, Yunyang, Pang, Yatian, Yuan, Li |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.03147 |
Similar Items
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
by: Li, Zongjian, et al.
Published: (2025)
FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation
by: Ge, Yunyang, et al.
Published: (2025)
ImgEdit: A Unified Image Editing Dataset and Benchmark
by: Ye, Yang, et al.
Published: (2025)
OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning
by: Ge, Yunyang, et al.
Published: (2026)
Open-Sora Plan: Open-Source Large Video Generation Model
by: Lin, Bin, et al.
Published: (2024)
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
by: Li, Zongjian, et al.
Published: (2024)
SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video
by: Zhao, Chengshu, et al.
Published: (2025)
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
by: Tang, Zhenyu, et al.
Published: (2024)
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
by: Niu, Yuwei, et al.
Published: (2025)
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
by: Chen, Liuhan, et al.
Published: (2024)
RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing
by: Huang, Zhipeng, et al.
Published: (2024)
Helios: Real Real-Time Long Video Generation Model
by: Yuan, Shenghai, et al.
Published: (2026)
HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions
by: Zhou, Haiyang, et al.
Published: (2024)
iFSQ: Improving FSQ for Image Generation with 1 Line of Code
by: Lin, Bin, et al.
Published: (2026)
Unified Multimodal Models as Auto-Encoders
by: Yan, Zhiyuan, et al.
Published: (2025)
E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event Cameras
by: Feng, Chaoran, et al.
Published: (2025)
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
by: Yuan, Shenghai, et al.
Published: (2024)
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
by: Yuan, Shenghai, et al.
Published: (2025)
UniStitch: Unifying Semantic and Geometric Features for Image Stitching
by: Mei, Yuan, et al.
Published: (2026)
EF-VI: Enhancing End-Frame Injection for Video Inbetweening
by: Chen, Liuhan, et al.
Published: (2025)
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
by: Jin, Peng, et al.
Published: (2023)
HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
by: Zhou, Haiyang, et al.
Published: (2025)
UniTok: A Unified Tokenizer for Visual Generation and Understanding
by: Ma, Chuofan, et al.
Published: (2025)
UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding
by: Xu, Yueming, et al.
Published: (2025)
UniV2D: Bridging Visual Restoration and Semantic Perception for Underwater Salient Object Detection
by: Chang, Laibin, et al.
Published: (2026)
Uni-RS: A Spatially Faithful Unified Understanding and Generation Model for Remote Sensing
by: Zhang, Weiyu, et al.
Published: (2026)
AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scene
by: Feng, Chaoran, et al.
Published: (2025)
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
by: Niu, Yuwei, et al.
Published: (2025)
Jacquard V2: Refining Datasets using the Human In the Loop Data Correction Method
by: Li, Qiuhao, et al.
Published: (2024)
Look-Back: Implicit Visual Re-focusing in MLLM Reasoning
by: Yang, Shuo, et al.
Published: (2025)
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
by: Yuan, Shenghai, et al.
Published: (2024)
SeViCES: Unifying Semantic-Visual Evidence Consensus for Long Video Understanding
by: Sheng, Yuan, et al.
Published: (2025)
Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation
by: Wang, Peiyu, et al.
Published: (2025)
UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation
by: Zhang, Chi, et al.
Published: (2025)
Towards Open-World Referring Expression Comprehension: A Benchmark with Training-free Multi-task Consistency Checker
by: Wu, Zongjian, et al.
Published: (2026)
UniAlignment: Semantic Alignment for Unified Image Generation, Understanding, Manipulation and Perception
by: Song, Xinyang, et al.
Published: (2025)
Envision3D: One Image to 3D with Anchor Views Interpolation
by: Pang, Yatian, et al.
Published: (2024)
NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations
by: Tang, Zhenyu, et al.
Published: (2025)
UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis
by: Wang, Yuanrui, et al.
Published: (2025)
Next Patch Prediction for Autoregressive Visual Generation
by: Pang, Yatian, et al.
Published: (2024)