Library Catalog

Bibliographic Details
Main Authors:	Lin, Bin; Li, Zongjian; Cheng, Xinhua; Niu, Yuwei; Ye, Yang; He, Xianyi; Yuan, Shenghai; Yu, Wangbo; Wang, Shaodong; Ge, Yunyang; Pang, Yatian; Yuan, Li
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2506.03147

Similar Items

Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
by: Li, Zongjian, et al.
Published: (2025)

FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation
by: Ge, Yunyang, et al.
Published: (2025)

ImgEdit: A Unified Image Editing Dataset and Benchmark
by: Ye, Yang, et al.
Published: (2025)

OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning
by: Ge, Yunyang, et al.
Published: (2026)

Open-Sora Plan: Open-Source Large Video Generation Model
by: Lin, Bin, et al.
Published: (2024)

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
by: Li, Zongjian, et al.
Published: (2024)

SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video
by: Zhao, Chengshu, et al.
Published: (2025)

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
by: Tang, Zhenyu, et al.
Published: (2024)

Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
by: Niu, Yuwei, et al.
Published: (2025)

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
by: Chen, Liuhan, et al.
Published: (2024)

RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing
by: Huang, Zhipeng, et al.
Published: (2024)

Helios: Real Real-Time Long Video Generation Model
by: Yuan, Shenghai, et al.
Published: (2026)

HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions
by: Zhou, Haiyang, et al.
Published: (2024)

iFSQ: Improving FSQ for Image Generation with 1 Line of Code
by: Lin, Bin, et al.
Published: (2026)

Unified Multimodal Models as Auto-Encoders
by: Yan, Zhiyuan, et al.
Published: (2025)

E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event Cameras
by: Feng, Chaoran, et al.
Published: (2025)

Identity-Preserving Text-to-Video Generation by Frequency Decomposition
by: Yuan, Shenghai, et al.
Published: (2024)

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
by: Yuan, Shenghai, et al.
Published: (2025)

UniStitch: Unifying Semantic and Geometric Features for Image Stitching
by: Mei, Yuan, et al.
Published: (2026)

EF-VI: Enhancing End-Frame Injection for Video Inbetweening
by: Chen, Liuhan, et al.
Published: (2025)

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
by: Jin, Peng, et al.
Published: (2023)

HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
by: Zhou, Haiyang, et al.
Published: (2025)

UniTok: A Unified Tokenizer for Visual Generation and Understanding
by: Ma, Chuofan, et al.
Published: (2025)

UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding
by: Xu, Yueming, et al.
Published: (2025)

UniV2D: Bridging Visual Restoration and Semantic Perception for Underwater Salient Object Detection
by: Chang, Laibin, et al.
Published: (2026)

Uni-RS: A Spatially Faithful Unified Understanding and Generation Model for Remote Sensing
by: Zhang, Weiyu, et al.
Published: (2026)

AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scene
by: Feng, Chaoran, et al.
Published: (2025)

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
by: Niu, Yuwei, et al.
Published: (2025)

Jacquard V2: Refining Datasets using the Human In the Loop Data Correction Method
by: Li, Qiuhao, et al.
Published: (2024)

Look-Back: Implicit Visual Re-focusing in MLLM Reasoning
by: Yang, Shuo, et al.
Published: (2025)

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
by: Yuan, Shenghai, et al.
Published: (2024)

SeViCES: Unifying Semantic-Visual Evidence Consensus for Long Video Understanding
by: Sheng, Yuan, et al.
Published: (2025)

Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation
by: Wang, Peiyu, et al.
Published: (2025)

UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation
by: Zhang, Chi, et al.
Published: (2025)

Towards Open-World Referring Expression Comprehension: A Benchmark with Training-free Multi-task Consistency Checker
by: Wu, Zongjian, et al.
Published: (2026)

UniAlignment: Semantic Alignment for Unified Image Generation, Understanding, Manipulation and Perception
by: Song, Xinyang, et al.
Published: (2025)

Envision3D: One Image to 3D with Anchor Views Interpolation
by: Pang, Yatian, et al.
Published: (2024)

NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations
by: Tang, Zhenyu, et al.
Published: (2025)

UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis
by: Wang, Yuanrui, et al.
Published: (2025)

Next Patch Prediction for Autoregressive Visual Generation
by: Pang, Yatian, et al.
Published: (2024)