Library Catalog

Bibliographic Details
Main Authors:	Lin, Bin; Ge, Yunyang; Cheng, Xinhua; Li, Zongjian; Zhu, Bin; Wang, Shaodong; He, Xianyi; Ye, Yang; Yuan, Shenghai; Chen, Liuhan; Jia, Tanghui; Zhang, Junwu; Tang, Zhenyu; Pang, Yatian; She, Bin; Yan, Cen; Hu, Zhiheng; Dong, Xiaoyi; Chen, Lin; Pan, Zhang; Zhou, Xing; Dong, Shaoling; Tian, Yonghong; Yuan, Li
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2412.00131

Similar Items

UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
by: Lin, Bin, et al.
Published: (2025)

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
by: Li, Zongjian, et al.
Published: (2024)

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
by: Chen, Liuhan, et al.
Published: (2024)

FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation
by: Ge, Yunyang, et al.
Published: (2025)

OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning
by: Ge, Yunyang, et al.
Published: (2026)

ImgEdit: A Unified Image Editing Dataset and Benchmark
by: Ye, Yang, et al.
Published: (2025)

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
by: Tang, Zhenyu, et al.
Published: (2024)

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
by: Yuan, Shenghai, et al.
Published: (2025)

Envision3D: One Image to 3D with Anchor Views Interpolation
by: Pang, Yatian, et al.
Published: (2024)

SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video
by: Zhao, Chengshu, et al.
Published: (2025)

Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
by: Li, Zongjian, et al.
Published: (2025)

EF-VI: Enhancing End-Frame Injection for Video Inbetweening
by: Chen, Liuhan, et al.
Published: (2025)

Identity-Preserving Text-to-Video Generation by Frequency Decomposition
by: Yuan, Shenghai, et al.
Published: (2024)

Next Patch Prediction for Autoregressive Visual Generation
by: Pang, Yatian, et al.
Published: (2024)

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
by: Lin, Bin, et al.
Published: (2024)

AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scene
by: Feng, Chaoran, et al.
Published: (2025)

TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation
by: Zongying, Lin, et al.
Published: (2024)

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
by: Yuan, Shenghai, et al.
Published: (2024)

Towards Open-World Referring Expression Comprehension: A Benchmark with Training-free Multi-task Consistency Checker
by: Wu, Zongjian, et al.
Published: (2026)

LLMBind: A Unified Modality-Task Integration Framework
by: Zhu, Bin, et al.
Published: (2024)

Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model
by: Cui, Jiaxi, et al.
Published: (2023)

iFSQ: Improving FSQ for Image Generation with 1 Line of Code
by: Lin, Bin, et al.
Published: (2026)

Helios: Real Real-Time Long Video Generation Model
by: Yuan, Shenghai, et al.
Published: (2026)

Sora Generates Videos with Stunning Geometrical Consistency
by: Li, Xuanyi, et al.
Published: (2024)

Verification of general multi-qudit pure states
by: Zhang, Xiao-Dong, et al.
Published: (2025)

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
by: Zhu, Bin, et al.
Published: (2023)

Sora OpenAI's Prelude: Social Media Perspectives on Sora OpenAI and the Future of AI Video Generation
by: Mogavi, Reza Hadi, et al.
Published: (2024)

Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
by: Niu, Yuwei, et al.
Published: (2025)

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses
by: Pang, Yatian, et al.
Published: (2024)

DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
by: Wang, Junjie, et al.
Published: (2025)

QED Effects on Kerr-Newman Black Hole Shadows
by: Yuan, Shaobing, et al.
Published: (2024)

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
by: Chen, Lin, et al.
Published: (2024)

Open-Sora: Democratizing Efficient Video Production for All
by: Zheng, Zangwei, et al.
Published: (2024)

A Density-Delay Law for Stable Event-Driven State Progression in Open Distributed Systems
by: Chen, Bin, et al.
Published: (2026)

OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision
by: Wang, Junjie, et al.
Published: (2024)

OpenDataLab: Empowering General Artificial Intelligence with Open Datasets
by: He, Conghui, et al.
Published: (2024)

NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations
by: Tang, Zhenyu, et al.
Published: (2025)

E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event Cameras
by: Feng, Chaoran, et al.
Published: (2025)

RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing
by: Huang, Zhipeng, et al.
Published: (2024)

Interaction-Centric Knowledge Infusion and Transfer for Open-Vocabulary Scene Graph Generation
by: Li, Lin, et al.
Published: (2025)