:: Library Catalog

Bibliographic Details
Main authors:	Pang, Yatian, Jin, Peng, Yang, Shuo, Lin, Bin, Zhu, Bin, Tang, Zhenyu, Chen, Liuhan, Tay, Francis E. H., Lim, Ser-Nam, Yang, Harry, Yuan, Li
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online access:	https://arxiv.org/abs/2412.15321

Similar items

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses
by: Pang, Yatian, et al.
Published: (2024)

VideoMerge: Towards Training-free Long Video Generation
by: Zhang, Siyang, et al.
Published: (2025)

Beyond Generation: Unlocking Universal Editing via Self-Supervised Fine-Tuning
by: Chen, Harold Haodong, et al.
Published: (2024)

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
by: Tang, Zhenyu, et al.
Published: (2024)

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
by: Lin, Bin, et al.
Published: (2024)

VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention
by: Zheng, Mingzhe, et al.
Published: (2025)

VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention
by: Zheng, Mingzhe, et al.
Published: (2024)

Envision3D: One Image to 3D with Anchor Views Interpolation
by: Pang, Yatian, et al.
Published: (2024)

Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation
by: Chen, Harold Haodong, et al.
Published: (2025)

Towards Chunk-Wise Generation for Long Videos
by: Zhang, Siyang, et al.
Published: (2024)

Object Recognition as Next Token Prediction
by: Yue, Kaiyu, et al.
Published: (2023)

Open-Sora Plan: Open-Source Large Video Generation Model
by: Lin, Bin, et al.
Published: (2024)

SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video
by: Zhao, Chengshu, et al.
Published: (2025)

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
by: Tian, Keyu, et al.
Published: (2024)

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
by: Li, Zongjian, et al.
Published: (2024)

UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
by: Lin, Bin, et al.
Published: (2025)

Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
by: Ren, Sucheng, et al.
Published: (2025)

Autoregressive Video Generation beyond Next Frames Prediction
by: Ren, Sucheng, et al.
Published: (2025)

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
by: Chen, Liuhan, et al.
Published: (2024)

AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer
by: Fang, Pengjun, et al.
Published: (2026)

Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View
by: Wu, Xianzu, et al.
Published: (2025)

Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising
by: Wang, Yifan, et al.
Published: (2025)

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
by: Lin, Bin, et al.
Published: (2023)

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
by: Jang, Young Kyun, et al.
Published: (2024)

AirSketch: Generative Motion to Sketch
by: Lim, Hui Xian Grace, et al.
Published: (2024)

AlignVid: Training-Free Attention Scaling for Semantic Fidelity in Text-Guided Image-to-Video Generation
by: Liu, Yexin, et al.
Published: (2025)

Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop
by: Qian, Zhaofang, et al.
Published: (2024)

Look-Back: Implicit Visual Re-focusing in MLLM Reasoning
by: Yang, Shuo, et al.
Published: (2025)

Video Decomposition Prior: A Methodology to Decompose Videos into Layers
by: Shrivastava, Gaurav, et al.
Published: (2024)

Temporal Regularization Makes Your Video Generator Stronger
by: Chen, Harold Haodong, et al.
Published: (2025)

BOOKAGENT: Orchestrating Safety-Aware Visual Narratives via Multi-Agent Cognitive Calibration
by: Gao, Bo, et al.
Published: (2026)

Is This Predictor More Informative than Another? A Decision-Theoretical Comparison
by: Feng, Yiding, et al.
Published: (2025)

Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
by: Li, Jinghan, et al.
Published: (2025)

E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event Cameras
by: Feng, Chaoran, et al.
Published: (2025)

DiReCT: Disentangled Regularization of Contrastive Trajectories for Physics-Refined Video Generation
by: Meyarian, Abolfazl, et al.
Published: (2026)

What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation?
by: Cui, Xuanming, et al.
Published: (2025)

Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward
by: Xu, Xiaogang, et al.
Published: (2025)

FSViewFusion: Few-Shots View Generation of Novel Objects
by: Hussain, Rukhshanda, et al.
Published: (2024)

Trajeglish: Traffic Modeling as Next-Token Prediction
by: Philion, Jonah, et al.
Published: (2023)

FVAR: Visual Autoregressive Modeling via Next Focus Prediction
by: Li, Xiaofan, et al.
Published: (2025)