| Main authors: | Pang, Yatian; Jin, Peng; Yang, Shuo; Lin, Bin; Zhu, Bin; Tang, Zhenyu; Chen, Liuhan; Tay, Francis E. H.; Lim, Ser-Nam; Yang, Harry; Yuan, Li |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2412.15321 |
Similar items
DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses
by: Pang, Yatian, et al.
Published: (2024)
VideoMerge: Towards Training-free Long Video Generation
by: Zhang, Siyang, et al.
Published: (2025)
Beyond Generation: Unlocking Universal Editing via Self-Supervised Fine-Tuning
by: Chen, Harold Haodong, et al.
Published: (2024)
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
by: Tang, Zhenyu, et al.
Published: (2024)
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
by: Lin, Bin, et al.
Published: (2024)
VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention
by: Zheng, Mingzhe, et al.
Published: (2025)
VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention
by: Zheng, Mingzhe, et al.
Published: (2024)
Envision3D: One Image to 3D with Anchor Views Interpolation
by: Pang, Yatian, et al.
Published: (2024)
Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation
by: Chen, Harold Haodong, et al.
Published: (2025)
Towards Chunk-Wise Generation for Long Videos
by: Zhang, Siyang, et al.
Published: (2024)
Object Recognition as Next Token Prediction
by: Yue, Kaiyu, et al.
Published: (2023)
Open-Sora Plan: Open-Source Large Video Generation Model
by: Lin, Bin, et al.
Published: (2024)
SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video
by: Zhao, Chengshu, et al.
Published: (2025)
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
by: Tian, Keyu, et al.
Published: (2024)
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
by: Li, Zongjian, et al.
Published: (2024)
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
by: Lin, Bin, et al.
Published: (2025)
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
by: Ren, Sucheng, et al.
Published: (2025)
Autoregressive Video Generation beyond Next Frames Prediction
by: Ren, Sucheng, et al.
Published: (2025)
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
by: Chen, Liuhan, et al.
Published: (2024)
AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer
by: Fang, Pengjun, et al.
Published: (2026)
Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View
by: Wu, Xianzu, et al.
Published: (2025)
Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising
by: Wang, Yifan, et al.
Published: (2025)
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
by: Lin, Bin, et al.
Published: (2023)
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
by: Jang, Young Kyun, et al.
Published: (2024)
AirSketch: Generative Motion to Sketch
by: Lim, Hui Xian Grace, et al.
Published: (2024)
AlignVid: Training-Free Attention Scaling for Semantic Fidelity in Text-Guided Image-to-Video Generation
by: Liu, Yexin, et al.
Published: (2025)
Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop
by: Qian, Zhaofang, et al.
Published: (2024)
Look-Back: Implicit Visual Re-focusing in MLLM Reasoning
by: Yang, Shuo, et al.
Published: (2025)
Video Decomposition Prior: A Methodology to Decompose Videos into Layers
by: Shrivastava, Gaurav, et al.
Published: (2024)
Temporal Regularization Makes Your Video Generator Stronger
by: Chen, Harold Haodong, et al.
Published: (2025)
BOOKAGENT: Orchestrating Safety-Aware Visual Narratives via Multi-Agent Cognitive Calibration
by: Gao, Bo, et al.
Published: (2026)
Is This Predictor More Informative than Another? A Decision-Theoretical Comparison
by: Feng, Yiding, et al.
Published: (2025)
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
by: Li, Jinghan, et al.
Published: (2025)
E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event Cameras
by: Feng, Chaoran, et al.
Published: (2025)
DiReCT: Disentangled Regularization of Contrastive Trajectories for Physics-Refined Video Generation
by: Meyarian, Abolfazl, et al.
Published: (2026)
What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation?
by: Cui, Xuanming, et al.
Published: (2025)
Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward
by: Xu, Xiaogang, et al.
Published: (2025)
FSViewFusion: Few-Shots View Generation of Novel Objects
by: Hussain, Rukhshanda, et al.
Published: (2024)
Trajeglish: Traffic Modeling as Next-Token Prediction
by: Philion, Jonah, et al.
Published: (2023)
FVAR: Visual Autoregressive Modeling via Next Focus Prediction
by: Li, Xiaofan, et al.
Published: (2025)