| Main Authors: | Zhao, Yizhou, Wang, Tuanfeng Y., Raj, Bhiksha, Xu, Min, Yang, Jimei, Huang, Chun-Hao Paul |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.14855 |
Similar Items
Boosting Camera Motion Control for Video Diffusion Transformers
by: Cheong, Soon Yau, et al.
Published: (2024)
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
by: Huang, Zhening, et al.
Published: (2025)
MASIV: Toward Material-Agnostic System Identification from Videos
by: Zhao, Yizhou, et al.
Published: (2025)
ActAnywhere: Subject-Aware Video Background Generation
by: Pan, Boxiao, et al.
Published: (2024)
Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization
by: Wang, Junying, et al.
Published: (2025)
Pattern Guided UV Recovery for Realistic Video Garment Texturing
by: Zhan, Youyi, et al.
Published: (2024)
ControlVAR: Exploring Controllable Visual Autoregressive Modeling
by: Li, Xiang, et al.
Published: (2024)
V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties
by: Fang, Ye, et al.
Published: (2025)
Total-Editing: Head Avatar with Editable Appearance, Motion, and Lighting
by: Zhao, Yizhou, et al.
Published: (2025)
OmniCam: Unified Multimodal Video Generation via Camera Control
by: Yang, Xiaoda, et al.
Published: (2025)
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
by: Qiu, Kai, et al.
Published: (2025)
D-CoDe: Scaling Image-Pretrained VLMs to Video via Dynamic Compression and Question Decomposition
by: Huang, Yiyang, et al.
Published: (2025)
Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video
by: Xu, Xiaohao, et al.
Published: (2025)
TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos
by: Liu, Jinpeng, et al.
Published: (2026)
I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength
by: Feng, Wanquan, et al.
Published: (2024)
JOG3R: Towards 3D-Consistent Video Generators
by: Huang, Chun-Hao Paul, et al.
Published: (2025)
A Survey of 3D Reconstruction with Event Cameras
by: Xu, Chuanzhi, et al.
Published: (2025)
VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
by: Waheed, Abdul, et al.
Published: (2025)
VerLM: Explaining Face Verification Using Natural Language
by: Hannan, Syed Abdul, et al.
Published: (2026)
An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning
by: Chen, Hao, et al.
Published: (2022)
Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks
by: Chen, Hao, et al.
Published: (2023)
SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework
by: Wu, Tianshu, et al.
Published: (2026)
Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures
by: Xu, Yuancheng, et al.
Published: (2025)
Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM
by: Weng, Zhenzhen, et al.
Published: (2024)
Echo4DIR: 4D Implicit Heart Reconstruction from 2D Echocardiography Videos
by: Liu, Yanan, et al.
Published: (2026)
PRISM: A Unified Framework for Photorealistic Reconstruction and Intrinsic Scene Modeling
by: Dirik, Alara, et al.
Published: (2025)
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
by: Jeong, Hyeonho, et al.
Published: (2024)
Customizable Perturbation Synthesis for Robust SLAM Benchmarking
by: Xu, Xiaohao, et al.
Published: (2024)
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
by: Wang, Haonan, et al.
Published: (2025)
FreeOrbit4D: Training-Free Arbitrary Camera Redirection for Monocular Videos via Foreground-Complete 4D Reconstruction
by: Cao, Wei, et al.
Published: (2026)
GloTSFormer: Global Video Text Spotting Transformer
by: Wang, Han, et al.
Published: (2024)
CHRIS: Clothed Human Reconstruction with Side View Consistency
by: Liu, Dong, et al.
Published: (2025)
Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations
by: Chen, Hao, et al.
Published: (2023)
Slight Corruption in Pre-training Data Makes Better Diffusion Models
by: Chen, Hao, et al.
Published: (2024)
GenFusion: Closing the Loop between Reconstruction and Generation via Videos
by: Wu, Sibo, et al.
Published: (2025)
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
by: He, Hao, et al.
Published: (2024)
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
by: Li, Zizun, et al.
Published: (2025)
ReGenNet: Towards Human Action-Reaction Synthesis
by: Xu, Liang, et al.
Published: (2024)
Memory-V2V: Memory-Augmented Video-to-Video Diffusion for Consistent Multi-Turn Editing
by: Lee, Dohun, et al.
Published: (2026)
Distorted or Fabricated? A Survey on Hallucination in Video LLMs
by: Huang, Yiyang, et al.
Published: (2026)