Bibliographic Details
Main Authors:	Zhao, Yizhou; Wang, Tuanfeng Y.; Raj, Bhiksha; Xu, Min; Yang, Jimei; Huang, Chun-Hao Paul
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2405.14855

Similar Items

Boosting Camera Motion Control for Video Diffusion Transformers
by: Cheong, Soon Yau, et al.
Published: (2024)

SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
by: Huang, Zhening, et al.
Published: (2025)

MASIV: Toward Material-Agnostic System Identification from Videos
by: Zhao, Yizhou, et al.
Published: (2025)

ActAnywhere: Subject-Aware Video Background Generation
by: Pan, Boxiao, et al.
Published: (2024)

Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization
by: Wang, Junying, et al.
Published: (2025)

Pattern Guided UV Recovery for Realistic Video Garment Texturing
by: Zhan, Youyi, et al.
Published: (2024)

ControlVAR: Exploring Controllable Visual Autoregressive Modeling
by: Li, Xiang, et al.
Published: (2024)

V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties
by: Fang, Ye, et al.
Published: (2025)

Total-Editing: Head Avatar with Editable Appearance, Motion, and Lighting
by: Zhao, Yizhou, et al.
Published: (2025)

OmniCam: Unified Multimodal Video Generation via Camera Control
by: Yang, Xiaoda, et al.
Published: (2025)

Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
by: Qiu, Kai, et al.
Published: (2025)

D-CoDe: Scaling Image-Pretrained VLMs to Video via Dynamic Compression and Question Decomposition
by: Huang, Yiyang, et al.
Published: (2025)

Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video
by: Xu, Xiaohao, et al.
Published: (2025)

TROPHIES: Temporal Reconstruction of Places, Humans, and Cameras from Multi-view Videos
by: Liu, Jinpeng, et al.
Published: (2026)

I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength
by: Feng, Wanquan, et al.
Published: (2024)

JOG3R: Towards 3D-Consistent Video Generators
by: Huang, Chun-Hao Paul, et al.
Published: (2025)

A Survey of 3D Reconstruction with Event Cameras
by: Xu, Chuanzhi, et al.
Published: (2025)

VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
by: Waheed, Abdul, et al.
Published: (2025)

VerLM: Explaining Face Verification Using Natural Language
by: Hannan, Syed Abdul, et al.
Published: (2026)

An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning
by: Chen, Hao, et al.
Published: (2022)

Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks
by: Chen, Hao, et al.
Published: (2023)

SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework
by: Wu, Tianshu, et al.
Published: (2026)

Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures
by: Xu, Yuancheng, et al.
Published: (2025)

Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM
by: Weng, Zhenzhen, et al.
Published: (2024)

Echo4DIR: 4D Implicit Heart Reconstruction from 2D Echocardiography Videos
by: Liu, Yanan, et al.
Published: (2026)

PRISM: A Unified Framework for Photorealistic Reconstruction and Intrinsic Scene Modeling
by: Dirik, Alara, et al.
Published: (2025)

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
by: Jeong, Hyeonho, et al.
Published: (2024)

Customizable Perturbation Synthesis for Robust SLAM Benchmarking
by: Xu, Xiaohao, et al.
Published: (2024)

Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
by: Wang, Haonan, et al.
Published: (2025)

FreeOrbit4D: Training-Free Arbitrary Camera Redirection for Monocular Videos via Foreground-Complete 4D Reconstruction
by: Cao, Wei, et al.
Published: (2026)

GloTSFormer: Global Video Text Spotting Transformer
by: Wang, Han, et al.
Published: (2024)

CHRIS: Clothed Human Reconstruction with Side View Consistency
by: Liu, Dong, et al.
Published: (2025)

Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations
by: Chen, Hao, et al.
Published: (2023)

Slight Corruption in Pre-training Data Makes Better Diffusion Models
by: Chen, Hao, et al.
Published: (2024)

GenFusion: Closing the Loop between Reconstruction and Generation via Videos
by: Wu, Sibo, et al.
Published: (2025)

CameraCtrl: Enabling Camera Control for Text-to-Video Generation
by: He, Hao, et al.
Published: (2024)

WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
by: Li, Zizun, et al.
Published: (2025)

ReGenNet: Towards Human Action-Reaction Synthesis
by: Xu, Liang, et al.
Published: (2024)

Memory-V2V: Memory-Augmented Video-to-Video Diffusion for Consistent Multi-Turn Editing
by: Lee, Dohun, et al.
Published: (2026)

Distorted or Fabricated? A Survey on Hallucination in Video LLMs
by: Huang, Yiyang, et al.
Published: (2026)