:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Qisen; Zhao, Yifan; Shen, Peisen; Li, Jialu; Li, Jia
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.01481

Similar Items

WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains
by: Wang, Qisen, et al.
Published: (2026)

How to Use Diffusion Priors under Sparse Views?
by: Wang, Qisen, et al.
Published: (2024)

ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation
by: Wu, Jay Zhangjie, et al.
Published: (2025)

DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
by: Duan, Zheng-Peng, et al.
Published: (2025)

Taming Sampling Perturbations with Variance Expansion Loss for Latent Diffusion Models
by: Li, Qifan, et al.
Published: (2026)

DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model
by: Gu, Songen, et al.
Published: (2024)

Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion
by: Wei, Shuoyan, et al.
Published: (2026)

HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
by: Zhou, Haiyang, et al.
Published: (2025)

ControlSR: Taming Diffusion Models for Consistent Real-World Image Super Resolution
by: Wan, Yuhao, et al.
Published: (2024)

Gradient-Free Classifier Guidance for Diffusion Model Sampling
by: Shenoy, Rahul, et al.
Published: (2024)

DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
by: Song, Ziyang, et al.
Published: (2025)

OneWorld: Taming Scene Generation with 3D Unified Representation Autoencoder
by: Gao, Sensen, et al.
Published: (2026)

Diffusion-Classifier Synergy: Reward-Aligned Learning via Mutual Boosting Loop for FSCIL
by: Wu, Ruitao, et al.
Published: (2025)

DiP: Taming Diffusion Models in Pixel Space
by: Chen, Zhennan, et al.
Published: (2025)

SpikeVAEDiff: Neural Spike-based Natural Visual Scene Reconstruction via VD-VAE and Versatile Diffusion
by: Li, Jialu, et al.
Published: (2026)

UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
by: Shi, Bowen, et al.
Published: (2024)

ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On
by: Wang, Jinjuan, et al.
Published: (2025)

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
by: Feng, Yutong, et al.
Published: (2023)

ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions
by: Wang, Zikai, et al.
Published: (2026)

Taming Diffusion Models for Image Restoration: A Review
by: Luo, Ziwei, et al.
Published: (2024)

OCCDiff: Occupancy Diffusion Model for High-Fidelity 3D Building Reconstruction from Noisy Point Clouds
by: Sui, Jialu, et al.
Published: (2025)

Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs
by: Cui, Qinpeng, et al.
Published: (2024)

VSDiffusion: Taming Ill-Posed Shadow Generation via Visibility-Constrained Diffusion
by: Li, Jing, et al.
Published: (2026)

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
by: Zhou, Yang, et al.
Published: (2025)

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
by: Bahmani, Sherwin, et al.
Published: (2024)

Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation
by: Meng, Xuyi, et al.
Published: (2025)

Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs
by: Zhong, Yingji, et al.
Published: (2025)

Tango: Taming Visual Signals for Efficient Video Large Language Models
by: Yin, Shukang, et al.
Published: (2026)

CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models
by: Wang, Qinghe, et al.
Published: (2024)

Taming Video Models for 3D and 4D Generation via Zero-Shot Camera Control
by: Song, Chenxi, et al.
Published: (2025)

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
by: Jin, Yudong, et al.
Published: (2025)

MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction
by: Zou, Yingshuang, et al.
Published: (2025)

Taming Diffusion for Dataset Distillation with High Representativeness
by: Zhao, Lin, et al.
Published: (2025)

HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval
by: Li, Zixu, et al.
Published: (2026)

Learning Yourself: Class-Incremental Semantic Segmentation with Language-Inspired Bootstrapped Disentanglement
by: Wu, Ruitao, et al.
Published: (2025)

Parsing Objects at a Finer Granularity: A Survey
by: Zhao, Yifan, et al.
Published: (2022)

ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
by: Yu, Wangbo, et al.
Published: (2024)

Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning
by: Chen, Chubin, et al.
Published: (2025)

You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs
by: Luo, Yihong, et al.
Published: (2024)

Free Lunch to Meet the Gap: Intermediate Domain Reconstruction for Cross-Domain Few-Shot Learning
by: Zhang, Tong, et al.
Published: (2025)