:: Library Catalog

Bibliographic Details
Main Authors:	Wu, Xian; Liu, Chang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.15661

Similar Items

EraserDiT: Fast Video Inpainting with Diffusion Transformer Model
by: Liu, Jie, et al.
Published: (2025)

iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer
by: Shen, Zhelun, et al.
Published: (2025)

Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile
by: Ding, Hangliang, et al.
Published: (2025)

FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers
by: He, Xuanhua, et al.
Published: (2025)

SparseDiT: Token Sparsification for Efficient Diffusion Transformer
by: Chang, Shuning, et al.
Published: (2024)

FrameDiT: Diffusion Transformer with Matrix Attention for Efficient Video Generation
by: Le, Minh Khoa, et al.
Published: (2026)

DiTVR: Zero-Shot Diffusion Transformer for Video Restoration
by: Gao, Sicheng, et al.
Published: (2025)

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
by: Zhao, Tianchen, et al.
Published: (2024)

Flow-Guided Diffusion for Video Inpainting
by: Gu, Bohai, et al.
Published: (2023)

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
by: Wang, Kai, et al.
Published: (2024)

Coherent Video Inpainting Using Optical Flow-Guided Efficient Diffusion
by: Gu, Bohai, et al.
Published: (2024)

AVID: Any-Length Video Inpainting with Diffusion Model
by: Zhang, Zhixing, et al.
Published: (2023)

S2DiT: Sandwich Diffusion Transformer for Mobile Streaming Video Generation
by: Zhao, Lin, et al.
Published: (2026)

DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer
by: Jiang, Junpeng, et al.
Published: (2025)

Exploiting Optical Flow Guidance for Transformer-Based Video Inpainting
by: Zhang, Kaidong, et al.
Published: (2023)

DiffuEraser: A Diffusion Model for Video Inpainting
by: Li, Xiaowen, et al.
Published: (2025)

Learnable Gated Temporal Shift Module for Deep Video Inpainting
by: Chang, Ya-Liang, et al.
Published: (2019)

LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Image and Video Generation
by: Yang, Lianwei, et al.
Published: (2025)

Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer
by: Shao, Ruizhi, et al.
Published: (2024)

DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer
by: Lyu, Hengye, et al.
Published: (2026)

HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization
by: Liu, Wenxuan, et al.
Published: (2024)

MTV-Inpaint: Multi-Task Long Video Inpainting
by: Yang, Shiyuan, et al.
Published: (2025)

LuxDiT: Lighting Estimation with Video Diffusion Transformer
by: Liang, Ruofan, et al.
Published: (2025)

Semantically Consistent Video Inpainting with Conditional Diffusion Models
by: Green, Dylan, et al.
Published: (2024)

Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers
by: Zhang, Shuo, et al.
Published: (2026)

PixelDiT: Pixel Diffusion Transformers for Image Generation
by: Yu, Yongsheng, et al.
Published: (2025)

VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers
by: Zheng, Jun, et al.
Published: (2024)

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
by: Ju, Xuan, et al.
Published: (2024)

PTQ4DiT: Post-training Quantization for Diffusion Transformers
by: Wu, Junyi, et al.
Published: (2024)

LaVin-DiT: Large Vision Diffusion Transformer
by: Wang, Zhaoqing, et al.
Published: (2024)

DyDiT++: Diffusion Transformers with Timestep and Spatial Dynamics for Efficient Visual Generation
by: Zhao, Wangbo, et al.
Published: (2025)

Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
by: Yang, Zhuoyi, et al.
Published: (2024)

MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting
by: Huang, Jun, et al.
Published: (2025)

NeRF Inpainting with Geometric Diffusion Prior and Balanced Score Distillation
by: Zhang, Menglin, et al.
Published: (2024)

Transformer-based Image and Video Inpainting: Current Challenges and Future Directions
by: Elharrouss, Omar, et al.
Published: (2024)

Towards Online Real-Time Memory-based Video Inpainting Transformers
by: Thiry, Guillaume, et al.
Published: (2024)

Mumpy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection
by: Zhang, Ying, et al.
Published: (2024)

Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
by: Qi, Tianhao, et al.
Published: (2025)

Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers
by: Sun, Yasheng, et al.
Published: (2025)

DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
by: Teng, Yao, et al.
Published: (2024)