| Main Authors: | Lyu, Hengye, Li, Zisu, Hong, Yue, Weng, Yueting, Shi, Jiaxin, Zhang, Hanwang, Liang, Chen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.13509 |
Similar Items
SpriteHand: Real-Time Versatile Hand-Object Interaction with Autoregressive Video Generation
by: Li, Zisu, et al.
Published: (2025)
Generative Augmented Reality: Paradigms, Technologies, and Future Applications
by: Liang, Chen, et al.
Published: (2025)
Real-Time Motion-Controllable Autoregressive Video Diffusion
by: Zhao, Kesen, et al.
Published: (2025)
S2DiT: Sandwich Diffusion Transformer for Mobile Streaming Video Generation
by: Zhao, Lin, et al.
Published: (2026)
DiT4Edit: Diffusion Transformer for Image Editing
by: Feng, Kunyu, et al.
Published: (2024)
ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models
by: Gao, Kaifeng, et al.
Published: (2024)
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
by: Wang, Kai, et al.
Published: (2024)
VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers
by: Zheng, Jun, et al.
Published: (2024)
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
by: Gao, Kaifeng, et al.
Published: (2024)
LaVin-DiT: Large Vision Diffusion Transformer
by: Wang, Zhaoqing, et al.
Published: (2024)
DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
by: Duan, Zheng-Peng, et al.
Published: (2025)
DiVE: DiT-based Video Generation with Enhanced Control
by: Jiang, Junpeng, et al.
Published: (2024)
DiT-IC: Aligned Diffusion Transformer for Efficient Image Compression
by: Shi, Junqi, et al.
Published: (2026)
PTQ4DiT: Post-training Quantization for Diffusion Transformers
by: Wu, Junyi, et al.
Published: (2024)
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
by: Yang, Zhuoyi, et al.
Published: (2024)
Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers
by: Sun, Yasheng, et al.
Published: (2025)
MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation
by: Krishnamurthy, Bharath, et al.
Published: (2026)
UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer
by: Wang, Xiang, et al.
Published: (2025)
Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer
by: Shao, Ruizhi, et al.
Published: (2024)
LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Image and Video Generation
by: Yang, Lianwei, et al.
Published: (2025)
U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers
by: Tian, Yuchuan, et al.
Published: (2024)
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
by: Li, Shufan, et al.
Published: (2025)
Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising
by: Fang, Gongfan, et al.
Published: (2024)
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
by: Chen, Lei, et al.
Published: (2024)
Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
by: Qi, Tianhao, et al.
Published: (2025)
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
by: Zhu, Rui, et al.
Published: (2024)
FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers
by: Chen, Ruichen, et al.
Published: (2025)
$Δ$-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers
by: Chen, Pengtao, et al.
Published: (2024)
HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization
by: Liu, Wenxuan, et al.
Published: (2024)
Unveiling Redundancy in Diffusion Transformers (DiTs): A Systematic Study
by: Sun, Xibo, et al.
Published: (2024)
Q-DiT4SR: Exploration of Detail-Preserving Diffusion Transformer Quantization for Real-World Image Super-Resolution
by: Zhang, Xun, et al.
Published: (2026)
DiT-JSCC: Rethinking Deep JSCC with Diffusion Transformers and Semantic Representations
by: Tan, Kailin, et al.
Published: (2026)
StreamDiT: Real-Time Streaming Text-to-Video Generation
by: Kodaira, Akio, et al.
Published: (2025)
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
by: Deng, Juncan, et al.
Published: (2024)
SMooDi: Stylized Motion Diffusion Model
by: Zhong, Lei, et al.
Published: (2024)
ADP-DiT: Text-Guided Diffusion Transformer for Brain Image Generation in Alzheimer's Disease Progression
by: Lee, Juneyong, et al.
Published: (2026)
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
by: Li, Zhimin, et al.
Published: (2024)
FD-DiT: Frequency Domain-Directed Diffusion Transformer for Low-Dose CT Reconstruction
by: Liu, Qiqing, et al.
Published: (2025)
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT
by: Liu, Dongyang, et al.
Published: (2025)
Insert Anything: Image Insertion via In-Context Editing in DiT
by: Song, Wensong, et al.
Published: (2025)