| Main authors: | Fang, Xueji; Ma, Liyuan; Zeng, Jianhao; Cao, Jinjin; Zhou, Mingyuan; Qi, Guo-Jun |
|---|---|
| Material type: | Preprint |
| Published: | 2026 |
| Links: | https://arxiv.org/abs/2606.02090 |
Similar works
Equilibrated Diffusion: Frequency-aware Textual Embedding for Equilibrated Image Customization
by: Ma, Liyuan, et al.
Published: (2026)
InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO
by: Fang, Xueji, et al.
Published: (2025)
Traj2Action: A Co-Denoising Framework for Trajectory-Guided Human-to-Robot Skill Transfer
by: Zhou, Han, et al.
Published: (2025)
Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
by: Qi, Tianhao, et al.
Published: (2025)
DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation
by: Zeng, Chong, et al.
Published: (2024)
When Images Speak Louder: Mitigating Language Bias-induced Hallucinations in VLMs through Cross-Modal Guidance
by: Cao, Jinjin, et al.
Published: (2025)
DiT4Edit: Diffusion Transformer for Image Editing
by: Feng, Kunyu, et al.
Published: (2024)
MaskFocus: Focusing Policy Optimization on Critical Steps for Masked Image Generation
by: Zhang, Guohui, et al.
Published: (2025)
Self-Guidance: Boosting Flow and Diffusion Generation on Their Own
by: Li, Tiancheng, et al.
Published: (2024)
Accelerating Masked Image Generation by Learning Latent Controlled Dynamics
by: Zhu, Kaiwen, et al.
Published: (2026)
ResDiT: Evoking the Intrinsic Resolution Scalability in Diffusion Transformers
by: Ma, Yiyang, et al.
Published: (2025)
EFDiT: Efficient Fine-grained Image Generation Using Diffusion Transformer Models
by: Wang, Kun, et al.
Published: (2025)
PixelDiT: Pixel Diffusion Transformers for Image Generation
by: Yu, Yongsheng, et al.
Published: (2025)
Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising
by: Fang, Gongfan, et al.
Published: (2024)
MaDiS: Taming Masked Diffusion Language Models for Sign Language Generation
by: Zuo, Ronglai, et al.
Published: (2026)
DiMo: Discrete Diffusion Modeling for Motion Generation and Understanding
by: Zhang, Ning, et al.
Published: (2026)
MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations
by: Guo, Changlu, et al.
Published: (2026)
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
by: Zhao, Tianchen, et al.
Published: (2024)
DiVE-k: Differential Visual Reasoning for Fine-grained Image Recognition
by: Kumar, Raja, et al.
Published: (2025)
GenMask: Adapting DiT for Segmentation via Direct Mask Generation
by: Yang, Yuhuan, et al.
Published: (2026)
DiT-IC: Aligned Diffusion Transformer for Efficient Image Compression
by: Shi, Junqi, et al.
Published: (2026)
TerDiT: Ternary Diffusion Models with Transformers
by: Lu, Xudong, et al.
Published: (2024)
DriveDiTFit: Fine-tuning Diffusion Transformers for Autonomous Driving
by: Tu, Jiahang, et al.
Published: (2024)
FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes
by: Pan, Ziying, et al.
Published: (2024)
EmoAttack: Emotion-to-Image Diffusion Models for Emotional Backdoor Generation
by: Wei, Tianyu, et al.
Published: (2024)
Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
by: Pak, Byeonghyun, et al.
Published: (2024)
Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval
by: Ma, Zehong, et al.
Published: (2025)
EdgeDiT: Hardware-Aware Diffusion Transformers for Efficient On-Device Image Generation
by: Kodavanti, Sravanth, et al.
Published: (2026)
SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLM
by: Nie, Ming, et al.
Published: (2026)
MaskMamba: A Hybrid Mamba-Transformer Model for Masked Image Generation
by: Chen, Wenchao, et al.
Published: (2024)
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
by: Gao, Shanghua, et al.
Published: (2023)
Recursive Generalization Transformer for Image Super-Resolution
by: Chen, Zheng, et al.
Published: (2023)
Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model
by: Zhang, Hao, et al.
Published: (2024)
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
by: Li, Zhimin, et al.
Published: (2024)
A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
by: Yang, Shentao, et al.
Published: (2024)
U-StyDiT: Ultra-high Quality Artistic Style Transfer Using Diffusion Transformers
by: Zhang, Zhanjie, et al.
Published: (2025)
DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
by: Duan, Zheng-Peng, et al.
Published: (2025)
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling
by: Ye, Zilyu, et al.
Published: (2024)
HyperDiT: Hyper-Connected Transformers for High-Fidelity Pixel-Space Diffusion
by: He, Yu, et al.
Published: (2026)
Layer-wise Instance Binding for Regional and Occlusion Control in Text-to-Image Diffusion Transformers
by: Chen, Ruidong, et al.
Published: (2026)