| Main Authors: | Lin, Wang; Jia, Liyu; Hu, Wentao; Pan, Kaihang; Yue, Zhongqi; Zhao, Wei; Chen, Jingyuan; Wu, Fei; Zhang, Hanwang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2504.15932 |
Similar Items
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
by: Pan, Kaihang, et al.
Published: (2025)
Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning
by: Wang, Bohan, et al.
Published: (2025)
Few-shot Learner Parameterization by Diffusion Time-steps
by: Yue, Zhongqi, et al.
Published: (2024)
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
by: Yu, Qifan, et al.
Published: (2024)
Exploring Diffusion Time-steps for Unsupervised Representation Learning
by: Yue, Zhongqi, et al.
Published: (2024)
Auto-Encoding Morph-Tokens for Multimodal LLM
by: Pan, Kaihang, et al.
Published: (2024)
Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers
by: You, Haoran, et al.
Published: (2024)
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
by: Fei, Hao, et al.
Published: (2023)
Non-confusing Generation of Customized Concepts in Diffusion Models
by: Lin, Wang, et al.
Published: (2024)
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
by: Liu, Feng, et al.
Published: (2024)
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
by: Fei, Hao, et al.
Published: (2024)
Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning
by: Pan, Kaihang, et al.
Published: (2025)
Unified Generative and Discriminative Training for Multi-modal Large Language Models
by: Chow, Wei, et al.
Published: (2024)
Characterizing Motion Encoding in Video Diffusion Timesteps
by: Baherwani, Vatsal, et al.
Published: (2025)
Thinking with Images as Continuous Actions: Numerical Visual Chain-of-Thought
by: Zhao, Kesen, et al.
Published: (2026)
ART for Diffusion Sampling: A Reinforcement Learning Approach to Timestep Schedule
by: Huang, Yilie, et al.
Published: (2026)
Video-KTR: Reinforcing Video Reasoning via Key Token Attribution
by: Wang, Ziyue, et al.
Published: (2026)
Don't Let It Fade: Preserving Edits in Diffusion Language Models via Token Timestep Allocation
by: Kim, Woojin, et al.
Published: (2025)
Pusa V1.0: Unlocking Temporal Control in Pretrained Video Diffusion Models via Vectorized Timestep Adaptation
by: Liu, Yaofang, et al.
Published: (2025)
OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning
by: Pan, Kaihang, et al.
Published: (2026)
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach
by: Liu, Yaofang, et al.
Published: (2024)
ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion
by: Peng, Xurui, et al.
Published: (2025)
Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration
by: Pan, Kaihang, et al.
Published: (2024)
TASR: Timestep-Aware Diffusion Model for Image Super-Resolution
by: Lin, Qinwei, et al.
Published: (2024)
Timestep-Aware Correction for Quantized Diffusion Models
by: Yao, Yuzhe, et al.
Published: (2024)
Timestep-Aware Diffusion Model for Extreme Image Rescaling
by: Wang, Ce, et al.
Published: (2024)
Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
by: Yang, Ningyuan, et al.
Published: (2025)
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
by: Zhao, Kesen, et al.
Published: (2025)
Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals
by: Chen, Sirui, et al.
Published: (2026)
EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning
by: Liu, Xiaoqian, et al.
Published: (2025)
Efficient Matrix Implementation for Rotary Position Embedding
by: Minqi, Chen, et al.
Published: (2026)
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation
by: Chow, Wei, et al.
Published: (2025)
Manifold-Aware Exploration for Reinforcement Learning in Video Generation
by: Zheng, Mingzhe, et al.
Published: (2026)
Diffusion Time-step Curriculum for One Image to 3D Generation
by: Yi, Xuanyu, et al.
Published: (2024)
DyDiT++: Diffusion Transformers with Timestep and Spatial Dynamics for Efficient Visual Generation
by: Zhao, Wangbo, et al.
Published: (2025)
Video-STR: Reinforcing MLLMs in Video Spatio-Temporal Reasoning with Relation Graph
by: Wang, Wentao, et al.
Published: (2025)
ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models
by: Gao, Kaifeng, et al.
Published: (2024)
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
by: Jiang, Jingjing, et al.
Published: (2025)
Towards Semantic Equivalence of Tokenization in Multimodal LLM
by: Wu, Shengqiong, et al.
Published: (2024)
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning
by: Li, Zheng, et al.
Published: (2025)