| Main Authors: | Han, Dongchen, Ye, Tianzhu, Xia, Zhuofan, Chen, Kaiyi, Wang, Yulin, Chen, Hanting, Huang, Gao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.14329 |
Similar Items
Agent Attention: On the Integration of Softmax and Linear Attention
by: Han, Dongchen, et al.
Published: (2023)
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
by: Pu, Yifan, et al.
Published: (2024)
GSVA: Generalized Segmentation via Multimodal Large Language Models
by: Xia, Zhuofan, et al.
Published: (2023)
Bridging the Divide: Reconsidering Softmax and Linear Attention
by: Han, Dongchen, et al.
Published: (2024)
Demystify Mamba in Vision: A Linear Attention Perspective
by: Han, Dongchen, et al.
Published: (2024)
One Step Diffusion-based Super-Resolution with Time-Aware Distillation
by: He, Xiao, et al.
Published: (2024)
Linear-Time Global Visual Modeling without Explicit Attention
by: He, Ruize, et al.
Published: (2026)
STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft
by: Zhao, Zhonghan, et al.
Published: (2024)
Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials
by: Pu, Yifan, et al.
Published: (2025)
Vision Transformers are Circulant Attention Learners
by: Han, Dongchen, et al.
Published: (2025)
Frequency Domain Modality-invariant Feature Learning for Visible-infrared Person Re-Identification
by: Li, Yulin, et al.
Published: (2024)
One Step Learning, One Step Review
by: Huang, Xiaolong, et al.
Published: (2024)
OpenSeg-R: Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning
by: Han, Zongyan, et al.
Published: (2025)
Denoising Diffusion Step-aware Models
by: Yang, Shuai, et al.
Published: (2023)
Few-Step Distillation for Text-to-Image Generation: A Practical Guide
by: Pu, Yifan, et al.
Published: (2025)
Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data
by: Huang, Rui, et al.
Published: (2024)
Step-GUI Technical Report
by: Yan, Haolong, et al.
Published: (2025)
Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
by: Chen, Honghao, et al.
Published: (2025)
Know Your Step: Faster and Better Alignment for Flow Matching Models via Step-aware Advantages
by: Yue, Zhixiong, et al.
Published: (2026)
Self-Adversarial One Step Generation via Condition Shifting
by: Liu, Deyuan, et al.
Published: (2026)
VividFace: High-Quality and Efficient One-Step Diffusion For Video Face Enhancement
by: Zhang, Shulian, et al.
Published: (2025)
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
by: Zhang, Jiahao, et al.
Published: (2023)
A Step to Decouple Optimization in 3DGS
by: Ding, Renjie, et al.
Published: (2026)
Accelerating Diffusion Decoders via Multi-Scale Sampling and One-Step Distillation
by: Wang, Chuhan, et al.
Published: (2026)
RecipeGen: A Step-Aligned Multimodal Benchmark for Real-World Recipe Generation
by: Zhang, Ruoxuan, et al.
Published: (2025)
Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations
by: Huang, Hai, et al.
Published: (2025)
$π$-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs
by: Wang, Siting, et al.
Published: (2026)
One-Step Diffusion Model for Image Motion-Deblurring
by: Liu, Xiaoyang, et al.
Published: (2025)
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
by: Guo, Jianyuan, et al.
Published: (2024)
One-Step Event-Driven High-Speed Autofocus
by: Bao, Yuhan, et al.
Published: (2025)
StepAL: Step-aware Active Learning for Cataract Surgical Videos
by: Shah, Nisarg A., et al.
Published: (2025)
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
by: Souček, Tomáš, et al.
Published: (2024)
DUO-VSR: Dual-Stream Distillation for One-Step Video Super-Resolution
by: Lv, Zhengyao, et al.
Published: (2026)
Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step
by: Wang, Wenxuan, et al.
Published: (2024)
MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training
by: Xue, Haotian, et al.
Published: (2025)
Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports
by: Yang, Yuchen, et al.
Published: (2026)
FreeControl: Efficient, Training-Free Structural Control via One-Step Attention Extraction
by: Lin, Jiang, et al.
Published: (2025)
OSDFace: One-Step Diffusion Model for Face Restoration
by: Wang, Jingkai, et al.
Published: (2024)
Let's Reward Step-by-Step: Step-Aware Contrastive Alignment for Vision-Language Navigation in Continuous Environments
by: Li, Haoyuan, et al.
Published: (2026)
Omni-Dimensional Frequency Learner for General Time Series Analysis
by: Chen, Xianing, et al.
Published: (2024)