:: Library Catalog

Bibliographic Details
Main Authors:	Han, Dongchen; Ye, Tianzhu; Xia, Zhuofan; Chen, Kaiyi; Wang, Yulin; Chen, Hanting; Huang, Gao
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.14329

Similar Items

Agent Attention: On the Integration of Softmax and Linear Attention
by: Han, Dongchen, et al.
Published: (2023)

Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
by: Pu, Yifan, et al.
Published: (2024)

GSVA: Generalized Segmentation via Multimodal Large Language Models
by: Xia, Zhuofan, et al.
Published: (2023)

Bridging the Divide: Reconsidering Softmax and Linear Attention
by: Han, Dongchen, et al.
Published: (2024)

Demystify Mamba in Vision: A Linear Attention Perspective
by: Han, Dongchen, et al.
Published: (2024)

One Step Diffusion-based Super-Resolution with Time-Aware Distillation
by: He, Xiao, et al.
Published: (2024)

Linear-Time Global Visual Modeling without Explicit Attention
by: He, Ruize, et al.
Published: (2026)

STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft
by: Zhao, Zhonghan, et al.
Published: (2024)

Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials
by: Pu, Yifan, et al.
Published: (2025)

Vision Transformers are Circulant Attention Learners
by: Han, Dongchen, et al.
Published: (2025)

Frequency Domain Modality-invariant Feature Learning for Visible-infrared Person Re-Identification
by: Li, Yulin, et al.
Published: (2024)

One Step Learning, One Step Review
by: Huang, Xiaolong, et al.
Published: (2024)

OpenSeg-R: Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning
by: Han, Zongyan, et al.
Published: (2025)

Denoising Diffusion Step-aware Models
by: Yang, Shuai, et al.
Published: (2023)

Few-Step Distillation for Text-to-Image Generation: A Practical Guide
by: Pu, Yifan, et al.
Published: (2025)

Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data
by: Huang, Rui, et al.
Published: (2024)

Step-GUI Technical Report
by: Yan, Haolong, et al.
Published: (2025)

Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
by: Chen, Honghao, et al.
Published: (2025)

Know Your Step: Faster and Better Alignment for Flow Matching Models via Step-aware Advantages
by: Yue, Zhixiong, et al.
Published: (2026)

Self-Adversarial One Step Generation via Condition Shifting
by: Liu, Deyuan, et al.
Published: (2026)

VividFace: High-Quality and Efficient One-Step Diffusion For Video Face Enhancement
by: Zhang, Shulian, et al.
Published: (2025)

Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
by: Zhang, Jiahao, et al.
Published: (2023)

A Step to Decouple Optimization in 3DGS
by: Ding, Renjie, et al.
Published: (2026)

Accelerating Diffusion Decoders via Multi-Scale Sampling and One-Step Distillation
by: Wang, Chuhan, et al.
Published: (2026)

RecipeGen: A Step-Aligned Multimodal Benchmark for Real-World Recipe Generation
by: Zhang, Ruoxuan, et al.
Published: (2025)

Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations
by: Huang, Hai, et al.
Published: (2025)

$π$-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs
by: Wang, Siting, et al.
Published: (2026)

One-Step Diffusion Model for Image Motion-Deblurring
by: Liu, Xiaoyang, et al.
Published: (2025)

Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
by: Guo, Jianyuan, et al.
Published: (2024)

One-Step Event-Driven High-Speed Autofocus
by: Bao, Yuhan, et al.
Published: (2025)

StepAL: Step-aware Active Learning for Cataract Surgical Videos
by: Shah, Nisarg A., et al.
Published: (2025)

ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
by: Souček, Tomáš, et al.
Published: (2024)

DUO-VSR: Dual-Stream Distillation for One-Step Video Super-Resolution
by: Lv, Zhengyao, et al.
Published: (2026)

Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step
by: Wang, Wenxuan, et al.
Published: (2024)

MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training
by: Xue, Haotian, et al.
Published: (2025)

Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports
by: Yang, Yuchen, et al.
Published: (2026)

FreeControl: Efficient, Training-Free Structural Control via One-Step Attention Extraction
by: Lin, Jiang, et al.
Published: (2025)

OSDFace: One-Step Diffusion Model for Face Restoration
by: Wang, Jingkai, et al.
Published: (2024)

Let's Reward Step-by-Step: Step-Aware Contrastive Alignment for Vision-Language Navigation in Continuous Environments
by: Li, Haoyuan, et al.
Published: (2026)

Omni-Dimensional Frequency Learner for General Time Series Analysis
by: Chen, Xianing, et al.
Published: (2024)