Bibliographic Details
Main Authors:	Fan, Qihang; Huang, Huaibo; Chen, Mingrui; Liu, Hongmin; He, Ran
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.18549

Similar Items

RMT: Retentive Networks Meet Vision Transformers
by: Fan, Qihang, et al.
Published: (2023)

Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens
by: Fan, Qihang, et al.
Published: (2024)

Vision Transformer with Sparse Scan Prior
by: Zhang, Yuguang, et al.
Published: (2024)

Lightweight Vision Transformer with Bidirectional Interaction
by: Fan, Qihang, et al.
Published: (2023)

Random Wins All: Rethinking Grouping Strategies for Vision Tokens
by: Fan, Qihang, et al.
Published: (2026)

Breaking the Low-Rank Dilemma of Linear Attention
by: Fan, Qihang, et al.
Published: (2024)

Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning
by: Chen, Mingrui, et al.
Published: (2025)

Rectifying Magnitude Neglect in Linear Attention
by: Fan, Qihang, et al.
Published: (2025)

ViTAR: Vision Transformer with Any Resolution
by: Fan, Qihang, et al.
Published: (2024)

Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention
by: Ai, Yuang, et al.
Published: (2025)

Think 360°: Evaluating the Width-centric Reasoning Capability of MLLMs Beyond Depth
by: Chen, Mingrui, et al.
Published: (2026)

Vision Transformer with Super Token Sampling
by: Huang, Huaibo, et al.
Published: (2022)

DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling
by: Ai, Yuang, et al.
Published: (2025)

Expand and Prune: Maximizing Trajectory Diversity for Effective GRPO in Generative Models
by: Ge, Shiran, et al.
Published: (2025)

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
by: Han, Xiaotian, et al.
Published: (2024)

LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration
by: Ai, Yuang, et al.
Published: (2024)

DeVAn: Dense Video Annotation for Video-Language Models
by: Liu, Tingkai, et al.
Published: (2023)

ZePo: Zero-Shot Portrait Stylization with Faster Sampling
by: Liu, Jin, et al.
Published: (2024)

Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer
by: Ai, Yuang, et al.
Published: (2023)

NOFT: Test-Time Noise Finetune via Information Bottleneck for Highly Correlated Asset Creation
by: Li, Jia, et al.
Published: (2025)

Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
by: Ai, Yuang, et al.
Published: (2023)

Marmot: Object-Level Self-Correction via Multi-Agent Reasoning
by: Sun, Jiayang, et al.
Published: (2025)

InfoBFR: Real-World Blind Face Restoration via Information Bottleneck
by: Gao, Nan, et al.
Published: (2025)

Parallel Augmentation and Dual Enhancement for Occluded Person Re-identification
by: Wang, Zi, et al.
Published: (2022)

MVPBench: A Multi-Video Perception Evaluation Benchmark for Multi-Modal Video Understanding
by: Bai, Purui, et al.
Published: (2026)

Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
by: Liu, Haogeng, et al.
Published: (2024)

Prototypical Distillation and Debiased Tuning for Black-box Unsupervised Domain Adaptation
by: Liang, Jian, et al.
Published: (2024)

DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration
by: Gao, Nan, et al.
Published: (2024)

Straighter Flow Matching via a Diffusion-Based Coupling Prior
by: Xing, Siyu, et al.
Published: (2023)

Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
by: Zou, Yueying, et al.
Published: (2025)

ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects
by: Cao, Qihang, et al.
Published: (2024)

Learning Spatial Decay for Vision Transformers
by: Mao, Yuxin, et al.
Published: (2025)

GenVideoLens: Where LVLMs Fall Short in AI-Generated Video Detection?
by: Zou, Yueying, et al.
Published: (2026)

ID-Cloak: Crafting Identity-Specific Cloaks Against Personalized Text-to-Image Generation
by: Teng, Qianrui, et al.
Published: (2025)

ReVision: Refining Video Diffusion with Explicit 3D Motion Modeling
by: Liu, Qihao, et al.
Published: (2025)

DenseSplat: Densifying Gaussian Splatting SLAM with Neural Radiance Prior
by: Li, Mingrui, et al.
Published: (2025)

Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
by: Liu, Xuannan, et al.
Published: (2025)

Source-Free Domain Adaptive Semantic Segmentation of Remote Sensing Images with Diffusion-Guided Label Enrichment
by: Liu, Wenjie, et al.
Published: (2025)

ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding
by: Peng, Qihang, et al.
Published: (2025)

PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection
by: Zhang, Tianhao, et al.
Published: (2024)