| Main Authors: | Fan, Qihang, Huang, Huaibo, Chen, Mingrui, Liu, Hongmin, He, Ran |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.18549 |
Similar Items
RMT: Retentive Networks Meet Vision Transformers
by: Fan, Qihang, et al.
Published: (2023)
Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens
by: Fan, Qihang, et al.
Published: (2024)
Vision Transformer with Sparse Scan Prior
by: Zhang, Yuguang, et al.
Published: (2024)
Lightweight Vision Transformer with Bidirectional Interaction
by: Fan, Qihang, et al.
Published: (2023)
Random Wins All: Rethinking Grouping Strategies for Vision Tokens
by: Fan, Qihang, et al.
Published: (2026)
Breaking the Low-Rank Dilemma of Linear Attention
by: Fan, Qihang, et al.
Published: (2024)
Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning
by: Chen, Mingrui, et al.
Published: (2025)
Rectifying Magnitude Neglect in Linear Attention
by: Fan, Qihang, et al.
Published: (2025)
ViTAR: Vision Transformer with Any Resolution
by: Fan, Qihang, et al.
Published: (2024)
Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention
by: Ai, Yuang, et al.
Published: (2025)
Think 360°: Evaluating the Width-centric Reasoning Capability of MLLMs Beyond Depth
by: Chen, Mingrui, et al.
Published: (2026)
Vision Transformer with Super Token Sampling
by: Huang, Huaibo, et al.
Published: (2022)
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling
by: Ai, Yuang, et al.
Published: (2025)
Expand and Prune: Maximizing Trajectory Diversity for Effective GRPO in Generative Models
by: Ge, Shiran, et al.
Published: (2025)
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
by: Han, Xiaotian, et al.
Published: (2024)
LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration
by: Ai, Yuang, et al.
Published: (2024)
DeVAn: Dense Video Annotation for Video-Language Models
by: Liu, Tingkai, et al.
Published: (2023)
ZePo: Zero-Shot Portrait Stylization with Faster Sampling
by: Liu, Jin, et al.
Published: (2024)
Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer
by: Ai, Yuang, et al.
Published: (2023)
NOFT: Test-Time Noise Finetune via Information Bottleneck for Highly Correlated Asset Creation
by: Li, Jia, et al.
Published: (2025)
Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
by: Ai, Yuang, et al.
Published: (2023)
Marmot: Object-Level Self-Correction via Multi-Agent Reasoning
by: Sun, Jiayang, et al.
Published: (2025)
InfoBFR: Real-World Blind Face Restoration via Information Bottleneck
by: Gao, Nan, et al.
Published: (2025)
Parallel Augmentation and Dual Enhancement for Occluded Person Re-identification
by: Wang, Zi, et al.
Published: (2022)
MVPBench: A Multi-Video Perception Evaluation Benchmark for Multi-Modal Video Understanding
by: Bai, Purui, et al.
Published: (2026)
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
by: Liu, Haogeng, et al.
Published: (2024)
Prototypical Distillation and Debiased Tuning for Black-box Unsupervised Domain Adaptation
by: Liang, Jian, et al.
Published: (2024)
DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration
by: Gao, Nan, et al.
Published: (2024)
Straighter Flow Matching via a Diffusion-Based Coupling Prior
by: Xing, Siyu, et al.
Published: (2023)
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
by: Zou, Yueying, et al.
Published: (2025)
ObjVariantEnsemble: Advancing Point Cloud LLM Evaluation in Challenging Scenes with Subtly Distinguished Objects
by: Cao, Qihang, et al.
Published: (2024)
Learning Spatial Decay for Vision Transformers
by: Mao, Yuxin, et al.
Published: (2025)
GenVideoLens: Where LVLMs Fall Short in AI-Generated Video Detection?
by: Zou, Yueying, et al.
Published: (2026)
ID-Cloak: Crafting Identity-Specific Cloaks Against Personalized Text-to-Image Generation
by: Teng, Qianrui, et al.
Published: (2025)
ReVision: Refining Video Diffusion with Explicit 3D Motion Modeling
by: Liu, Qihao, et al.
Published: (2025)
DenseSplat: Densifying Gaussian Splatting SLAM with Neural Radiance Prior
by: Li, Mingrui, et al.
Published: (2025)
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
by: Liu, Xuannan, et al.
Published: (2025)
Source-Free Domain Adaptive Semantic Segmentation of Remote Sensing Images with Diffusion-Guided Label Enrichment
by: Liu, Wenjie, et al.
Published: (2025)
ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding
by: Peng, Qihang, et al.
Published: (2025)
PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection
by: Zhang, Tianhao, et al.
Published: (2024)