:: Library Catalog

Bibliographic Details
Main Authors:	Li, Jia; Gao, Nan; Huang, Huaibo; He, Ran
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.12235

Similar Items

InfoBFR: Real-World Blind Face Restoration via Information Bottleneck
by: Gao, Nan, et al.
Published: (2025)

DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration
by: Gao, Nan, et al.
Published: (2024)

Breaking the Low-Rank Dilemma of Linear Attention
by: Fan, Qihang, et al.
Published: (2024)

LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration
by: Ai, Yuang, et al.
Published: (2024)

Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention
by: Ai, Yuang, et al.
Published: (2025)

Marmot: Object-Level Self-Correction via Multi-Agent Reasoning
by: Sun, Jiayang, et al.
Published: (2025)

Rectifying Magnitude Neglect in Linear Attention
by: Fan, Qihang, et al.
Published: (2025)

Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens
by: Fan, Qihang, et al.
Published: (2024)

Random Wins All: Rethinking Grouping Strategies for Vision Tokens
by: Fan, Qihang, et al.
Published: (2026)

ZePo: Zero-Shot Portrait Stylization with Faster Sampling
by: Liu, Jin, et al.
Published: (2024)

Lightweight Vision Transformer with Bidirectional Interaction
by: Fan, Qihang, et al.
Published: (2023)

Parallel Augmentation and Dual Enhancement for Occluded Person Re-identification
by: Wang, Zi, et al.
Published: (2022)

InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser
by: Cui, Xing, et al.
Published: (2023)

Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
by: Ai, Yuang, et al.
Published: (2023)

RMT: Retentive Networks Meet Vision Transformers
by: Fan, Qihang, et al.
Published: (2023)

Think 360°: Evaluating the Width-centric Reasoning Capability of MLLMs Beyond Depth
by: Chen, Mingrui, et al.
Published: (2026)

Advancing Vision Transformer with Enhanced Spatial Priors
by: Fan, Qihang, et al.
Published: (2026)

Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer
by: Ai, Yuang, et al.
Published: (2023)

Vision Transformer with Super Token Sampling
by: Huang, Huaibo, et al.
Published: (2022)

Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
by: Liu, Haogeng, et al.
Published: (2024)

Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning
by: Chen, Mingrui, et al.
Published: (2025)

DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling
by: Ai, Yuang, et al.
Published: (2025)

MVPBench: A Multi-Video Perception Evaluation Benchmark for Multi-Modal Video Understanding
by: Bai, Purui, et al.
Published: (2026)

GenVideoLens: Where LVLMs Fall Short in AI-Generated Video Detection?
by: Zou, Yueying, et al.
Published: (2026)

Expand and Prune: Maximizing Trajectory Diversity for Effective GRPO in Generative Models
by: Ge, Shiran, et al.
Published: (2025)

Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
by: Zou, Yueying, et al.
Published: (2025)

Tuning Real-World Image Restoration at Inference: A Test-Time Scaling Paradigm for Flow Matching Models
by: Bai, Purui, et al.
Published: (2026)

ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
by: Chen, Yongwei, et al.
Published: (2024)

ID-Cloak: Crafting Identity-Specific Cloaks Against Personalized Text-to-Image Generation
by: Teng, Qianrui, et al.
Published: (2025)

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
by: Liu, Haogeng, et al.
Published: (2024)

Vision Transformer with Sparse Scan Prior
by: Zhang, Yuguang, et al.
Published: (2024)

Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
by: Liu, Xuannan, et al.
Published: (2025)

ViTAR: Vision Transformer with Any Resolution
by: Fan, Qihang, et al.
Published: (2024)

IBCapsNet: Information Bottleneck Capsule Network for Noise-Robust Representation Learning
by: Xiang, Canqun, et al.
Published: (2026)

Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey
by: Liu, Xuannan, et al.
Published: (2024)

DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling
by: Zhao, Yueming, et al.
Published: (2024)

NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-Identification
by: Li, Shihao, et al.
Published: (2025)

Graph Information Bottleneck for Remote Sensing Segmentation
by: Shou, Yuntao, et al.
Published: (2023)

Reconstruct, Inpaint, Test-Time Finetune: Dynamic Novel-view Synthesis from Monocular Videos
by: Chen, Kaihua, et al.
Published: (2025)

DeVAn: Dense Video Annotation for Video-Language Models
by: Liu, Tingkai, et al.
Published: (2023)