| Main Authors: | Li, Jia; Gao, Nan; Huang, Huaibo; He, Ran |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.12235 |
Similar Items
InfoBFR: Real-World Blind Face Restoration via Information Bottleneck
by: Gao, Nan, et al.
Published: (2025)
DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration
by: Gao, Nan, et al.
Published: (2024)
Breaking the Low-Rank Dilemma of Linear Attention
by: Fan, Qihang, et al.
Published: (2024)
LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration
by: Ai, Yuang, et al.
Published: (2024)
Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention
by: Ai, Yuang, et al.
Published: (2025)
Marmot: Object-Level Self-Correction via Multi-Agent Reasoning
by: Sun, Jiayang, et al.
Published: (2025)
Rectifying Magnitude Neglect in Linear Attention
by: Fan, Qihang, et al.
Published: (2025)
Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens
by: Fan, Qihang, et al.
Published: (2024)
Random Wins All: Rethinking Grouping Strategies for Vision Tokens
by: Fan, Qihang, et al.
Published: (2026)
ZePo: Zero-Shot Portrait Stylization with Faster Sampling
by: Liu, Jin, et al.
Published: (2024)
Lightweight Vision Transformer with Bidirectional Interaction
by: Fan, Qihang, et al.
Published: (2023)
Parallel Augmentation and Dual Enhancement for Occluded Person Re-identification
by: Wang, Zi, et al.
Published: (2022)
InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser
by: Cui, Xing, et al.
Published: (2023)
Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
by: Ai, Yuang, et al.
Published: (2023)
RMT: Retentive Networks Meet Vision Transformers
by: Fan, Qihang, et al.
Published: (2023)
Think 360°: Evaluating the Width-centric Reasoning Capability of MLLMs Beyond Depth
by: Chen, Mingrui, et al.
Published: (2026)
Advancing Vision Transformer with Enhanced Spatial Priors
by: Fan, Qihang, et al.
Published: (2026)
Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer
by: Ai, Yuang, et al.
Published: (2023)
Vision Transformer with Super Token Sampling
by: Huang, Huaibo, et al.
Published: (2022)
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
by: Liu, Haogeng, et al.
Published: (2024)
Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning
by: Chen, Mingrui, et al.
Published: (2025)
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling
by: Ai, Yuang, et al.
Published: (2025)
MVPBench: A Multi-Video Perception Evaluation Benchmark for Multi-Modal Video Understanding
by: Bai, Purui, et al.
Published: (2026)
GenVideoLens: Where LVLMs Fall Short in AI-Generated Video Detection?
by: Zou, Yueying, et al.
Published: (2026)
Expand and Prune: Maximizing Trajectory Diversity for Effective GRPO in Generative Models
by: Ge, Shiran, et al.
Published: (2025)
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
by: Zou, Yueying, et al.
Published: (2025)
Tuning Real-World Image Restoration at Inference: A Test-Time Scaling Paradigm for Flow Matching Models
by: Bai, Purui, et al.
Published: (2026)
ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
by: Chen, Yongwei, et al.
Published: (2024)
ID-Cloak: Crafting Identity-Specific Cloaks Against Personalized Text-to-Image Generation
by: Teng, Qianrui, et al.
Published: (2025)
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
by: Liu, Haogeng, et al.
Published: (2024)
Vision Transformer with Sparse Scan Prior
by: Zhang, Yuguang, et al.
Published: (2024)
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
by: Liu, Xuannan, et al.
Published: (2025)
ViTAR: Vision Transformer with Any Resolution
by: Fan, Qihang, et al.
Published: (2024)
IBCapsNet: Information Bottleneck Capsule Network for Noise-Robust Representation Learning
by: Xiang, Canqun, et al.
Published: (2026)
Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey
by: Liu, Xuannan, et al.
Published: (2024)
DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling
by: Zhao, Yueming, et al.
Published: (2024)
NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-Identification
by: Li, Shihao, et al.
Published: (2025)
Graph Information Bottleneck for Remote Sensing Segmentation
by: Shou, Yuntao, et al.
Published: (2023)
Reconstruct, Inpaint, Test-Time Finetune: Dynamic Novel-view Synthesis from Monocular Videos
by: Chen, Kaihua, et al.
Published: (2025)
DeVAn: Dense Video Annotation for Video-Language Models
by: Liu, Tingkai, et al.
Published: (2023)