| Main Authors: | Tai, Yan, Zhu, Luhao, Ding, Yunan, Dong, Yiying, Zhai, Guangtao, Liu, Xiaohong, Guo, Guodong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.07413 |
Similar Items
Face2QR: A Unified Framework for Aesthetic, Face-Preserving, and Scannable QR Code Generation
by: Cui, Xuehao, et al.
Published: (2024)
CMC-Bench: Towards a New Paradigm of Visual Signal Compression
by: Li, Chunyi, et al.
Published: (2024)
Consolidating Diffusion-Generated Video Detection with Unified Multimodal Forgery Learning
by: Liu, Xiaohong, et al.
Published: (2025)
Samba+: General and Accurate Salient Object Detection via A More Unified Mamba-based Framework
by: Zhao, Wenzhuo, et al.
Published: (2026)
Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression
by: Tian, Yuan, et al.
Published: (2024)
FUMO: Prior-Modulated Diffusion for Single Image Reflection Removal
by: Xu, Telang, et al.
Published: (2026)
A$^2$-Edit: Precise Reference-Guided Image Editing of Arbitrary Objects and Ambiguous Masks
by: Zheng, Huayu, et al.
Published: (2026)
IllusionBench+: A Large-scale and Comprehensive Benchmark for Visual Illusion Understanding in Vision-Language Models
by: Zhang, Yiming, et al.
Published: (2025)
Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes
by: Zhang, Kaiwei, et al.
Published: (2025)
Low-Light Image Enhancement via Generative Perceptual Priors
by: Zhou, Han, et al.
Published: (2024)
DehazeDCT: Towards Effective Non-Homogeneous Dehazing via Deformable Convolutional Transformer
by: Dong, Wei, et al.
Published: (2024)
Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation
by: Wu, Guangyang, et al.
Published: (2024)
Omni$^2$: Unifying Omnidirectional Image Generation and Editing in an Omni Model
by: Yang, Liu, et al.
Published: (2025)
MoGen: A Unified Collaborative Framework for Controllable Multi-Object Image Generation
by: Li, Yanfeng, et al.
Published: (2026)
GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval
by: Zhou, Han, et al.
Published: (2024)
ShadowRefiner: Towards Mask-free Shadow Removal via Fast Fourier Transformer
by: Dong, Wei, et al.
Published: (2024)
VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
by: Xu, Runsen, et al.
Published: (2024)
When No-Reference Image Quality Models Meet MAP Estimation in Diffusion Latents
by: Zhang, Weixia, et al.
Published: (2024)
How is Visual Attention Influenced by Text Guidance? Database and Model
by: Sun, Yinan, et al.
Published: (2024)
Enhancing Test Time Adaptation with Few-shot Guidance
by: Luo, Siqi, et al.
Published: (2024)
Towards Open-ended Visual Quality Comparison
by: Wu, Haoning, et al.
Published: (2024)
Zero-Reference Joint Low-Light Enhancement and Deblurring via Visual Autoregressive Modeling with VLM-Derived Modulation
by: Dong, Wei, et al.
Published: (2025)
UniProcessor: A Text-induced Unified Low-level Image Processor
by: Duan, Huiyu, et al.
Published: (2024)
Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields
by: Li, Yifei, et al.
Published: (2024)
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
by: Song, Xiufeng, et al.
Published: (2024)
CogVLM: Visual Expert for Pretrained Language Models
by: Wang, Weihan, et al.
Published: (2023)
A2BFR: Attribute-Aware Blind Face Restoration
by: Zhu, Chenxin, et al.
Published: (2026)
How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model
by: Zhu, Yuxin, et al.
Published: (2024)
TokenFLEX: Unified VLM Training for Flexible Visual Tokens Inference
by: Hu, Junshan, et al.
Published: (2025)
MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm
by: Li, Zhang, et al.
Published: (2025)
Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing
by: Wang, Ruiyi, et al.
Published: (2025)
HazeCLIP: Towards Language Guided Real-World Image Dehazing
by: Wang, Ruiyi, et al.
Published: (2024)
Multi-Dimensional Quality Assessment for Text-to-3D Assets: Dataset and Model
by: Fu, Kang, et al.
Published: (2025)
AttentionLut: Attention Fusion-based Canonical Polyadic LUT for Real-time Image Enhancement
by: Fu, Kang, et al.
Published: (2024)
TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning
by: Luo, Siqi, et al.
Published: (2025)
Preference-Guided Debiasing for No-Reference Enhancement Image Quality Assessment
by: Gao, Shiqi, et al.
Published: (2026)
LayerT2V: A Unified Multi-Layer Video Generation Framework
by: Li, Guangzhao, et al.
Published: (2025)
PVRF: All-in-one Adverse Weather Removal via Prior-modulated and Velocity-constrained Rectified Flow
by: Dong, Wei, et al.
Published: (2026)
Light-VQA+: A Video Quality Assessment Model for Exposure Correction with Vision-Language Guidance
by: Zhou, Xunchu, et al.
Published: (2024)
RelationVLM: Making Large Vision-Language Models Understand Visual Relations
by: Huang, Zhipeng, et al.
Published: (2024)