:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Jiayan; Wu, Zhuoyu; Fang, Wenqi
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.15561

Similar Items

RT-Focuser: A Real-Time Lightweight Model for Edge-side Image Deblurring
by: Wu, Zhuoyu, et al.
Published: (2025)

EndoCaver: Handling Fog, Blur and Glare in Endoscopic Images via Joint Deblurring-Segmentation
by: Wu, Zhuoyu, et al.
Published: (2026)

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
by: Zhang, Shilong, et al.
Published: (2023)

PAM-UNet: Shifting Attention on Region of Interest in Medical Images
by: Das, Abhijit, et al.
Published: (2024)

MAM-CLIP: Vision-Language Pretraining on Mammography Atlases for BI-RADS Classification
by: Gulluk, Halil Ibrahim, et al.
Published: (2026)

RegionGPT: Towards Region Understanding Vision Language Model
by: Guo, Qiushan, et al.
Published: (2024)

EVLM: An Efficient Vision-Language Model for Visual Understanding
by: Chen, Kaibing, et al.
Published: (2024)

Privacy-Preserving Automated Rosacea Detection Based on Medically Inspired Region of Interest Selection
by: Yang, Chengyu, et al.
Published: (2025)

PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
by: Liang, Wenqi, et al.
Published: (2025)

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding
by: Jiang, Songtao, et al.
Published: (2025)

Semantically Grounded QFormer for Efficient Vision Language Understanding
by: Choraria, Moulik, et al.
Published: (2023)

SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning
by: Chen, Zewen, et al.
Published: (2024)

Toward Interactive Regional Understanding in Vision-Large Language Models
by: Lee, Jungbeom, et al.
Published: (2024)

Region of Interest based Medical Image Compression
by: Srivastava, Utkarsh Prakash, et al.
Published: (2025)

FlexAttention for Efficient High-Resolution Vision-Language Models
by: Li, Junyan, et al.
Published: (2024)

MedROI: Codec-Agnostic Region of Interest-Centric Compression for Medical Images
by: Kim, Jiwon, et al.
Published: (2026)

Remodeling Semantic Relationships in Vision-Language Fine-Tuning
by: Wu, Xiangyang, et al.
Published: (2025)

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
by: Meng, Fanqing, et al.
Published: (2024)

NEARL-CLIP: Interacted Query Adaptation with Orthogonal Regularization for Medical Vision-Language Understanding
by: Peng, Zelin, et al.
Published: (2025)

Understanding Degradation with Vision Language Model
by: Lan, Guanzhou, et al.
Published: (2026)

RegionMed-CLIP: A Region-Aware Multimodal Contrastive Learning Pre-trained Model for Medical Image Understanding
by: Fang, Tianchen, et al.
Published: (2025)

EAGLE: An Efficient Global Attention Lesion Segmentation Model for Hepatic Echinococcosis
by: Chen, Jiayan, et al.
Published: (2025)

Detecting and Evaluating Medical Hallucinations in Large Vision Language Models
by: Chen, Jiawei, et al.
Published: (2024)

Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning
by: Chang, Aofei, et al.
Published: (2025)

Attention Guided Alignment in Efficient Vision-Language Models
by: Mahajan, Shweta, et al.
Published: (2025)

Event-Priori-Based Vision-Language Model for Efficient Visual Understanding
by: Qin, Haotong, et al.
Published: (2025)

Particle-Based Shape Modeling for Arbitrary Regions-of-Interest
by: Xu, Hong, et al.
Published: (2023)

Towards Vision-Language-Garment Models for Web Knowledge Garment Understanding and Generation
by: Ackermann, Jan, et al.
Published: (2025)

HiPrune: Hierarchical Attention for Efficient Token Pruning in Vision-Language Models
by: Liu, Jizhihui, et al.
Published: (2025)

Cross-Modal Attention Guided Unlearning in Vision-Language Models
by: Bhaila, Karuna, et al.
Published: (2025)

VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge
by: Nath, Vishwesh, et al.
Published: (2024)

TCSAFormer: Efficient Vision Transformer with Token Compression and Sparse Attention for Medical Image Segmentation
by: Xia, Zunhui, et al.
Published: (2025)

Optimizing Vision-Language Consistency via Cross-Layer Regional Attention Alignment
by: Wang, Yifan, et al.
Published: (2025)

HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models
by: Jiang, Songtao, et al.
Published: (2025)

TRIP: Trainable Region-of-Interest Prediction for Hardware-Efficient Neuromorphic Processing on Event-based Vision
by: Arjmand, Cina, et al.
Published: (2024)

ConFoThinking: Consolidated Focused Attention Driven Thinking for Visual Question Answering
by: Wu, Zhaodong, et al.
Published: (2026)

Region Attention Transformer for Medical Image Restoration
by: Yang, Zhiwen, et al.
Published: (2024)

ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification
by: He, Yefei, et al.
Published: (2024)

MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
by: Wu, Biao, et al.
Published: (2024)

DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy
by: Wu, Zhuoyu, et al.
Published: (2026)