| Main Authors: | Yang, Jiayan, Wu, Zhuoyu, Fang, Wenqi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.15561 |
Similar Items
RT-Focuser: A Real-Time Lightweight Model for Edge-side Image Deblurring
by: Wu, Zhuoyu, et al.
Published: (2025)
EndoCaver: Handling Fog, Blur and Glare in Endoscopic Images via Joint Deblurring-Segmentation
by: Wu, Zhuoyu, et al.
Published: (2026)
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
by: Zhang, Shilong, et al.
Published: (2023)
PAM-UNet: Shifting Attention on Region of Interest in Medical Images
by: Das, Abhijit, et al.
Published: (2024)
MAM-CLIP: Vision-Language Pretraining on Mammography Atlases for BI-RADS Classification
by: Gulluk, Halil Ibrahim, et al.
Published: (2026)
RegionGPT: Towards Region Understanding Vision Language Model
by: Guo, Qiushan, et al.
Published: (2024)
EVLM: An Efficient Vision-Language Model for Visual Understanding
by: Chen, Kaibing, et al.
Published: (2024)
Privacy-Preserving Automated Rosacea Detection Based on Medically Inspired Region of Interest Selection
by: Yang, Chengyu, et al.
Published: (2025)
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
by: Liang, Wenqi, et al.
Published: (2025)
Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding
by: Jiang, Songtao, et al.
Published: (2025)
Semantically Grounded QFormer for Efficient Vision Language Understanding
by: Choraria, Moulik, et al.
Published: (2023)
SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning
by: Chen, Zewen, et al.
Published: (2024)
Toward Interactive Regional Understanding in Vision-Large Language Models
by: Lee, Jungbeom, et al.
Published: (2024)
Region of Interest based Medical Image Compression
by: Srivastava, Utkarsh Prakash, et al.
Published: (2025)
FlexAttention for Efficient High-Resolution Vision-Language Models
by: Li, Junyan, et al.
Published: (2024)
MedROI: Codec-Agnostic Region of Interest-Centric Compression for Medical Images
by: Kim, Jiwon, et al.
Published: (2026)
Remodeling Semantic Relationships in Vision-Language Fine-Tuning
by: Wu, Xiangyang, et al.
Published: (2025)
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
by: Meng, Fanqing, et al.
Published: (2024)
NEARL-CLIP: Interacted Query Adaptation with Orthogonal Regularization for Medical Vision-Language Understanding
by: Peng, Zelin, et al.
Published: (2025)
Understanding Degradation with Vision Language Model
by: Lan, Guanzhou, et al.
Published: (2026)
RegionMed-CLIP: A Region-Aware Multimodal Contrastive Learning Pre-trained Model for Medical Image Understanding
by: Fang, Tianchen, et al.
Published: (2025)
EAGLE: An Efficient Global Attention Lesion Segmentation Model for Hepatic Echinococcosis
by: Chen, Jiayan, et al.
Published: (2025)
Detecting and Evaluating Medical Hallucinations in Large Vision Language Models
by: Chen, Jiawei, et al.
Published: (2024)
Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning
by: Chang, Aofei, et al.
Published: (2025)
Attention Guided Alignment in Efficient Vision-Language Models
by: Mahajan, Shweta, et al.
Published: (2025)
Event-Priori-Based Vision-Language Model for Efficient Visual Understanding
by: Qin, Haotong, et al.
Published: (2025)
Particle-Based Shape Modeling for Arbitrary Regions-of-Interest
by: Xu, Hong, et al.
Published: (2023)
Towards Vision-Language-Garment Models for Web Knowledge Garment Understanding and Generation
by: Ackermann, Jan, et al.
Published: (2025)
HiPrune: Hierarchical Attention for Efficient Token Pruning in Vision-Language Models
by: Liu, Jizhihui, et al.
Published: (2025)
Cross-Modal Attention Guided Unlearning in Vision-Language Models
by: Bhaila, Karuna, et al.
Published: (2025)
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge
by: Nath, Vishwesh, et al.
Published: (2024)
TCSAFormer: Efficient Vision Transformer with Token Compression and Sparse Attention for Medical Image Segmentation
by: Xia, Zunhui, et al.
Published: (2025)
Optimizing Vision-Language Consistency via Cross-Layer Regional Attention Alignment
by: Wang, Yifan, et al.
Published: (2025)
HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models
by: Jiang, Songtao, et al.
Published: (2025)
TRIP: Trainable Region-of-Interest Prediction for Hardware-Efficient Neuromorphic Processing on Event-based Vision
by: Arjmand, Cina, et al.
Published: (2024)
ConFoThinking: Consolidated Focused Attention Driven Thinking for Visual Question Answering
by: Wu, Zhaodong, et al.
Published: (2026)
Region Attention Transformer for Medical Image Restoration
by: Yang, Zhiwen, et al.
Published: (2024)
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification
by: He, Yefei, et al.
Published: (2024)
MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
by: Wu, Biao, et al.
Published: (2024)
DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy
by: Wu, Zhuoyu, et al.
Published: (2026)