| Main authors: | Jungo, Michael; Fischer, Andreas |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online access: | https://arxiv.org/abs/2509.22283 |
Similar items
Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models
by: Scius-Bertrand, Anna, et al.
Published: (2024)
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
by: Yu, Wenwen, et al.
Published: (2025)
Large Language Models Facilitate Vision Reflection in Image Classification
by: An, Guoyuan, et al.
Published: (2025)
Unified Reinforcement and Imitation Learning for Vision-Language Models
by: Lee, Byung-Kwan, et al.
Published: (2025)
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
by: Peng, Wenshuo, et al.
Published: (2024)
Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification
by: Addepalli, Sravanti, et al.
Published: (2023)
DIVA-DAF: A Deep Learning Framework for Historical Document Image Analysis
by: Vögtlin, Lars, et al.
Published: (2022)
Learning Concept-Driven Logical Rules for Interpretable and Generalizable Medical Image Classification
by: Gao, Yibo, et al.
Published: (2025)
Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification
by: Gou, Jiaxiang, et al.
Published: (2024)
Image Classification with Deep Reinforcement Active Learning
by: Jiu, Mingyuan, et al.
Published: (2024)
TTRV: Test-Time Reinforcement Learning for Vision Language Models
by: Singh, Akshit, et al.
Published: (2025)
DocXplain: A Novel Model-Agnostic Explainability Method for Document Image Classification
by: Saifullah, Saifullah, et al.
Published: (2024)
GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification
by: Bakkali, Souhail, et al.
Published: (2023)
Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models
by: Cao, Sihan, et al.
Published: (2026)
Vision-Language Model Based Multi-Expert Fusion for CT Image Classification
by: Bai, Jianfa, et al.
Published: (2026)
Degradation-Aware Image Enhancement via Vision-Language Classification
by: Cai, Jie, et al.
Published: (2025)
Zero-Shot Fine-Grained Image Classification Using Large Vision-Language Models
by: Atabuzzaman, Md., et al.
Published: (2025)
TransMed: Large Language Models Enhance Vision Transformer for Biomedical Image Classification
by: Zheng, Kaipeng, et al.
Published: (2023)
Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning
by: Deng, Huilin, et al.
Published: (2025)
Enhancing Fine-Grained Image Classifications via Cascaded Vision Language Models
by: Wei, Canshi
Published: (2024)
Multi-View Synergistic Learning with Vision-Language Adaption for Low-Resource Biomedical Image Classification
by: Luo, Xiaoliu, et al.
Published: (2026)
OpenPath: Open-Set Active Learning for Pathology Image Classification via Pre-trained Vision-Language Models
by: Zhong, Lanfeng, et al.
Published: (2025)
Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models
by: Li, Ling, et al.
Published: (2025)
Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models
by: Meng, Tian, et al.
Published: (2024)
A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model
by: Hu, Panwen, et al.
Published: (2024)
Reinforcement Learning Friendly Vision-Language Model for Minecraft
by: Jiang, Haobin, et al.
Published: (2023)
Intersectional Fairness in Vision-Language Models for Medical Image Disease Classification
by: Zhang, Yupeng, et al.
Published: (2025)
DocVCE: Diffusion-based Visual Counterfactual Explanations for Document Image Classification
by: Saifullah, Saifullah, et al.
Published: (2025)
MIRL: Mutual Information-Guided Reinforcement Learning for Vision-Language Models
by: Zhang, Yin, et al.
Published: (2026)
Hierarchical Vision Transformer with Prototypes for Interpretable Medical Image Classification
by: Gallée, Luisa, et al.
Published: (2025)
Small Language Model Meets with Reinforced Vision Vocabulary
by: Wei, Haoran, et al.
Published: (2024)
ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models
by: Li, Jiahao, et al.
Published: (2025)
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
by: Lai, Yuxiang, et al.
Published: (2025)
Cropper: Vision-Language Model for Image Cropping through In-Context Learning
by: Lee, Seung Hyun, et al.
Published: (2024)
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification
by: Shi, Jiangbo, et al.
Published: (2025)
PromptEcho: Annotation-Free Reward from Vision-Language Models for Text-to-Image Reinforcement Learning
by: Liu, Jinlong, et al.
Published: (2026)
Fusion of Foundation and Vision Transformer Model Features for Dermatoscopic Image Classification
by: Mahbod, Amirreza, et al.
Published: (2025)
VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Annotation-Free Pathological Image Classification
by: Zhong, Lanfeng, et al.
Published: (2024)
ZeroSlide: Is Zero-Shot Classification Adequate for Lifelong Learning in Whole-Slide Image Analysis in the Era of Pathology Vision-Language Foundation Models?
by: Bui, Doanh C., et al.
Published: (2025)
Accelerating Conditional Prompt Learning via Masked Image Modeling for Vision-Language Models
by: Bui, Phuoc-Nguyen, et al.
Published: (2025)