| Main Authors: | Weers, Alexander; Rueckert, Daniel; Menten, Martin J. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Computation and Language; Machine Learning |
| Online Access: | https://arxiv.org/abs/2604.21082 |
| _version_ | 1866910158937915392 |
|---|---|
| author | Weers, Alexander; Rueckert, Daniel; Menten, Martin J. |
| author_facet | Weers, Alexander; Rueckert, Daniel; Menten, Martin J. |
| contents | Training vision-language models (VLMs) for medical report generation is often hindered by the scarcity of high-quality annotated data. This work evaluates the use of a weighted loss function to improve data efficiency. Compared to standard cross-entropy loss, which treats all token prediction errors equally, the reweighted loss shifts the focus to semantically salient tokens with outsized clinical importance. In experiments on ophthalmological report generation, we show that this simple method improves efficiency across multiple data scales, achieving similar report quality with up to ten times less training data. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2604_21082 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Weighting What Matters: Boosting Sample Efficiency in Medical Report Generation via Token Reweighting |
| topic | Computation and Language; Machine Learning |
| url | https://arxiv.org/abs/2604.21082 |
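The abstract contrasts standard cross-entropy, which weights every token's prediction error equally, with a reweighted loss that emphasizes clinically salient tokens. The sketch below illustrates that general idea in plain Python under stated assumptions: per-token weights are normalized by their sum, and a hypothetical rule assigns a fixed weight (5.0 here) to token ids in a hand-picked "salient" set. The helper names `weighted_cross_entropy` and `salience_weights`, the weight value, and the toy vocabulary are illustrative only. The abstract does not specify how the authors select salient tokens or set their weights.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one position's vocabulary scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_cross_entropy(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Weighted mean of per-token negative log-likelihoods.

    With all weights equal, this reduces to standard mean cross-entropy.
    """
    if not (len(logits) == len(targets) == len(weights)):
        raise ValueError("logits, targets and weights must have equal length")
    total, norm = 0.0, 0.0
    for row, y, w in zip(logits, targets, weights):
        nll = -log_softmax(row)[y]  # per-token cross-entropy
        total += w * nll
        norm += w
    return total / norm  # normalize so the loss scale stays comparable


def salience_weights(
    targets: Sequence[int], salient_ids: set[int], salient_weight: float = 5.0
) -> list[float]:
    """HYPOTHETICAL rule: upweight target tokens whose id is in salient_ids."""
    return [salient_weight if y in salient_ids else 1.0 for y in targets]


# Toy example: 3 target positions over a 4-token vocabulary.
logits = [
    [2.0, 0.5, 0.1, 0.0],
    [0.2, 0.1, 3.0, 0.0],
    [0.0, 1.0, 0.0, 0.5],
]
targets = [0, 2, 3]
salient = {3}  # hypothetical: pretend token id 3 encodes a clinical finding

uniform_loss = weighted_cross_entropy(logits, targets, [1.0] * len(targets))
weighted_loss = weighted_cross_entropy(
    logits, targets, salience_weights(targets, salient)
)
```

In this toy case the salient token is also the worst-predicted one, so the reweighted loss exceeds the uniform loss and its gradient would be dominated by that position. Normalizing by the weight sum is one design choice among several; summing without normalization would also change the effective learning rate.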