| Main Authors: | Zhang, Liyun; Lian, Zheng; Liu, Hong; Takebe, Takanori; Nakashima, Yuta |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Multimedia; Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2504.09525 |
| _version_ | 1866916885212168192 |
|---|---|
| author | Zhang, Liyun; Lian, Zheng; Liu, Hong; Takebe, Takanori; Nakashima, Yuta |
| author_facet | Zhang, Liyun; Lian, Zheng; Liu, Hong; Takebe, Takanori; Nakashima, Yuta |
| contents | Multi-annotator learning (MAL) aims to model annotator-specific labeling patterns. However, existing methods face a critical challenge: they skip updating annotator-specific model parameters whenever a label is missing, a common scenario in real-world crowdsourced datasets, where each annotator labels only a small subset of samples. This leads to inefficient data utilization and a risk of overfitting. To address this, we propose a novel similarity-weighted semi-supervised learning framework (SimLabel) that leverages inter-annotator similarities to generate weighted soft labels for missing annotations, so that unannotated samples are used rather than skipped. We further introduce a confidence-based iterative refinement mechanism that combines maximum probability with entropy-based uncertainty to prioritize high-quality predicted pseudo-labels for imputing missing labels, jointly improving similarity estimation and model performance over time. For evaluation, we contribute a new multimodal multi-annotator dataset, AMER2, with high and more variable missing rates, reflecting real-world annotation sparsity and enabling evaluation across different sparsity levels. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2504_09525 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | SimLabel: Similarity-Weighted Iterative Framework for Multi-annotator Learning with Missing Annotations |
| topic | Multimedia; Artificial Intelligence |
| url | https://arxiv.org/abs/2504.09525 |
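
The abstract describes two mechanisms: similarity-weighted soft labels for missing annotations, and a confidence score that combines maximum probability with entropy-based uncertainty. This record gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that similarity is the agreement rate on co-annotated samples and that confidence is the maximum probability multiplied by one minus the normalized entropy. All function names are hypothetical.

```python
import math


def agreement_similarity(labels_a, labels_b):
    """Fraction of co-annotated samples on which two annotators agree.

    None marks a missing label. Returns 0.0 when no sample is shared.
    (Assumed similarity measure; the record does not specify one.)
    """
    shared = [(a, b) for a, b in zip(labels_a, labels_b)
              if a is not None and b is not None]
    if not shared:
        return 0.0
    return sum(a == b for a, b in shared) / len(shared)


def soft_label(target, annotations, sample_idx, num_classes):
    """Similarity-weighted soft label for `target` on one unlabeled sample.

    annotations maps annotator name -> list of class indices (None = missing).
    Each other annotator votes for their label, weighted by similarity to
    `target`. Returns None if no usable vote exists.
    """
    weights = [0.0] * num_classes
    for other, labels in annotations.items():
        if other == target or labels[sample_idx] is None:
            continue
        sim = agreement_similarity(annotations[target], labels)
        weights[labels[sample_idx]] += sim
    total = sum(weights)
    if total == 0:
        return None
    return [w / total for w in weights]


def confidence(probs):
    """Assumed score: max probability times (1 - normalized entropy)."""
    k = len(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    normalized = entropy / math.log(k) if k > 1 else 0.0
    return max(probs) * (1.0 - normalized)


# Toy data: annotator "a" has not labeled sample 2.
annotations = {
    "a": [0, 1, None, 1],
    "b": [0, 1, 1, 0],
    "c": [1, 0, 0, 1],
}
imputed = soft_label("a", annotations, sample_idx=2, num_classes=2)
print(imputed, confidence(imputed))
```

In this toy data, annotator "b" agrees with "a" more often than "c" does, so "b"'s label for the missing sample receives the larger weight. An iterative scheme like the one the abstract outlines would then keep only imputed labels whose confidence is high enough and recompute the similarities. That selection threshold is another assumption left out of the sketch.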