| Main Authors: | Guo, Jiaxin; Chen, C. L. Philip; Li, Shuzhen; Zhang, Tong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computation and Language; Artificial Intelligence; Information Retrieval |
| Online Access: | https://arxiv.org/abs/2502.00305 |

| Field | Value |
|---|---|
| author | Guo, Jiaxin; Chen, C. L. Philip; Li, Shuzhen; Zhang, Tong |
| contents | Cold-start active learning (CSAL) selects valuable instances from an unlabeled dataset for manual annotation, providing high-quality training data at low annotation cost for label-scarce text classification. However, existing CSAL methods overlook weak classes and hard representative examples, which biases learning. To address these issues, this paper proposes DEUCE, a novel dual-diversity enhancing and uncertainty-aware framework for CSAL. DEUCE first uses a pretrained language model (PLM) to efficiently extract textual representations, class predictions, and predictive uncertainty. It then constructs a Dual-Neighbor Graph (DNG) that combines textual diversity and class diversity, ensuring a balanced data distribution. Finally, it propagates uncertainty information via density-based clustering to select hard representative instances. By combining dual diversity with informativeness, DEUCE selects class-balanced, hard representative data. Experiments on six NLP datasets demonstrate the superiority and efficiency of DEUCE. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2502_00305 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | DEUCE: Dual-diversity Enhancement and Uncertainty-awareness for Cold-start Active Learning |
| topic | Computation and Language; Artificial Intelligence; Information Retrieval; ACM classes I.2.6, I.2.7, I.5.1, H.3.1, H.3.3 |
| url | https://arxiv.org/abs/2502.00305 |
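The abstract outlines a pipeline (PLM features, a Dual-Neighbor Graph, uncertainty propagation, then selection) without implementation details. The following is a minimal NumPy sketch of that general idea under stated assumptions, not the authors' DEUCE implementation. Assumptions: embeddings and class probabilities are precomputed (random stand-ins replace a real PLM here); predictive entropy serves as the uncertainty measure; the DNG is approximated as the union of k-nearest-neighbour sets in embedding space and in class-prediction space; and the paper's density-based clustering step is replaced by a simpler greedy selection that weights propagated uncertainty by graph in-degree and skips neighbours of already selected points. All function names (`predictive_entropy`, `knn_indices`, `dual_neighbor_graph`, `select_cold_start`) are hypothetical.

```python
import numpy as np


def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Per-row entropy of an (n, C) class-probability matrix; higher means more uncertain."""
    eps = 1e-12  # avoids log(0) for one-hot rows
    return -np.sum(probs * np.log(probs + eps), axis=1)


def knn_indices(feats: np.ndarray, k: int) -> np.ndarray:
    """k nearest neighbours of every row under cosine similarity, excluding the row itself."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a point is never its own neighbour
    return np.argsort(-sim, axis=1)[:, :k]


def dual_neighbor_graph(emb: np.ndarray, probs: np.ndarray, k: int) -> list[set[int]]:
    """Assumed DNG: union of k-NN in text-embedding space and in class-prediction space."""
    text_nn = knn_indices(emb, k)     # neighbours by textual similarity
    class_nn = knn_indices(probs, k)  # neighbours by predicted-class similarity
    return [{int(j) for j in text_nn[i]} | {int(j) for j in class_nn[i]}
            for i in range(len(emb))]


def select_cold_start(emb: np.ndarray, probs: np.ndarray, budget: int, k: int = 5) -> list[int]:
    """Hypothetical selector: favour uncertain, well-connected points spread across the graph."""
    n = len(emb)
    assert 0 < k < n and 0 < budget <= n
    graph = dual_neighbor_graph(emb, probs, k)
    unc = predictive_entropy(probs)
    # Propagate uncertainty: each node takes the mean entropy of itself and its DNG neighbours.
    prop = np.array([np.mean([unc[i], *(unc[j] for j in graph[i])]) for i in range(n)])
    # Representativeness proxy: how many nodes list this one as a neighbour (in-degree).
    indeg = np.zeros(n)
    for nbrs in graph:
        for j in nbrs:
            indeg[j] += 1
    score = prop * indeg / indeg.max()
    order = [int(i) for i in np.argsort(-score)]
    chosen: list[int] = []
    covered: set[int] = set()
    # Greedy pass: skip any node already inside a chosen node's neighbourhood (diversity).
    for i in order:
        if len(chosen) == budget:
            break
        if i not in covered:
            chosen.append(i)
            covered |= graph[i] | {i}
    # Top-up pass in case neighbour suppression left the budget unfilled.
    for i in order:
        if len(chosen) == budget:
            break
        if i not in chosen:
            chosen.append(i)
    return chosen


rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 32))    # stand-in for PLM text embeddings
logits = rng.normal(size=(200, 4))  # stand-in for PLM logits over 4 hypothetical classes
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
picked = select_cold_start(emb, probs, budget=10, k=5)
print(picked)
```

Because each pick removes its graph neighbours from later consideration, the selection tends to spread across both embedding regions and predicted classes. Substituting real PLM embeddings, a different uncertainty score, or an actual density-based clustering algorithm would bring the sketch closer to what the abstract describes.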