Bibliographic Details
Main Authors: Inoshita, Keito, Ueno, Takato
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2605.24773
Table of Contents:
  • Annotator disagreement in emotion classification reflects ambiguity intrinsic to emotion concepts and is essential for predictor-quality assessment in subjective NLP. Yet no prior work integrates soft-label learning with Bayesian deep learning to evaluate uncertainty along axes including annotator-distribution fidelity. We train a linear head on a frozen RoBERTa via cyclical stochastic gradient Markov chain Monte Carlo (cSG-MCMC), targeting the empirical annotator distribution with a soft-label objective under a five-axis evaluation. On the 28-emotion GoEmotions benchmark, the proposed method outperforms Monte Carlo Dropout and Deep Ensemble simultaneously on three axes -- Jensen-Shannon divergence (JSD) to the annotator distribution, Spearman correlation between per-emotion aleatoric uncertainty and disagreement, and selective-prediction Area Under the Risk-Coverage Curve (AURC) and Area Under the ROC Curve (AUROC) -- showing independent axes are jointly attainable from one posterior. Post-hoc temperature scaling exhibits a bidirectional effect, establishing hard-label calibration and annotator-JSD as independent dimensions and motivating joint reporting as an honest protocol.
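
As an illustration of two quantities the abstract names, the soft-label objective and the Jensen-Shannon divergence (JSD) to the annotator distribution, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes the empirical annotator distribution is a normalized vector of per-emotion vote counts, and it uses base-2 logarithms for JSD so the value lies in [0, 1]; the record does not state either choice. The function names and the 4-class toy data are hypothetical.

```python
import numpy as np


def annotator_distribution(votes):
    """Normalize per-emotion vote counts into a probability vector (assumed definition)."""
    votes = np.asarray(votes, dtype=float)
    return votes / votes.sum()


def soft_label_cross_entropy(logits, target):
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    logits = np.asarray(logits, dtype=float)
    target = np.asarray(target, dtype=float)
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-(target * log_probs).sum())


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (base-2 logs), bounded in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))


# Toy example: 4 emotions, 3 annotators, two of whom chose emotion 0.
target = annotator_distribution([2, 1, 0, 0])
logits = np.array([1.5, 0.5, -1.0, -1.0])  # hypothetical model output
probs = np.exp(logits - logits.max())
probs /= probs.sum()

loss = soft_label_cross_entropy(logits, target)
jsd = js_divergence(target, probs)
```

Under these assumptions, a lower `jsd` means the predictive distribution sits closer to the spread of annotator votes, which is a different property from how well the top prediction is calibrated against a single hard label.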