Saved in:
| Main Author: | |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.15211 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866911392342212608 |
|---|---|
| author | Long, Yanan |
| author_facet | Long, Yanan |
| contents | Evaluations of generative AI models often collapse nuanced behaviour into a single number computed for a single decoding configuration. Such point estimates obscure tail risks, demographic disparities, and the existence of multiple near-optimal operating points. We propose a unified framework that embraces multiplicity by modelling the distribution of harmful behaviour across the entire space of decoding knobs and prompts, quantifying risk through tail-focused metrics, and integrating stakeholder preferences. Our technical contributions are threefold: (i) we formalise decoding Rashomon sets, regions of knob space whose risk is near-optimal under given criteria and measure their size and disagreement; (ii) we develop a dependent Dirichlet process (DDP) mixture with stakeholder-conditioned stick-breaking weights to learn multi-modal harm surfaces; and (iii) we introduce an active sampling pipeline that uses Bayesian deep learning surrogates to explore knob space efficiently. Our approach bridges multiplicity theory, Bayesian nonparametrics, and stakeholder-aligned sensitivity analysis, paving the way for trustworthy deployment of generative models. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2504_15211 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | Embracing Ambiguity: Bayesian Nonparametrics and Stakeholder Participation for Ambiguity-Aware Safety Evaluation Long, Yanan Artificial Intelligence Applications Evaluations of generative AI models often collapse nuanced behaviour into a single number computed for a single decoding configuration. Such point estimates obscure tail risks, demographic disparities, and the existence of multiple near-optimal operating points. We propose a unified framework that embraces multiplicity by modelling the distribution of harmful behaviour across the entire space of decoding knobs and prompts, quantifying risk through tail-focused metrics, and integrating stakeholder preferences. Our technical contributions are threefold: (i) we formalise decoding Rashomon sets, regions of knob space whose risk is near-optimal under given criteria and measure their size and disagreement; (ii) we develop a dependent Dirichlet process (DDP) mixture with stakeholder-conditioned stick-breaking weights to learn multi-modal harm surfaces; and (iii) we introduce an active sampling pipeline that uses Bayesian deep learning surrogates to explore knob space efficiently. Our approach bridges multiplicity theory, Bayesian nonparametrics, and stakeholder-aligned sensitivity analysis, paving the way for trustworthy deployment of generative models. |
| title | Embracing Ambiguity: Bayesian Nonparametrics and Stakeholder Participation for Ambiguity-Aware Safety Evaluation |
| topic | Artificial Intelligence Applications |
| url | https://arxiv.org/abs/2504.15211 |