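Staff View: Audit, Alignment, and Optimization of LM-Powered Subroutines with Application to Public Comment Processing :: Library Catalog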

Bibliographic Details
Main Authors:	Raab, Reilly; Parker, Mike; Nally, Dan; Montgomery, Sadie; Bernat, Anastasia; Munikoti, Sai; Horawalavithana, Sameera
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2507.08109

_version_	1866912475821113344
author	Raab, Reilly; Parker, Mike; Nally, Dan; Montgomery, Sadie; Bernat, Anastasia; Munikoti, Sai; Horawalavithana, Sameera
author_facet	Raab, Reilly; Parker, Mike; Nally, Dan; Montgomery, Sadie; Bernat, Anastasia; Munikoti, Sai; Horawalavithana, Sameera
contents	The advent of language models (LMs) has the potential to dramatically accelerate tasks that may be cast to text-processing; however, real-world adoption is hindered by concerns regarding safety, explainability, and bias. How can we responsibly leverage LMs in a transparent, auditable manner -- minimizing risk and allowing human experts to focus on informed decision-making rather than data-processing or prompt engineering? In this work, we propose a framework for declaring statically typed, LM-powered subroutines (i.e., callable, function-like procedures) for use within conventional asynchronous code -- such that sparse feedback from human experts is used to improve the performance of each subroutine online (i.e., during use). In our implementation, all LM-produced artifacts (i.e., prompts, inputs, outputs, and data-dependencies) are recorded and exposed to audit on demand. We package this framework as a library to support its adoption and continued development. While this framework may be applicable across several real-world decision workflows (e.g., in healthcare and legal fields), we evaluate it in the context of public comment processing as mandated by the National Environmental Policy Act of 1969 (NEPA). Specifically, we use this framework to develop "CommentNEPA," an application that compiles, organizes, and summarizes a corpus of public commentary submitted in response to a project requiring environmental review. We quantitatively evaluate the application by comparing its outputs (when operating without human feedback) to historical "ground-truth" data as labelled by human annotators during the preparation of official environmental impact statements.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_08109
institution	arXiv
publishDate	2025
record_format	arxiv
title	Audit, Alignment, and Optimization of LM-Powered Subroutines with Application to Public Comment Processing
topic	Computation and Language
url	https://arxiv.org/abs/2507.08109
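
Illustrative note: the abstract describes statically typed, LM-powered subroutines that run inside ordinary asynchronous code and record their prompts, inputs, and outputs for audit. The sketch below shows one way that general idea might look in Python. It is not the authors' library or API: `lm_subroutine`, `AuditLog`, `AuditRecord`, and `stub_lm` are hypothetical names invented for this example, the language model is replaced by a keyword stub so the code runs offline, and the online human-feedback loop mentioned in the abstract is omitted.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, List


@dataclass
class AuditRecord:
    """One recorded LM call: subroutine name, prompt, input, and output."""
    subroutine: str
    prompt: str
    input: Any
    output: Any


@dataclass
class AuditLog:
    """Hypothetical in-memory audit trail; a real system would persist this."""
    records: List[AuditRecord] = field(default_factory=list)


async def stub_lm(prompt: str) -> str:
    """Stand-in for a real LM call (assumption): labels by keyword only."""
    return "wildlife" if "elk" in prompt.lower() else "other"


def lm_subroutine(instruction: str, output_type: type, log: AuditLog,
                  lm: Callable[[str], Awaitable[str]] = stub_lm):
    """Hypothetical decorator: turn a typed async stub into an audited LM call."""
    def decorate(fn):
        async def wrapper(value):
            prompt = f"{instruction}\n\nInput: {value}"
            raw = await lm(prompt)
            output = output_type(raw)  # coerce to the declared return type
            log.records.append(AuditRecord(fn.__name__, prompt, value, output))
            return output
        return wrapper
    return decorate


log = AuditLog()


@lm_subroutine("Assign one topic label to the public comment.", str, log)
async def label_comment(comment: str) -> str:
    ...  # the declared body is a placeholder; the wrapper calls the LM instead


async def main() -> List[str]:
    comments = ["Elk migration will be disrupted.", "Please extend the deadline."]
    # Subroutines compose with ordinary asyncio primitives.
    return list(await asyncio.gather(*(label_comment(c) for c in comments)))


labels = asyncio.run(main())
```

After the run, `log.records` holds one entry per call, so a reviewer could inspect the exact prompt and output behind each label.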

Similar Items