Library Catalog

Bibliographic Details
Main Authors:	Walsh, Cole; Ivan, Rodica
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence; Computers and Society
Online Access:	https://arxiv.org/abs/2603.25674

Similar Items

Using LLMs to identify features of personal and professional skills in an open-response situational judgment test
by: Walsh, Cole, et al.
Published: (2025)

"Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most
by: Zhou, Kaitlyn, et al.
Published: (2026)

From Feature-Based Models to Generative AI: Validity Evidence for Constructed Response Scoring
by: Casabianca, Jodi M., et al.
Published: (2026)

On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text?
by: Geng, Mingmeng, et al.
Published: (2025)

Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models
by: Shin, Jisu, et al.
Published: (2024)

Augmenting Rating-Scale Measures with Text-Derived Items Using the Information-Determined Scoring (IDS) Framework
by: Watson, Joe, et al.
Published: (2025)

Content vs. Form: What Drives the Writing Score Gap Across Socioeconomic Backgrounds? A Generated Panel Approach
by: Kunievsky, Nadav, et al.
Published: (2026)

AnthroScore: A Computational Linguistic Measure of Anthropomorphism
by: Cheng, Myra, et al.
Published: (2024)

Whose Journey Matters? Investigating Identity Biases in Large Language Models (LLMs) for Travel Planning Assistance
by: Ren, Ruiping, et al.
Published: (2024)

Place Matters: Comparing LLM Hallucination Rates for Place-Based Legal Queries
by: Curran, Damian, et al.
Published: (2025)

Validity Arguments For Constructed Response Scoring Using Generative Artificial Intelligence Applications
by: Casabianca, Jodi M., et al.
Published: (2025)

In-Situ Behavioral Evaluation for LLM Fairness, Not Standardized-Test Scores
by: Tang, Zeyu, et al.
Published: (2026)

What can large language models do for sustainable food?
by: Thomas, Anna T., et al.
Published: (2025)

What Is The Political Content in LLMs' Pre- and Post-Training Data?
by: Ceron, Tanise, et al.
Published: (2025)

Empathy and the Right to Be an Exception: What LLMs Can and Cannot Do
by: Kidder, William, et al.
Published: (2024)

A Survey on Responsible Generative AI: What to Generate and What Not
by: Gu, Jindong
Published: (2024)

The Good, the Bad and the Constructive: Automatically Measuring Peer Review's Utility for Authors
by: Sadallah, Abdelrahman, et al.
Published: (2025)

Beyond Brainstorming: What Drives High-Quality Scientific Ideas? Lessons from Multi-Agent Collaboration
by: Chen, Nuo, et al.
Published: (2025)

"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets
by: Paruchuri, Akshay, et al.
Published: (2025)

What About the Scene with the Hitler Reference? HAUNT: A Framework to Probe LLMs' Self-consistency Via Adversarial Nudge
by: Dutta, Arka, et al.
Published: (2025)

Measuring Political Preferences in AI Systems: An Integrative Approach
by: Rozado, David
Published: (2025)

Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation
by: Zugecova, Aneta, et al.
Published: (2024)

What's in a Name? Auditing Large Language Models for Race and Gender Bias
by: Salinas, Alejandro, et al.
Published: (2024)

Why Slop Matters
by: Kommers, Cody, et al.
Published: (2025)

Empathy Is Not What Changed: Clinical Assessment of Psychological Safety Across GPT Model Generations
by: Keeman, Michael, et al.
Published: (2026)

DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models
by: Fu, Jiachen, et al.
Published: (2025)

CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety
by: An, Heajun, et al.
Published: (2026)

Using GPT-4 to Augment Unbalanced Data for Automatic Scoring
by: Fang, Luyang, et al.
Published: (2023)

Artificial Intelligence Bias on English Language Learners in Automatic Scoring
by: Guo, Shuchen, et al.
Published: (2025)

SoK: Measuring What Matters for Closed-Loop Security Agents
by: Khurana, Mudita, et al.
Published: (2025)

Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking
by: Chung, Yi-Ling, et al.
Published: (2025)

Large Language Model-Based Knowledge Graph System Construction for Sustainable Development Goals: An AI-Based Speculative Design Perspective
by: Lin, Yi-De, et al.
Published: (2025)

From Prompts to Constructs: A Dual-Validity Framework for LLM Research in Psychology
by: Lin, Zhicheng
Published: (2025)

The Rise of Artificial Intelligence in Educational Measurement: Opportunities and Ethical Challenges
by: Bulut, Okan, et al.
Published: (2024)

Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning
by: Sondej, Filip, et al.
Published: (2025)

Statutory Construction and Interpretation for Artificial Intelligence
by: He, Luxi, et al.
Published: (2025)

Mechanical Enforcement for LLM Governance: Evidence of Governance-Task Decoupling in Financial Decision Systems
by: Rodríguez, José Manuel de la Chica, et al.
Published: (2026)

What Is Actually Being Annotated? Inter-Prompt Reliability as a Measurement Problem in LLM-Based Social Science Labeling
by: Liu, Jingyuan
Published: (2026)

Making Retrieval-Augmented Language Models Robust to Irrelevant Context
by: Yoran, Ori, et al.
Published: (2023)

Pay Attention to What Matters
by: Silva, Pedro Luiz, et al.
Published: (2024)