| Main Authors: | Walsh, Cole; Ivan, Rodica |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.25674 |
Similar Items
Using LLMs to identify features of personal and professional skills in an open-response situational judgment test
by: Walsh, Cole, et al.
Published: (2025)
"Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most
by: Zhou, Kaitlyn, et al.
Published: (2026)
From Feature-Based Models to Generative AI: Validity Evidence for Constructed Response Scoring
by: Casabianca, Jodi M., et al.
Published: (2026)
On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text?
by: Geng, Mingmeng, et al.
Published: (2025)
Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models
by: Shin, Jisu, et al.
Published: (2024)
Augmenting Rating-Scale Measures with Text-Derived Items Using the Information-Determined Scoring (IDS) Framework
by: Watson, Joe, et al.
Published: (2025)
Content vs. Form: What Drives the Writing Score Gap Across Socioeconomic Backgrounds? A Generated Panel Approach
by: Kunievsky, Nadav, et al.
Published: (2026)
AnthroScore: A Computational Linguistic Measure of Anthropomorphism
by: Cheng, Myra, et al.
Published: (2024)
Whose Journey Matters? Investigating Identity Biases in Large Language Models (LLMs) for Travel Planning Assistance
by: Ren, Ruiping, et al.
Published: (2024)
Place Matters: Comparing LLM Hallucination Rates for Place-Based Legal Queries
by: Curran, Damian, et al.
Published: (2025)
Validity Arguments For Constructed Response Scoring Using Generative Artificial Intelligence Applications
by: Casabianca, Jodi M., et al.
Published: (2025)
In-Situ Behavioral Evaluation for LLM Fairness, Not Standardized-Test Scores
by: Tang, Zeyu, et al.
Published: (2026)
What can large language models do for sustainable food?
by: Thomas, Anna T., et al.
Published: (2025)
What Is The Political Content in LLMs' Pre- and Post-Training Data?
by: Ceron, Tanise, et al.
Published: (2025)
Empathy and the Right to Be an Exception: What LLMs Can and Cannot Do
by: Kidder, William, et al.
Published: (2024)
A Survey on Responsible Generative AI: What to Generate and What Not
by: Gu, Jindong
Published: (2024)
The Good, the Bad and the Constructive: Automatically Measuring Peer Review's Utility for Authors
by: Sadallah, Abdelrahman, et al.
Published: (2025)
Beyond Brainstorming: What Drives High-Quality Scientific Ideas? Lessons from Multi-Agent Collaboration
by: Chen, Nuo, et al.
Published: (2025)
"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets
by: Paruchuri, Akshay, et al.
Published: (2025)
What About the Scene with the Hitler Reference? HAUNT: A Framework to Probe LLMs' Self-consistency Via Adversarial Nudge
by: Dutta, Arka, et al.
Published: (2025)
Measuring Political Preferences in AI Systems: An Integrative Approach
by: Rozado, David
Published: (2025)
Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation
by: Zugecova, Aneta, et al.
Published: (2024)
What's in a Name? Auditing Large Language Models for Race and Gender Bias
by: Salinas, Alejandro, et al.
Published: (2024)
Why Slop Matters
by: Kommers, Cody, et al.
Published: (2025)
Empathy Is Not What Changed: Clinical Assessment of Psychological Safety Across GPT Model Generations
by: Keeman, Michael, et al.
Published: (2026)
DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models
by: Fu, Jiachen, et al.
Published: (2025)
CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety
by: An, Heajun, et al.
Published: (2026)
Using GPT-4 to Augment Unbalanced Data for Automatic Scoring
by: Fang, Luyang, et al.
Published: (2023)
Artificial Intelligence Bias on English Language Learners in Automatic Scoring
by: Guo, Shuchen, et al.
Published: (2025)
SoK: Measuring What Matters for Closed-Loop Security Agents
by: Khurana, Mudita, et al.
Published: (2025)
Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking
by: Chung, Yi-Ling, et al.
Published: (2025)
Large Language Model-Based Knowledge Graph System Construction for Sustainable Development Goals: An AI-Based Speculative Design Perspective
by: Lin, Yi-De, et al.
Published: (2025)
From Prompts to Constructs: A Dual-Validity Framework for LLM Research in Psychology
by: Lin, Zhicheng
Published: (2025)
The Rise of Artificial Intelligence in Educational Measurement: Opportunities and Ethical Challenges
by: Bulut, Okan, et al.
Published: (2024)
Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning
by: Sondej, Filip, et al.
Published: (2025)
Statutory Construction and Interpretation for Artificial Intelligence
by: He, Luxi, et al.
Published: (2025)
Mechanical Enforcement for LLM Governance: Evidence of Governance-Task Decoupling in Financial Decision Systems
by: Rodríguez, José Manuel de la Chica, et al.
Published: (2026)
What Is Actually Being Annotated? Inter-Prompt Reliability as a Measurement Problem in LLM-Based Social Science Labeling
by: Liu, Jingyuan
Published: (2026)
Making Retrieval-Augmented Language Models Robust to Irrelevant Context
by: Yoran, Ori, et al.
Published: (2023)
Pay Attention to What Matters
by: Silva, Pedro Luiz, et al.
Published: (2024)