:: Library Catalog

Bibliographic Details
Main Authors:	Li, Shuyue Stella; Mun, Jimin; Brahman, Faeze; Hosseini, Pedram; Thomas, Bryceton G.; Sin, Jessica M.; Ren, Bing; Ilgen, Jonathan S.; Tsvetkov, Yulia; Sap, Maarten
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2502.14860

Similar Items

MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning
by: Li, Shuyue Stella, et al.
Published: (2024)

PrefDisco: Benchmarking Proactive Personalized Reasoning
by: Li, Shuyue Stella, et al.
Published: (2025)

A Benchmark for Long-Form Medical Question Answering
by: Hosseini, Pedram, et al.
Published: (2024)

GoodPoint: Learning Constructive Scientific Paper Feedback from Author Responses
by: Mun, Jimin, et al.
Published: (2026)

Small Reward Models via Backward Inference
by: Wang, Yike, et al.
Published: (2026)

Cold-Start Personalization via Training-Free Priors from Structured World Models
by: Bose, Avinandan, et al.
Published: (2026)

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
by: Baheti, Ashutosh, et al.
Published: (2023)

EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics
by: Li, Shuyue Stella, et al.
Published: (2026)

InfoGatherer: Principled Information Seeking via Evidence Retrieval and Strategic Questioning
by: Taranukhin, Maksym, et al.
Published: (2026)

AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
by: Su, Zhe, et al.
Published: (2024)

Counterspeakers' Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate
by: Mun, Jimin, et al.
Published: (2024)

Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory
by: Mireshghallah, Niloofar, et al.
Published: (2023)

Can LLMs Ask Good Questions?
by: Zhang, Yueheng, et al.
Published: (2025)

Why (not) use AI? Analyzing People's Reasoning and Conditions for AI Acceptability
by: Mun, Jimin, et al.
Published: (2025)

Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences
by: Zheng, Mingqian, et al.
Published: (2025)

Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest
by: Wu, Addison J., et al.
Published: (2026)

Multi-Attribute Constraint Satisfaction via Language Model Rewriting
by: Baheti, Ashutosh, et al.
Published: (2024)

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
by: Jung, Jaehun, et al.
Published: (2024)

Cognitive Foundations for Reasoning and Their Manifestation in LLMs
by: Kargupta, Priyanka, et al.
Published: (2025)

Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
by: Yao, Jihan, et al.
Published: (2024)

Teaching LLMs to Abstain across Languages via Multilingual Feedback
by: Feng, Shangbin, et al.
Published: (2024)

Reasoning Up the Instruction Ladder for Controllable Language Models
by: Zheng, Zishuo, et al.
Published: (2025)

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
by: Jiang, Liwei, et al.
Published: (2024)

Ask Good Questions for Large Language Models
by: Wu, Qi, et al.
Published: (2025)

ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions
by: Park, Chan Young, et al.
Published: (2024)

Creativity Support in the Age of Large Language Models: An Empirical Study Involving Emerging Writers
by: Chakrabarty, Tuhin, et al.
Published: (2023)

Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection
by: Ahuja, Kabir, et al.
Published: (2025)

Deep Reasoning in General Purpose Agents via Structured Meta-Cognition
by: Light, Dean, et al.
Published: (2026)

Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits
by: Mun, Jimin, et al.
Published: (2024)

CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
by: Chiu, Yu Ying, et al.
Published: (2024)

Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
by: Jiang, Liwei, et al.
Published: (2025)

SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat
by: Jiang, Yuru, et al.
Published: (2025)

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
by: Zhou, Xuhui, et al.
Published: (2024)

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data
by: Chen, Tong, et al.
Published: (2025)

Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations
by: Chen, Tong, et al.
Published: (2025)

JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
by: Han, Xiaochuang, et al.
Published: (2024)

When Should AI Read the Room? Public Perceptions of Social Intelligence in AI Agents
by: Mathur, Leena, et al.
Published: (2026)

Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning
by: He, Zhonghao, et al.
Published: (2025)

PersLitEval: Fine-grained Benchmark and Evaluation of LLMs on Persian Literature Questions
by: Niazi, Ruhallah, et al.
Published: (2026)

A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage
by: Xin, Rui, et al.
Published: (2025)