| Main Authors: | Li, Shuyue Stella, Mun, Jimin, Brahman, Faeze, Hosseini, Pedram, Thomas, Bryceton G., Sin, Jessica M., Ren, Bing, Ilgen, Jonathan S., Tsvetkov, Yulia, Sap, Maarten |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.14860 |
Similar Items
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning
by: Li, Shuyue Stella, et al.
Published: (2024)
PrefDisco: Benchmarking Proactive Personalized Reasoning
by: Li, Shuyue Stella, et al.
Published: (2025)
A Benchmark for Long-Form Medical Question Answering
by: Hosseini, Pedram, et al.
Published: (2024)
GoodPoint: Learning Constructive Scientific Paper Feedback from Author Responses
by: Mun, Jimin, et al.
Published: (2026)
Small Reward Models via Backward Inference
by: Wang, Yike, et al.
Published: (2026)
Cold-Start Personalization via Training-Free Priors from Structured World Models
by: Bose, Avinandan, et al.
Published: (2026)
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
by: Baheti, Ashutosh, et al.
Published: (2023)
EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics
by: Li, Shuyue Stella, et al.
Published: (2026)
InfoGatherer: Principled Information Seeking via Evidence Retrieval and Strategic Questioning
by: Taranukhin, Maksym, et al.
Published: (2026)
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
by: Su, Zhe, et al.
Published: (2024)
Counterspeakers' Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate
by: Mun, Jimin, et al.
Published: (2024)
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory
by: Mireshghallah, Niloofar, et al.
Published: (2023)
Can LLMs Ask Good Questions?
by: Zhang, Yueheng, et al.
Published: (2025)
Why (not) use AI? Analyzing People's Reasoning and Conditions for AI Acceptability
by: Mun, Jimin, et al.
Published: (2025)
Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences
by: Zheng, Mingqian, et al.
Published: (2025)
Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest
by: Wu, Addison J., et al.
Published: (2026)
Multi-Attribute Constraint Satisfaction via Language Model Rewriting
by: Baheti, Ashutosh, et al.
Published: (2024)
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
by: Jung, Jaehun, et al.
Published: (2024)
Cognitive Foundations for Reasoning and Their Manifestation in LLMs
by: Kargupta, Priyanka, et al.
Published: (2025)
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
by: Yao, Jihan, et al.
Published: (2024)
Teaching LLMs to Abstain across Languages via Multilingual Feedback
by: Feng, Shangbin, et al.
Published: (2024)
Reasoning Up the Instruction Ladder for Controllable Language Models
by: Zheng, Zishuo, et al.
Published: (2025)
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
by: Jiang, Liwei, et al.
Published: (2024)
Ask Good Questions for Large Language Models
by: Wu, Qi, et al.
Published: (2025)
ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions
by: Park, Chan Young, et al.
Published: (2024)
Creativity Support in the Age of Large Language Models: An Empirical Study Involving Emerging Writers
by: Chakrabarty, Tuhin, et al.
Published: (2023)
Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection
by: Ahuja, Kabir, et al.
Published: (2025)
Deep Reasoning in General Purpose Agents via Structured Meta-Cognition
by: Light, Dean, et al.
Published: (2026)
Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits
by: Mun, Jimin, et al.
Published: (2024)
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
by: Chiu, Yu Ying, et al.
Published: (2024)
Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
by: Jiang, Liwei, et al.
Published: (2025)
SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat
by: Jiang, Yuru, et al.
Published: (2025)
HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
by: Zhou, Xuhui, et al.
Published: (2024)
ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data
by: Chen, Tong, et al.
Published: (2025)
Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations
by: Chen, Tong, et al.
Published: (2025)
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
by: Han, Xiaochuang, et al.
Published: (2024)
When Should AI Read the Room? Public Perceptions of Social Intelligence in AI Agents
by: Mathur, Leena, et al.
Published: (2026)
Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning
by: He, Zhonghao, et al.
Published: (2025)
PersLitEval: Fine-grained Benchmark and Evaluation of LLMs on Persian Literature Questions
by: Niazi, Ruhallah, et al.
Published: (2026)
A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage
by: Xin, Rui, et al.
Published: (2025)