| Main Authors: | Shapira, Itai; Benade, Gerdus; Procaccia, Ariel D. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.01002 |
Similar Items
Pairwise Calibrated Rewards for Pluralistic Alignment
by: Halpern, Daniel, et al.
Published: (2025)
Axioms for AI Alignment from Human Feedback
by: Ge, Luise, et al.
Published: (2024)
Generative Social Choice
by: Fish, Sara, et al.
Published: (2023)
Offline Local Search for Online Stochastic Bandits
by: Benadè, Gerdus, et al.
Published: (2026)
Incentives in Federated Learning with Heterogeneous Agents
by: Procaccia, Ariel D., et al.
Published: (2025)
Embeddings for Preferences, Not Semantics
by: Blair, Carter, et al.
Published: (2026)
Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy
by: Kumarappan, Adarsh, et al.
Published: (2026)
Learning Social Welfare Functions
by: Pardeshi, Kanad Shrikar, et al.
Published: (2024)
Generative Social Choice: The Next Generation
by: Boehmer, Niclas, et al.
Published: (2025)
Clone-Robust AI Alignment
by: Procaccia, Ariel D., et al.
Published: (2025)
Bias Detection Via Signaling
by: Chen, Yiling, et al.
Published: (2024)
Policy Aggregation
by: Alamdari, Parand A., et al.
Published: (2024)
Alternates, Assemble! Selecting Optimal Alternates for Citizens' Assemblies
by: Assos, Angelos, et al.
Published: (2025)
Strategic Classification With Externalities
by: Hossain, Safwan, et al.
Published: (2024)
How to Evaluate Reward Models for RLHF
by: Frick, Evan, et al.
Published: (2024)
Finding Common Ground in a Sea of Alternatives
by: Chooi, Jay, et al.
Published: (2026)
Not Your Typical Sycophant: The Elusive Nature of Sycophancy in Large Language Models
by: Natan, Shahar Ben, et al.
Published: (2026)
Sycophancy is an Educational Safety Risk: Why LLM Tutors Need Sycophancy Benchmarks
by: Kasneci, Enkelejda, et al.
Published: (2026)
Moral Sycophancy in Vision Language Models
by: Rabby, Shadman, et al.
Published: (2026)
SycEval: Evaluating LLM Sycophancy
by: Fanous, Aaron, et al.
Published: (2025)
Question the Questions: Auditing Representation in Online Deliberative Processes
by: De, Soham, et al.
Published: (2025)
Linear Probe Penalties Reduce LLM Sycophancy
by: Papadatos, Henry, et al.
Published: (2024)
When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models
by: Li, Jiechen, et al.
Published: (2026)
Direct Alignment with Heterogeneous Preferences
by: Shirali, Ali, et al.
Published: (2025)
Sycophancy Hides Linearly in the Attention Heads
by: Genadi, Rifo, et al.
Published: (2026)
BASIL: Bayesian Assessment of Sycophancy in LLMs
by: Atwell, Katherine, et al.
Published: (2025)
Adaptive Contracts for Cost-Effective AI Delegation
by: Saig, Eden, et al.
Published: (2026)
Sycophancy in Large Language Models: Causes and Mitigations
by: Malmqvist, Lars
Published: (2024)
Consistency Training Helps Stop Sycophancy and Jailbreaks
by: Irpan, Alex, et al.
Published: (2025)
Intersectional Sycophancy: How Perceived User Demographics Shape False Validation in Large Language Models
by: Maltbie, Benjamin, et al.
Published: (2026)
Political Bias Audits of LLMs Capture Sycophancy to the Inferred Auditor
by: Törnberg, Petter, et al.
Published: (2026)
SOAP: Improving and Stabilizing Shampoo using Adam
by: Vyas, Nikhil, et al.
Published: (2024)
RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)
Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention
by: Wang, Libo
Published: (2024)
PENDULUM: A Benchmark for Assessing Sycophancy in Multimodal Large Language Models
by: Rahman, A. B. M. Ashikur, et al.
Published: (2025)
Calibration Collapse Under Sycophancy Fine-Tuning: How Reward Hacking Breaks Uncertainty Quantification in LLMs
by: Sahoo, Subramanyam
Published: (2026)
Dataset Reset Policy Optimization for RLHF
by: Chang, Jonathan D., et al.
Published: (2024)
Diagnosing and Mitigating Sycophancy and Skepticism in LLM Causal Judgment
by: Chang, Edward Y.
Published: (2026)
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
by: Hu, Jian, et al.
Published: (2024)
ROCM: RLHF on consistency models
by: Shekhar, Shivanshu, et al.
Published: (2025)