:: Library Catalog

Bibliographic Details
Main Authors:	Shapira, Itai; Benade, Gerdus; Procaccia, Ariel D.
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.01002

Similar Items

Pairwise Calibrated Rewards for Pluralistic Alignment
by: Halpern, Daniel, et al.
Published: (2025)

Axioms for AI Alignment from Human Feedback
by: Ge, Luise, et al.
Published: (2024)

Generative Social Choice
by: Fish, Sara, et al.
Published: (2023)

Offline Local Search for Online Stochastic Bandits
by: Benadè, Gerdus, et al.
Published: (2026)

Incentives in Federated Learning with Heterogeneous Agents
by: Procaccia, Ariel D., et al.
Published: (2025)

Embeddings for Preferences, Not Semantics
by: Blair, Carter, et al.
Published: (2026)

Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy
by: Kumarappan, Adarsh, et al.
Published: (2026)

Learning Social Welfare Functions
by: Pardeshi, Kanad Shrikar, et al.
Published: (2024)

Generative Social Choice: The Next Generation
by: Boehmer, Niclas, et al.
Published: (2025)

Clone-Robust AI Alignment
by: Procaccia, Ariel D., et al.
Published: (2025)

Bias Detection Via Signaling
by: Chen, Yiling, et al.
Published: (2024)

Policy Aggregation
by: Alamdari, Parand A., et al.
Published: (2024)

Alternates, Assemble! Selecting Optimal Alternates for Citizens' Assemblies
by: Assos, Angelos, et al.
Published: (2025)

Strategic Classification With Externalities
by: Hossain, Safwan, et al.
Published: (2024)

How to Evaluate Reward Models for RLHF
by: Frick, Evan, et al.
Published: (2024)

Finding Common Ground in a Sea of Alternatives
by: Chooi, Jay, et al.
Published: (2026)

Not Your Typical Sycophant: The Elusive Nature of Sycophancy in Large Language Models
by: Natan, Shahar Ben, et al.
Published: (2026)

Sycophancy is an Educational Safety Risk: Why LLM Tutors Need Sycophancy Benchmarks
by: Kasneci, Enkelejda, et al.
Published: (2026)

Moral Sycophancy in Vision Language Models
by: Rabby, Shadman, et al.
Published: (2026)

SycEval: Evaluating LLM Sycophancy
by: Fanous, Aaron, et al.
Published: (2025)

Question the Questions: Auditing Representation in Online Deliberative Processes
by: De, Soham, et al.
Published: (2025)

Linear Probe Penalties Reduce LLM Sycophancy
by: Papadatos, Henry, et al.
Published: (2024)

When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models
by: Li, Jiechen, et al.
Published: (2026)

Direct Alignment with Heterogeneous Preferences
by: Shirali, Ali, et al.
Published: (2025)

Sycophancy Hides Linearly in the Attention Heads
by: Genadi, Rifo, et al.
Published: (2026)

BASIL: Bayesian Assessment of Sycophancy in LLMs
by: Atwell, Katherine, et al.
Published: (2025)

Adaptive Contracts for Cost-Effective AI Delegation
by: Saig, Eden, et al.
Published: (2026)

Sycophancy in Large Language Models: Causes and Mitigations
by: Malmqvist, Lars
Published: (2024)

Consistency Training Helps Stop Sycophancy and Jailbreaks
by: Irpan, Alex, et al.
Published: (2025)

Intersectional Sycophancy: How Perceived User Demographics Shape False Validation in Large Language Models
by: Maltbie, Benjamin, et al.
Published: (2026)

Political Bias Audits of LLMs Capture Sycophancy to the Inferred Auditor
by: Törnberg, Petter, et al.
Published: (2026)

SOAP: Improving and Stabilizing Shampoo using Adam
by: Vyas, Nikhil, et al.
Published: (2024)

RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)

Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention
by: Wang, Libo
Published: (2024)

PENDULUM: A Benchmark for Assessing Sycophancy in Multimodal Large Language Models
by: Rahman, A. B. M. Ashikur, et al.
Published: (2025)

Calibration Collapse Under Sycophancy Fine-Tuning: How Reward Hacking Breaks Uncertainty Quantification in LLMs
by: Sahoo, Subramanyam
Published: (2026)

Dataset Reset Policy Optimization for RLHF
by: Chang, Jonathan D., et al.
Published: (2024)

Diagnosing and Mitigating Sycophancy and Skepticism in LLM Causal Judgment
by: Chang, Edward Y.
Published: (2026)

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
by: Hu, Jian, et al.
Published: (2024)

ROCM: RLHF on consistency models
by: Shekhar, Shivanshu, et al.
Published: (2025)