| Main Authors: | Khalaf, Hadi, Wang, Serena L., Halpern, Daniel, Shapira, Itai, Calmon, Flavio du Pin, Procaccia, Ariel D. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.21297 |
Similar Items
Pairwise Calibrated Rewards for Pluralistic Alignment
by: Halpern, Daniel, et al.
Published: (2025)
Inference-Time Reward Hacking in Large Language Models
by: Khalaf, Hadi, et al.
Published: (2025)
Axioms for AI Alignment from Human Feedback
by: Ge, Luise, et al.
Published: (2024)
AI Alignment at Your Discretion
by: Buyl, Maarten, et al.
Published: (2025)
Learning Social Welfare Functions
by: Pardeshi, Kanad Shrikar, et al.
Published: (2024)
Clone-Robust AI Alignment
by: Procaccia, Ariel D., et al.
Published: (2025)
How RLHF Amplifies Sycophancy
by: Shapira, Itai, et al.
Published: (2026)
Incentives in Federated Learning with Heterogeneous Agents
by: Procaccia, Ariel D., et al.
Published: (2025)
Generative Social Choice
by: Fish, Sara, et al.
Published: (2023)
Attack-Aware Noise Calibration for Differential Privacy
by: Kulynych, Bogdan, et al.
Published: (2024)
Metritocracy: Representative Metrics for Lite Benchmarks
by: Procaccia, Ariel, et al.
Published: (2025)
Predictive Churn with the Set of Good Models
by: Watson-Daniels, Jamelle, et al.
Published: (2024)
Rigor in AI: Doing Rigorous AI Work Requires a Broader, Responsible AI-Informed Conception of Rigor
by: Olteanu, Alexandra, et al.
Published: (2025)
Jackpot! Alignment as a Maximal Lottery
by: Maura-Rivero, Roberto-Rafael, et al.
Published: (2025)
In This Apportionment Lottery, the House Always Wins
by: Gölz, Paul, et al.
Published: (2022)
The Hidden Cost of Waiting for Accurate Predictions
by: Shirali, Ali, et al.
Published: (2025)
Honor Among Bandits: No-Regret Learning for Online Fair Division
by: Procaccia, Ariel D., et al.
Published: (2024)
Bias Detection Via Signaling
by: Chen, Yiling, et al.
Published: (2024)
Generative Social Choice: The Next Generation
by: Boehmer, Niclas, et al.
Published: (2025)
Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management
by: Long, Carol Xuan, et al.
Published: (2026)
The Proportional Veto Principle for Approval Ballots
by: Halpern, Daniel, et al.
Published: (2025)
Policy Aggregation
by: Alamdari, Parand A., et al.
Published: (2024)
Regretful Decisions under Label Noise
by: Nagaraj, Sujay, et al.
Published: (2025)
Adaptive Contracts for Cost-Effective AI Delegation
by: Saig, Eden, et al.
Published: (2026)
Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness Interventions
by: Wang, Hao, et al.
Published: (2023)
Robust Neural Processes for Noisy Data
by: Shapira, Chen, et al.
Published: (2024)
Selective Explanations
by: Paes, Lucas Monteiro, et al.
Published: (2024)
Fair Machine Unlearning: Data Removal while Mitigating Disparities
by: Oesterling, Alex, et al.
Published: (2023)
Alternates, Assemble! Selecting Optimal Alternates for Citizens' Assemblies
by: Assos, Angelos, et al.
Published: (2025)
New Guarantees for Learning Revenue Maximizing Menus of Lotteries and Two-Part Tariffs
by: Balcan, Maria-Florina, et al.
Published: (2023)
Multi-Group Fairness Evaluation via Conditional Value-at-Risk Testing
by: Paes, Lucas Monteiro, et al.
Published: (2023)
Inference-Time Machine Unlearning via Gated Activation Redirection
by: Turani, Vinícius Conte, et al.
Published: (2026)
Strategic Classification With Externalities
by: Hossain, Safwan, et al.
Published: (2024)
Tight Robustness Certification Through the Convex Hull of $\ell_0$ Attacks
by: Shapira, Yuval, et al.
Published: (2025)
A New Perspective on Shampoo's Preconditioner
by: Morwani, Depen, et al.
Published: (2024)
Federated Assemblies
by: Halpern, Daniel, et al.
Published: (2024)
Finding Common Ground in a Sea of Alternatives
by: Chooi, Jay, et al.
Published: (2026)
Correlated Privacy Mechanisms for Differentially Private Distributed Mean Estimation
by: Vithana, Sajani, et al.
Published: (2024)
Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling
by: Shapira, Eilam, et al.
Published: (2026)
KS-Lottery: Finding Certified Lottery Tickets for Multilingual Language Models
by: Yuan, Fei, et al.
Published: (2024)