Library Catalog

Bibliographic Details
Main Author:	Sahoo, Subramanyam
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2511.13016

Similar Items

Calibration Collapse Under Sycophancy Fine-Tuning: How Reward Hacking Breaks Uncertainty Quantification in LLMs
by: Sahoo, Subramanyam
Published: (2026)

The Horcrux: Mechanistically Interpretable Task Decomposition for Detecting and Mitigating Reward Hacking in Embodied AI Systems
by: Sahoo, Subramanyam, et al.
Published: (2025)

The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces
by: Sahoo, Subramanyam
Published: (2025)

A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models
by: Tuan, Yi-Lin, et al.
Published: (2024)

The Deepfake Detective: Interpreting Neural Forensics Through Sparse Features and Manifolds
by: Sahoo, Subramanyam, et al.
Published: (2025)

The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness
by: Sahoo, Subramanyam, et al.
Published: (2026)

Ambient Diffusion Omni: Training Good Models with Bad Data
by: Daras, Giannis, et al.
Published: (2025)

When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning
by: Sahoo, Subramanyam, et al.
Published: (2026)

Boardwalk Empire: How Generative AI is Revolutionizing Economic Paradigms
by: Sahoo, Subramanyam, et al.
Published: (2024)

Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown
by: Anand, Emile, et al.
Published: (2025)

Position: The Complexity of Perfect AI Alignment -- Formalizing the RLHF Trilemma
by: Sahoo, Subramanyam, et al.
Published: (2025)

The Great Contradiction Showdown: How Jailbreak and Stealth Wrestle in Vision-Language Models?
by: Kao, Ching-Chia, et al.
Published: (2024)

When Bad Data Leads to Good Models
by: Li, Kenneth, et al.
Published: (2025)

Good Allocations from Bad Estimates
by: Casacuberta, Sílvia, et al.
Published: (2026)

Blog Data Showdown: Machine Learning vs Neuro-Symbolic Models for Gender Classification
by: Sinshaw, Natnael Tilahun, et al.
Published: (2025)

I Can't Believe It's Not Robust: Catastrophic Collapse of Safety Classifiers under Embedding Drift
by: Sahoo, Subramanyam, et al.
Published: (2026)

SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement
by: Sahoo, Subramanyam, et al.
Published: (2026)

From Rattle to Roar: Optimizer Showdown for MambaStock on S&P 500
by: Chan, Alena, et al.
Published: (2025)

BadReward: Clean-Label Poisoning of Reward Models in Text-to-Image RLHF
by: Duan, Kaiwen, et al.
Published: (2025)

Vertical Federated Learning in Practice: The Good, the Bad, and the Ugly
by: Wu, Zhaomin, et al.
Published: (2025)

Sequence-Aware Inline Measurement Attribution for Good-Bad Wafer Diagnosis
by: Miyaguchi, Kohei, et al.
Published: (2025)

Agent Performing Autonomous Stock Trading under Good and Bad Situations
by: Luo, Yunfei, et al.
Published: (2023)

Simulation, Modelling and Classification of Wiki Contributors: Spotting The Good, The Bad, and The Ugly
by: Méndez, Silvia García, et al.
Published: (2024)

GRAM-R$^2$: Self-Training Generative Foundation Reward Models for Reward Reasoning
by: Wang, Chenglong, et al.
Published: (2025)

FADE: Why Bad Descriptions Happen to Good Features
by: Puri, Bruno, et al.
Published: (2025)

Imitate the Good and Avoid the Bad: An Incremental Approach to Safe Reinforcement Learning
by: Hoang, Huy, et al.
Published: (2023)

Bad Values but Good Behavior: Learning Highly Misspecified Bandits and MDPs
by: Banerjee, Debangshu, et al.
Published: (2023)

Entropy-Guided Data-Efficient Training for Multimodal Reasoning Reward Models
by: Yang, Shidong, et al.
Published: (2026)

Adversarial Training of Reward Models
by: Bukharin, Alexander, et al.
Published: (2025)

Scaling Laws Revisited: Modeling the Role of Data Quality in Language Model Pretraining
by: Subramanyam, Anirudh, et al.
Published: (2025)

Good Actions Succeed, Bad Actions Generalize: A Case Study on Why RL Generalizes Better
by: Song, Meng
Published: (2025)

A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning
by: Li, Mengqi, et al.
Published: (2025)

The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification
by: Baharav, Tavor Z., et al.
Published: (2025)

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
by: Wang, Haozhe, et al.
Published: (2026)

Keypoint Aware Masked Image Modelling
by: Krishna, Madhava, et al.
Published: (2024)

AgentCollabBench: Diagnosing When Good Agents Make Bad Collaborators
by: Mazumder, Aritra, et al.
Published: (2026)

Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect
by: Tang, Kaihua, et al.
Published: (2020)

Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
by: Bansal, Hritik, et al.
Published: (2024)

AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
by: Liu, Zihan, et al.
Published: (2024)

SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones?
by: Ho, Sy-Tuyen, et al.
Published: (2026)