| Main Author: | Sahoo, Subramanyam |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.13821 |
Similar Items
The Good, The Bad, and The Hybrid: A Reward Structure Showdown in Reasoning Models Training
by: Sahoo, Subramanyam
Published: (2025)
The Deepfake Detective: Interpreting Neural Forensics Through Sparse Features and Manifolds
by: Sahoo, Subramanyam, et al.
Published: (2025)
Calibration Collapse Under Sycophancy Fine-Tuning: How Reward Hacking Breaks Uncertainty Quantification in LLMs
by: Sahoo, Subramanyam
Published: (2026)
The Horcrux: Mechanistically Interpretable Task Decomposition for Detecting and Mitigating Reward Hacking in Embodied AI Systems
by: Sahoo, Subramanyam, et al.
Published: (2025)
Multi-Agent Systems Execute Arbitrary Malicious Code
by: Triedman, Harold, et al.
Published: (2025)
Boardwalk Empire: How Generative AI is Revolutionizing Economic Paradigms
by: Sahoo, Subramanyam, et al.
Published: (2024)
Position: The Complexity of Perfect AI Alignment -- Formalizing the RLHF Trilemma
by: Sahoo, Subramanyam, et al.
Published: (2025)
I Can't Believe It's Not Robust: Catastrophic Collapse of Safety Classifiers under Embedding Drift
by: Sahoo, Subramanyam, et al.
Published: (2026)
When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning
by: Sahoo, Subramanyam, et al.
Published: (2026)
SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement
by: Sahoo, Subramanyam, et al.
Published: (2026)
The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness
by: Sahoo, Subramanyam, et al.
Published: (2026)
Executable Counterfactuals: Improving LLMs' Causal Reasoning Through Code
by: Vashishtha, Aniket, et al.
Published: (2025)
Towards Effectively Leveraging Execution Traces for Program Repair with Code LLMs
by: Haque, Mirazul, et al.
Published: (2025)
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
by: Islah, Nizar, et al.
Published: (2024)
GasTrace: Detecting Sandwich Attack Malicious Accounts in Ethereum
by: Liu, Zekai, et al.
Published: (2024)
VoxelCodeBench: Benchmarking 3D World Modeling Through Code Generation
by: Zheng, Yan, et al.
Published: (2026)
Scaling Laws Revisited: Modeling the Role of Data Quality in Language Model Pretraining
by: Subramanyam, Anirudh, et al.
Published: (2025)
Towards Quantum Machine Learning for Malicious Code Analysis
by: Lopez, Jesus, et al.
Published: (2025)
Localizing Malicious Outputs from CodeLLM
by: Borana, Mayukh, et al.
Published: (2025)
Keypoint Aware Masked Image Modelling
by: Krishna, Madhava, et al.
Published: (2024)
Malicious and Unintentional Disclosure Risks in Large Language Models for Code Generation
by: Rabin, Rafiqul, et al.
Published: (2025)
Learning Unmasking Policies for Diffusion Language Models
by: Jazbec, Metod, et al.
Published: (2025)
MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?
by: Wahed, Muntasir, et al.
Published: (2025)
Self-Execution Simulation Improves Coding Models
by: Maimon, Gallil, et al.
Published: (2026)
Detecting Malicious AI Agents Through Simulated Interactions
by: Pi, Yulu, et al.
Published: (2025)
Unmasking Trees for Tabular Data
by: McCarter, Calvin
Published: (2024)
Interactive and Hybrid Imitation Learning: Provably Beating Behavior Cloning
by: Li, Yichen, et al.
Published: (2024)
Autonomous Vehicle Decision-Making Framework for Considering Malicious Behavior at Unsignalized Intersections
by: Li, Qing, et al.
Published: (2024)
FreeMOCA: Memory-Free Continual Learning for Malicious Code Analysis
by: Asadi, Zahra, et al.
Published: (2026)
Byzantine Outside, Curious Inside: Reconstructing Data Through Malicious Updates
by: Yue, Kai, et al.
Published: (2025)
Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation
by: Yu, Zhuohao, et al.
Published: (2024)
Understanding Diffusion Models via Code Execution
by: Yu, Cheng
Published: (2025)
Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs
by: Cheng, Ching-An, et al.
Published: (2024)
Execution Guided Line-by-Line Code Generation
by: Lavon, Boaz, et al.
Published: (2025)
Provably Safe Model Updates
by: Elmecker-Plakolm, Leo, et al.
Published: (2025)
Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking
by: Chao, Chen-Hao, et al.
Published: (2025)
TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition
by: Vellaisamy, Prabhu, et al.
Published: (2026)
Execution-Grounded Credit Assignment for GRPO in Code Generation
by: Kumar, Abhijit, et al.
Published: (2026)
Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models
by: Lee, Sanghyun, et al.
Published: (2025)
Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking
by: Ben-Hamu, Heli, et al.
Published: (2025)