| Main Authors: | Joglekar, Manas, Chen, Jeremy, Wu, Gabriel, Yosinski, Jason, Wang, Jasmine, Barak, Boaz, Glaese, Amelia |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.08093 |
Similar Items
Deliberative Alignment: Reasoning Enables Safer Language Models
by: Guan, Melody Y., et al.
Published: (2024)
Automatic Stability and Recovery for Neural Network Training
by: Or, Barak
Published: (2026)
Preference Learning with Lie Detectors can Induce Honesty or Evasion
by: Cundy, Chris, et al.
Published: (2025)
Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning
by: Gu, Renjie, et al.
Published: (2026)
Gradient-Free Training of Quantized Neural Networks
by: Cohen, Noa, et al.
Published: (2024)
MESSI: A Multi-Elevation Semantic Segmentation Image Dataset of an Urban Environment
by: Pinkovich, Barak, et al.
Published: (2025)
Distinguishing the Knowable from the Unknowable with Language Models
by: Ahdritz, Gustaf, et al.
Published: (2024)
Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging
by: Yang, Jinluan, et al.
Published: (2025)
The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes
by: Taufeeque, Mohammad, et al.
Published: (2026)
Generative Neural Reparameterization for Differentiable PDE-constrained Optimization
by: Joglekar, Archis S.
Published: (2024)
CarSpeedNet: Learning-Based Speed Estimation from Accelerometer-Only Inertial Sensing
by: Or, Barak
Published: (2024)
Kalman-Inspired Runtime Stability and Recovery in Hybrid Reasoning Systems
by: Or, Barak
Published: (2026)
Think Before You Lie: How Reasoning Leads to Honesty
by: Yuan, Ann, et al.
Published: (2026)
Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack
by: McKee-Reid, Leo, et al.
Published: (2024)
Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information
by: Miceli-Barone, Antonio Valerio, et al.
Published: (2026)
How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?
by: Liu, Ryan, et al.
Published: (2024)
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
by: Ren, Richard, et al.
Published: (2025)
GAC: Stabilizing Asynchronous RL Training for LLMs via Gradient Alignment Control
by: Xu, Haofeng, et al.
Published: (2026)
Sub-Scaling Laws: On the Role of Data Density and Training Strategies in LLMs
by: Chen, Zhengyu, et al.
Published: (2025)
TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning
by: Zhang, Junru, et al.
Published: (2025)
RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning
by: Li, Ran, et al.
Published: (2026)
Effective Sample Size and Generalization Bounds for Temporal Networks
by: Gahtan, Barak, et al.
Published: (2025)
Knowledge Integration Strategies in Autonomous Vehicle Prediction and Planning: A Comprehensive Survey
by: Manas, Kumar, et al.
Published: (2025)
Scaling Data-Constrained Language Models
by: Muennighoff, Niklas, et al.
Published: (2023)
Training Long-Context LLMs Efficiently via Chunk-wise Optimization
by: Li, Wenhao, et al.
Published: (2025)
CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
by: Liu, Ziyue, et al.
Published: (2025)
Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch
by: Zeng, Yirong, et al.
Published: (2025)
Efficient Representations are Controllable Representations
by: Ye, Charles, et al.
Published: (2026)
Multi-Value Alignment for LLMs via Value Decorrelation and Extrapolation
by: Xu, Hefei, et al.
Published: (2025)
Stress Testing Deliberative Alignment for Anti-Scheming Training
by: Schoen, Bronson, et al.
Published: (2025)
Towards Robust Evaluation of Unlearning in LLMs via Data Transformations
by: Joshi, Abhinav, et al.
Published: (2024)
DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs
by: Luo, Yingsong, et al.
Published: (2024)
Learning Dynamics of RNNs in Closed-Loop Environments
by: Ger, Yoav, et al.
Published: (2025)
Learning reveals invisible structure in low-rank RNNs
by: Ger, Yoav, et al.
Published: (2026)
Thoth: Mid-Training Bridges LLMs to Time Series Understanding
by: Lin, Jiafeng, et al.
Published: (2026)
A Hybrid Adaptive Velocity Aided Navigation Filter with Application to INS/DVL Fusion
by: Or, Barak, et al.
Published: (2022)
Recent Trends in Modelling the Continuous Time Series using Deep Learning: A Survey
by: Habiba, Mansura, et al.
Published: (2024)
Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
by: Wu, Runzhe, et al.
Published: (2025)
Test-Time Training on Graphs with Large Language Models (LLMs)
by: Zhang, Jiaxin, et al.
Published: (2024)
EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control
by: Yang, Kai, et al.
Published: (2025)