:: Library Catalog

Bibliographic Details
Main Authors:	Joglekar, Manas; Chen, Jeremy; Wu, Gabriel; Yosinski, Jason; Wang, Jasmine; Barak, Boaz; Glaese, Amelia
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.08093

Similar Items

Deliberative Alignment: Reasoning Enables Safer Language Models
by: Guan, Melody Y., et al.
Published: (2024)

Automatic Stability and Recovery for Neural Network Training
by: Or, Barak
Published: (2026)

Preference Learning with Lie Detectors can Induce Honesty or Evasion
by: Cundy, Chris, et al.
Published: (2025)

Unlearners Can Lie: Evaluating and Improving Honesty in LLM Unlearning
by: Gu, Renjie, et al.
Published: (2026)

Gradient-Free Training of Quantized Neural Networks
by: Cohen, Noa, et al.
Published: (2024)

MESSI: A Multi-Elevation Semantic Segmentation Image Dataset of an Urban Environment
by: Pinkovich, Barak, et al.
Published: (2025)

Distinguishing the Knowable from the Unknowable with Language Models
by: Ahdritz, Gustaf, et al.
Published: (2024)

Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging
by: Yang, Jinluan, et al.
Published: (2025)

The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes
by: Taufeeque, Mohammad, et al.
Published: (2026)

Generative Neural Reparameterization for Differentiable PDE-constrained Optimization
by: Joglekar, Archis S.
Published: (2024)

CarSpeedNet: Learning-Based Speed Estimation from Accelerometer-Only Inertial Sensing
by: Or, Barak
Published: (2024)

Kalman-Inspired Runtime Stability and Recovery in Hybrid Reasoning Systems
by: Or, Barak
Published: (2026)

Think Before You Lie: How Reasoning Leads to Honesty
by: Yuan, Ann, et al.
Published: (2026)

Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack
by: McKee-Reid, Leo, et al.
Published: (2024)

Used Car Salesbots? Honesty and Credulity of LLMs as Bargaining Agents under Partial Information
by: Miceli-Barone, Antonio Valerio, et al.
Published: (2026)

How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?
by: Liu, Ryan, et al.
Published: (2024)

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
by: Ren, Richard, et al.
Published: (2025)

GAC: Stabilizing Asynchronous RL Training for LLMs via Gradient Alignment Control
by: Xu, Haofeng, et al.
Published: (2026)

Sub-Scaling Laws: On the Role of Data Density and Training Strategies in LLMs
by: Chen, Zhengyu, et al.
Published: (2025)

TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning
by: Zhang, Junru, et al.
Published: (2025)

RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning
by: Li, Ran, et al.
Published: (2026)

Effective Sample Size and Generalization Bounds for Temporal Networks
by: Gahtan, Barak, et al.
Published: (2025)

Knowledge Integration Strategies in Autonomous Vehicle Prediction and Planning: A Comprehensive Survey
by: Manas, Kumar, et al.
Published: (2025)

Scaling Data-Constrained Language Models
by: Muennighoff, Niklas, et al.
Published: (2023)

Training Long-Context LLMs Efficiently via Chunk-wise Optimization
by: Li, Wenhao, et al.
Published: (2025)

CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
by: Liu, Ziyue, et al.
Published: (2025)

Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch
by: Zeng, Yirong, et al.
Published: (2025)

Efficient Representations are Controllable Representations
by: Ye, Charles, et al.
Published: (2026)

Multi-Value Alignment for LLMs via Value Decorrelation and Extrapolation
by: Xu, Hefei, et al.
Published: (2025)

Stress Testing Deliberative Alignment for Anti-Scheming Training
by: Schoen, Bronson, et al.
Published: (2025)

Towards Robust Evaluation of Unlearning in LLMs via Data Transformations
by: Joshi, Abhinav, et al.
Published: (2024)

DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs
by: Luo, Yingsong, et al.
Published: (2024)

Learning Dynamics of RNNs in Closed-Loop Environments
by: Ger, Yoav, et al.
Published: (2025)

Learning reveals invisible structure in low-rank RNNs
by: Ger, Yoav, et al.
Published: (2026)

Thoth: Mid-Training Bridges LLMs to Time Series Understanding
by: Lin, Jiafeng, et al.
Published: (2026)

A Hybrid Adaptive Velocity Aided Navigation Filter with Application to INS/DVL Fusion
by: Or, Barak, et al.
Published: (2022)

Recent Trends in Modelling the Continuous Time Series using Deep Learning: A Survey
by: Habiba, Mansura, et al.
Published: (2024)

Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
by: Wu, Runzhe, et al.
Published: (2025)

Test-Time Training on Graphs with Large Language Models (LLMs)
by: Zhang, Jiaxin, et al.
Published: (2024)

EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control
by: Yang, Kai, et al.
Published: (2025)