| Main Author: | Larsen, Erik |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.12066 |
Similar Items
Entropy-Based Measurement of Value Drift and Alignment Work in Large Language Models
by: Fadli, Samih
Published: (2025)
AMEL: Accumulated Message Effects on LLM Judgments
by: Temkit, Sid-Ali
Published: (2026)
TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning
by: Pan, Muyu, et al.
Published: (2026)
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
by: Xu, Shuyao, et al.
Published: (2025)
CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features
by: Cho, Seonglae, et al.
Published: (2025)
LLM attribution analysis across different fine-tuning strategies and model scales for automated code compliance
by: Shi, Jack Wei Lun, et al.
Published: (2026)
Point of Order: Action-Aware LLM Persona Modeling for Realistic Civic Simulation
by: Merrill, Scott, et al.
Published: (2025)
Perturbation Dose Responses in Recursive LLM Loops: Raw Switching, Stochastic Floors, and Persistent Escape under Append, Replace, and Dialog Updates
by: Kaplanski, Pawel
Published: (2026)
Cognitive Load Limits in Large Language Models: Benchmarking Multi-Hop Reasoning
by: Adapala, Sai Teja Reddy
Published: (2025)
Control Reinforcement Learning: Interpretable Token-Level Steering of LLMs via Sparse Autoencoder Features
by: Cho, Seonglae, et al.
Published: (2026)
In-Context Fixation: When Demonstrated Labels Override Semantics in Few-Shot Classification
by: Liu, Ming
Published: (2026)
No Free Swap: Protocol-Dependent Layer Redundancy in Transformers
by: Garcia, Gabriel
Published: (2026)
Beyond Pass@k: Breadth-Depth Metrics for Reasoning Boundaries
by: Dragoi, Marius, et al.
Published: (2025)
Revisiting Intermediate-Layer Matching in Knowledge Distillation: Layer-Selection Strategy Doesn't Matter (Much)
by: Yu, Zony, et al.
Published: (2025)
Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models
by: Cui, Sasha, et al.
Published: (2025)
The Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studies
by: Garcia, Gabriel
Published: (2026)
Counterfactual Likelihood Tests for Indirect Influence in Private Reasoning Channels
by: Lorup, Alexander Boesgaard
Published: (2026)
Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations
by: Kumar, Sachin
Published: (2026)
Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text
by: Zhou, Tianyang, et al.
Published: (2026)
Prototype Transformer: Towards Language Model Architectures Interpretable by Design
by: Yordanov, Yordan, et al.
Published: (2026)
Forget Attention: Importance-Aware Attention Is All You Need
by: Shin, Soohyeong, et al.
Published: (2026)
Enhancing Burmese News Classification with Kolmogorov-Arnold Network Head Fine-tuning
by: Aung, Thura, et al.
Published: (2025)
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
by: Lan, Guangchen, et al.
Published: (2025)
Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
by: Zhang, Gongbo, et al.
Published: (2026)
Dodo: Dynamic Contextual Compression for Decoder-only LMs
by: Qin, Guanghui, et al.
Published: (2023)
Model Collapse as Cultural Evolution
by: Guo, Dongxin, et al.
Published: (2026)
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
by: Lyu, Bohan, et al.
Published: (2024)
EasyMath: A 0-shot Math Benchmark for SLMs
by: Karki, Drishya, et al.
Published: (2025)
DPO Unchained: Your Training Algorithm is Secretly Disentangled in Human Choice Theory
by: Zhou, Wenxuan, et al.
Published: (2025)
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning
by: Lan, Guangchen, et al.
Published: (2025)
Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking
by: Zhang, Liangliang, et al.
Published: (2025)
Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning
by: Cho, Hanjun, et al.
Published: (2026)
Beyond Hallucinations: A Composite Score for Measuring Reliability in Open-Source Large Language Models
by: Salla, Rohit Kumar, et al.
Published: (2025)
Language as a Wave Phenomenon: Semantic Phase Locking and Interference in Neural Networks
by: Yıldırım, Alper, et al.
Published: (2025)
Alternating Reinforcement Learning with Contextual Rubric Rewards: Beyond the Scalarization Strategy
by: Lan, Guangchen, et al.
Published: (2026)
Graph Memory Transformer (GMT)
by: Zanarini, Nicola, et al.
Published: (2026)
Domain-Specific Pretraining of Language Models: A Comparative Study in the Medical Field
by: Kerner, Tobias
Published: (2024)
The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary
by: Guo, Dongxin, et al.
Published: (2026)
Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies
by: Liu, Ming
Published: (2026)
The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models
by: Liu, Ming
Published: (2026)