| Main Authors: | Xu, Yang; Wang, Yi; Huang, Hengguan; Wang, Hao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.17626 |
Similar Items
On Calibration of LLM-based Guard Models for Reliable Content Moderation
by: Liu, Hongfu, et al.
Published: (2024)
BayesAgent: Bayesian Agentic Reasoning Under Uncertainty via Verbalized Probabilistic Graphical Modeling
by: Huang, Hengguan, et al.
Published: (2024)
Feature Rivalry in Sparse Autoencoder Representations: A Mechanistic Study of Uncertainty-Driven Feature Competition in LLMs
by: Harshavardhan
Published: (2026)
Eliminating Position Bias of Language Models: A Mechanistic Approach
by: Wang, Ziqi, et al.
Published: (2024)
Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training
by: Shi, Hengyu, et al.
Published: (2026)
Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking
by: Zhang, Yifan, et al.
Published: (2025)
Tracking Equivalent Mechanistic Interpretations Across Neural Networks
by: Sun, Alan, et al.
Published: (2026)
Mechanistic Data Attribution: Tracing the Training Origins of Interpretable LLM Units
by: Chen, Jianhui, et al.
Published: (2026)
Linear Dynamics in the RLVR Training of Large Language Models
by: Wang, Tianle, et al.
Published: (2026)
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
by: Huang, Ruiquan, et al.
Published: (2025)
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
by: Chen, Siyu, et al.
Published: (2024)
Ltri-LLM: Streaming Long Context Inference for LLMs with Training-Free Dynamic Triangular Attention Pattern
by: Tang, Hongyin, et al.
Published: (2024)
Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models
by: Zhu, Zhaowei, et al.
Published: (2023)
ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding
by: Zhong, Shuzhang, et al.
Published: (2024)
Muon is Scalable for LLM Training
by: Liu, Jingyuan, et al.
Published: (2025)
Iteration Head: A Mechanistic Study of Chain-of-Thought
by: Cabannes, Vivien, et al.
Published: (2024)
Understanding the planning of LLM agents: A survey
by: Huang, Xu, et al.
Published: (2024)
On Designing Effective RL Reward at Training Time for LLM Reasoning
by: Gao, Jiaxuan, et al.
Published: (2024)
Training-free LLM Merging for Multi-task Learning
by: Fu, Zichuan, et al.
Published: (2025)
Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training
by: Xu, Ran, et al.
Published: (2026)
What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
by: Jain, Samyak, et al.
Published: (2024)
DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training
by: Zhu, Dingwei, et al.
Published: (2026)
On Mechanistic Circuits for Extractive Question-Answering
by: Basu, Samyadeep, et al.
Published: (2025)
Composite Active Learning: Towards Multi-Domain Active Learning with Theoretical Guarantees
by: Hao, Guang-Yuan, et al.
Published: (2024)
First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models
by: Ma, Chi, et al.
Published: (2024)
Consolidating Rewarded Perturbations for LLM Post-Training
by: Zhang, Zheyu, et al.
Published: (2026)
Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis
by: Yang, Hongru, et al.
Published: (2024)
Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs
by: Song, Xiangchen, et al.
Published: (2025)
Simple Mechanistic Explanations for Out-Of-Context Reasoning
by: Wang, Atticus, et al.
Published: (2025)
DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2026)
Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification
by: Eshuijs, Leon, et al.
Published: (2025)
Training Proactive and Personalized LLM Agents
by: Sun, Weiwei, et al.
Published: (2025)
Improving Autoregressive Training with Dynamic Oracles
by: Yang, Jianing, et al.
Published: (2024)
Prompt Curriculum Learning for Efficient LLM Post-Training
by: Gao, Zhaolin, et al.
Published: (2025)
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
by: Guo, Phillip, et al.
Published: (2024)
Learning Dynamics of LLM Finetuning
by: Ren, Yi, et al.
Published: (2024)
Tracking Universal Features Through Fine-Tuning and Model Merging
by: Horn, Niels, et al.
Published: (2024)
Offline Reinforcement Learning for LLM Multi-Step Reasoning
by: Wang, Huaijie, et al.
Published: (2024)
LLM Assertiveness can be Mechanistically Decomposed into Emotional and Logical Components
by: Tsujimura, Hikaru, et al.
Published: (2025)
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
by: Huang, Tianjin, et al.
Published: (2025)