Library Catalog

Bibliographic Details
Main Authors:	Xu, Yang; Wang, Yi; Huang, Hengguan; Wang, Hao
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2412.17626

Similar Items

On Calibration of LLM-based Guard Models for Reliable Content Moderation
by: Liu, Hongfu, et al.
Published: (2024)

BayesAgent: Bayesian Agentic Reasoning Under Uncertainty via Verbalized Probabilistic Graphical Modeling
by: Huang, Hengguan, et al.
Published: (2024)

Feature Rivalry in Sparse Autoencoder Representations: A Mechanistic Study of Uncertainty-Driven Feature Competition in LLMs
by: Harshavardhan
Published: (2026)

Eliminating Position Bias of Language Models: A Mechanistic Approach
by: Wang, Ziqi, et al.
Published: (2024)

Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training
by: Shi, Hengyu, et al.
Published: (2026)

Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking
by: Zhang, Yifan, et al.
Published: (2025)

Tracking Equivalent Mechanistic Interpretations Across Neural Networks
by: Sun, Alan, et al.
Published: (2026)

Mechanistic Data Attribution: Tracing the Training Origins of Interpretable LLM Units
by: Chen, Jianhui, et al.
Published: (2026)

Linear Dynamics in the RLVR Training of Large Language Models
by: Wang, Tianle, et al.
Published: (2026)

How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
by: Huang, Ruiquan, et al.
Published: (2025)

Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
by: Chen, Siyu, et al.
Published: (2024)

Ltri-LLM: Streaming Long Context Inference for LLMs with Training-Free Dynamic Triangular Attention Pattern
by: Tang, Hongyin, et al.
Published: (2024)

Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models
by: Zhu, Zhaowei, et al.
Published: (2023)

ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding
by: Zhong, Shuzhang, et al.
Published: (2024)

Muon is Scalable for LLM Training
by: Liu, Jingyuan, et al.
Published: (2025)

Iteration Head: A Mechanistic Study of Chain-of-Thought
by: Cabannes, Vivien, et al.
Published: (2024)

Understanding the planning of LLM agents: A survey
by: Huang, Xu, et al.
Published: (2024)

On Designing Effective RL Reward at Training Time for LLM Reasoning
by: Gao, Jiaxuan, et al.
Published: (2024)

Training-free LLM Merging for Multi-task Learning
by: Fu, Zichuan, et al.
Published: (2025)

Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training
by: Xu, Ran, et al.
Published: (2026)

What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
by: Jain, Samyak, et al.
Published: (2024)

DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training
by: Zhu, Dingwei, et al.
Published: (2026)

On Mechanistic Circuits for Extractive Question-Answering
by: Basu, Samyadeep, et al.
Published: (2025)

Composite Active Learning: Towards Multi-Domain Active Learning with Theoretical Guarantees
by: Hao, Guang-Yuan, et al.
Published: (2024)

First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models
by: Ma, Chi, et al.
Published: (2024)

Consolidating Rewarded Perturbations for LLM Post-Training
by: Zhang, Zheyu, et al.
Published: (2026)

Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis
by: Yang, Hongru, et al.
Published: (2024)

Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs
by: Song, Xiangchen, et al.
Published: (2025)

Simple Mechanistic Explanations for Out-Of-Context Reasoning
by: Wang, Atticus, et al.
Published: (2025)

DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2026)

Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification
by: Eshuijs, Leon, et al.
Published: (2025)

Training Proactive and Personalized LLM Agents
by: Sun, Weiwei, et al.
Published: (2025)

Improving Autoregressive Training with Dynamic Oracles
by: Yang, Jianing, et al.
Published: (2024)

Prompt Curriculum Learning for Efficient LLM Post-Training
by: Gao, Zhaolin, et al.
Published: (2025)

Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
by: Guo, Phillip, et al.
Published: (2024)

Learning Dynamics of LLM Finetuning
by: Ren, Yi, et al.
Published: (2024)

Tracking Universal Features Through Fine-Tuning and Model Merging
by: Horn, Niels, et al.
Published: (2024)

Offline Reinforcement Learning for LLM Multi-Step Reasoning
by: Wang, Huaijie, et al.
Published: (2024)

LLM Assertiveness can be Mechanistically Decomposed into Emotional and Logical Components
by: Tsujimura, Hikaru, et al.
Published: (2025)

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
by: Huang, Tianjin, et al.
Published: (2025)