:: Library Catalog

Bibliographic Details
Main Authors:	Lubana, Ekdeep Singh; Kawaguchi, Kyogo; Dick, Robert P.; Tanaka, Hidenori
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2408.12578

Similar Items

Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
by: Park, Core Francisco, et al.
Published: (2024)

Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model
by: Khona, Mikail, et al.
Published: (2024)

Emergence of Hierarchical Emotion Organization in Large Language Models
by: Zhao, Bo, et al.
Published: (2025)

In-Context Learning Dynamics with Random Binary Sequences
by: Bigelow, Eric J., et al.
Published: (2023)

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
by: Okawa, Maya, et al.
Published: (2023)

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
by: Ramesh, Rahul, et al.
Published: (2023)

In-Context Learning Strategies Emerge Rationally
by: Wurgaft, Daniel, et al.
Published: (2025)

Analyzing (In)Abilities of SAEs via Formal Languages
by: Menon, Abhinav, et al.
Published: (2024)

How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations
by: Jaipersaud, Brandon, et al.
Published: (2025)

Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering
by: Bigelow, Eric, et al.
Published: (2025)

The Impact of Off-Policy Training Data on Probe Generalisation
by: Kirch, Nathalie, et al.
Published: (2025)

ICLR: In-Context Learning of Representations
by: Park, Core Francisco, et al.
Published: (2024)

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
by: Hindupur, Sai Sumedh R., et al.
Published: (2025)

Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
by: Nishi, Kento, et al.
Published: (2024)

Competition Dynamics Shape Algorithmic Phases of In-Context Learning
by: Park, Core Francisco, et al.
Published: (2024)

From Isolation to Entanglement: When Do Interpretability Methods Identify and Disentangle Known Concepts?
by: Mueller, Aaron, et al.
Published: (2025)

Abrupt Learning in Transformers: A Case Study on Matrix Completion
by: Gopalani, Pulkit, et al.
Published: (2024)

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
by: Jain, Samyak, et al.
Published: (2023)

Swing-by Dynamics in Concept Learning and Compositional Generalization
by: Yang, Yongyi, et al.
Published: (2024)

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space
by: Bigelow, Eric, et al.
Published: (2026)

Continuous-Time Analysis of Adaptive Optimization and Normalization
by: Gould, Rhys, et al.
Published: (2024)

Do Sparse Autoencoders Capture Concept Manifolds?
by: Bhalla, Usha, et al.
Published: (2026)

Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders
by: Bohacek, Matyas, et al.
Published: (2025)

Towards Reliable Evaluation of Behavior Steering Interventions in LLMs
by: Pres, Itamar, et al.
Published: (2024)

Are language models aware of the road not taken? Token-level uncertainty and hidden state dynamics
by: Zur, Amir, et al.
Published: (2025)

Emergence of Episodic Memory in Transformers: Characterizing Changes in Temporal Structure of Attention Scores During Training
by: Mistry, Deven Mahesh, et al.
Published: (2025)

Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework
by: Xia, Yuan, et al.
Published: (2025)

New News: System-2 Fine-tuning for Robust Integration of New Knowledge
by: Park, Core Francisco, et al.
Published: (2025)

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
by: Yu, Zhouliang, et al.
Published: (2025)

Forking Paths in Neural Text Generation
by: Bigelow, Eric, et al.
Published: (2024)

Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
by: Yao, Jinghan, et al.
Published: (2024)

Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models
by: Shi, Yubin, et al.
Published: (2024)

Analyzing Generalization in Pre-Trained Symbolic Regression
by: Voigt, Henrik, et al.
Published: (2025)

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
by: Costa, Valérie, et al.
Published: (2025)

Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit
by: Costa, Valérie, et al.
Published: (2025)

Marabou 2.0: A Versatile Formal Analyzer of Neural Networks
by: Wu, Haoze, et al.
Published: (2024)

Analyzing Memorization in Large Language Models through the Lens of Model Attribution
by: Menta, Tarun Ram, et al.
Published: (2025)

Towards a Formal Creativity Theory: Preliminary results in Novelty and Transformativeness
by: Santo, Luís Espírito, et al.
Published: (2024)

Sample Transform Cost-Based Training-Free Hallucination Detector for Large Language Models
by: Ding, Zeyang, et al.
Published: (2026)

Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards
by: Padula, Alexander G., et al.
Published: (2024)