:: Library Catalog

Bibliographic Details
Main Authors:	Nishi, Kento; Ramesh, Rahul; Okawa, Maya; Khona, Mikail; Tanaka, Hidenori; Lubana, Ekdeep Singh
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2410.17194

Similar Items

Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model
by: Khona, Mikail, et al.
Published: (2024)

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
by: Ramesh, Rahul, et al.
Published: (2023)

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
by: Okawa, Maya, et al.
Published: (2023)

ICLR: In-Context Learning of Representations
by: Park, Core Francisco, et al.
Published: (2024)

Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
by: Park, Core Francisco, et al.
Published: (2024)

Swing-by Dynamics in Concept Learning and Compositional Generalization
by: Yang, Yongyi, et al.
Published: (2024)

A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language
by: Lubana, Ekdeep Singh, et al.
Published: (2024)

Emergence of Hierarchical Emotion Organization in Large Language Models
by: Zhao, Bo, et al.
Published: (2025)

Competition Dynamics Shape Algorithmic Phases of In-Context Learning
by: Park, Core Francisco, et al.
Published: (2024)

Abrupt Learning in Transformers: A Case Study on Matrix Completion
by: Gopalani, Pulkit, et al.
Published: (2024)

In-Context Learning Dynamics with Random Binary Sequences
by: Bigelow, Eric J., et al.
Published: (2023)

In-Context Learning Strategies Emerge Rationally
by: Wurgaft, Daniel, et al.
Published: (2025)

In-Context Learning of Energy Functions
by: Schaeffer, Rylan, et al.
Published: (2024)

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
by: Costa, Valérie, et al.
Published: (2025)

How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations
by: Jaipersaud, Brandon, et al.
Published: (2025)

Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering
by: Bigelow, Eric, et al.
Published: (2025)

Analyzing (In)Abilities of SAEs via Formal Languages
by: Menon, Abhinav, et al.
Published: (2024)

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
by: Jain, Samyak, et al.
Published: (2023)

Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit
by: Costa, Valérie, et al.
Published: (2025)

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
by: Hindupur, Sai Sumedh R., et al.
Published: (2025)

Aggregated Multi-output Gaussian Processes with Knowledge Transfer Across Domains
by: Tanaka, Yusuke, et al.
Published: (2022)

The Impact of Off-Policy Training Data on Probe Generalisation
by: Kirch, Nathalie, et al.
Published: (2025)

What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
by: Jain, Samyak, et al.
Published: (2024)

From Isolation to Entanglement: When Do Interpretability Methods Identify and Disentangle Known Concepts?
by: Mueller, Aaron, et al.
Published: (2025)

Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Frontier AI Models
by: Duan, Sunny, et al.
Published: (2024)

Detecting High-Stakes Interactions with Activation Probes
by: McKenzie, Alex, et al.
Published: (2025)

Features as Rewards: Scalable Supervision for Open-Ended Tasks via Interpretability
by: Prasad, Aaditya Vikram, et al.
Published: (2026)

Provable Low-Frequency Bias of In-Context Learning of Representations
by: Yang, Yongyi, et al.
Published: (2025)

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space
by: Bigelow, Eric, et al.
Published: (2026)

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention
by: Huang, Jing, et al.
Published: (2026)

Meta-Learning for Neural Network-based Temporal Point Processes
by: Takimoto, Yoshiaki, et al.
Published: (2024)

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
by: Wurgaft, Daniel, et al.
Published: (2026)

Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon
by: Liang, Tongtong, et al.
Published: (2025)

New News: System-2 Fine-tuning for Robust Integration of New Knowledge
by: Park, Core Francisco, et al.
Published: (2025)

Continuous-Time Analysis of Adaptive Optimization and Normalization
by: Gould, Rhys, et al.
Published: (2024)

Shattered Compositionality: Counterintuitive Learning Dynamics of Transformers for Arithmetic
by: Zhao, Xingyu, et al.
Published: (2026)

Do Sparse Autoencoders Capture Concept Manifolds?
by: Bhalla, Usha, et al.
Published: (2026)

Bridging Associative Memory and Probabilistic Modeling
by: Schaeffer, Rylan, et al.
Published: (2024)

Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations
by: Schaeffer, Rylan, et al.
Published: (2024)

Cross-patient Seizure Onset Zone Classification by Patient-Dependent Weight
by: Zhao, Xuyang, et al.
Published: (2025)