:: Library Catalog

Bibliographic Details
Main Authors:	Kristiansen, Gus; Sandler, Mark; Zhmoginov, Andrey; Miller, Nolan; Goyal, Anirudh; Lee, Jihwan; Vladymyrov, Max
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2408.09310

Similar Items

Contextually Guided Transformers via Low-Rank Adaptation
by: Zhmoginov, Andrey, et al.
Published: (2025)

Continual HyperTransformer: A Meta-Learner for Continual Few-Shot Learning
by: Vladymyrov, Max, et al.
Published: (2023)

Long Context In-Context Compression by Getting to the Gist of Gisting
by: Petrov, Aleksandar, et al.
Published: (2025)

Projectable Models: One-Shot Generation of Small Specialized Transformers from Large Ones
by: Zhmoginov, Andrey, et al.
Published: (2025)

Learning and Unlearning of Fabricated Knowledge in Language Models
by: Sun, Chen, et al.
Published: (2024)

Linear Transformers are Versatile In-Context Learners
by: Vladymyrov, Max, et al.
Published: (2024)

How new data permeates LLM knowledge and how to dilute it
by: Sun, Chen, et al.
Published: (2025)

Uncovering mesa-optimization algorithms in Transformers
by: von Oswald, Johannes, et al.
Published: (2023)

Scaling Laws Revisited: Modeling the Role of Data Quality in Language Model Pretraining
by: Subramanyam, Anirudh, et al.
Published: (2025)

MELODI: Exploring Memory Compression for Long Contexts
by: Chen, Yinpeng, et al.
Published: (2024)

Non-Convex Optimization with Spectral Radius Regularization
by: Sandler, Adam, et al.
Published: (2021)

DP-Muon: Differentially Private Optimization via Matrix-Orthogonalized Momentum
by: Kim, Jihwan, et al.
Published: (2026)

Can Models Learn Skill Composition from Examples?
by: Zhao, Haoyu, et al.
Published: (2024)

Fast Differentiable Modal Simulation of Non-linear Strings, Membranes, and Plates
by: Diaz, Rodrigo, et al.
Published: (2025)

Distilled Pretraining: A modern lens of Data, In-Context Learning and Test-Time Scaling
by: Goyal, Sachin, et al.
Published: (2025)

On the Impossibility of Retrain Equivalence in Machine Unlearning
by: Yu, Jiatong, et al.
Published: (2025)

LLM Agents for Bargaining with Utility-based Feedback
by: Oh, Jihwan
Published: (2025)

Heterogeneous Federated Learning with Prototype Alignment and Upscaling
by: Lee, Gyuejeong, et al.
Published: (2025)

Towards Efficient Modelling of String Dynamics: A Comparison of State Space and Koopman based Deep Learning Methods
by: Diaz, Rodrigo, et al.
Published: (2024)

Faraday: Synthetic Smart Meter Generator for the smart grid
by: Chai, Sheng, et al.
Published: (2024)

Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
by: Kaur, Simran, et al.
Published: (2024)

$α$-TCVAE: On the relationship between Disentanglement and Diversity
by: Meo, Cristian, et al.
Published: (2024)

Improving Sparse Memory Finetuning
by: Goyal, Satyam, et al.
Published: (2026)

Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors
by: Didolkar, Aniket, et al.
Published: (2025)

Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning
by: Gupta, Prakhar, et al.
Published: (2026)

Aligning Text-to-Image Diffusion Models with Reward Backpropagation
by: Prabhudesai, Mihir, et al.
Published: (2023)

Latent Diffusion Pretraining for Crystal Property Prediction
by: Mukherjee, Shrimon, et al.
Published: (2026)

Escaping the Cognitive Well: Efficient Competition Math with Off-the-Shelf Models
by: Dang, Xingyu, et al.
Published: (2026)

ContextFocus: Activation Steering for Contextual Faithfulness in Large Language Models
by: Anand, Nikhil, et al.
Published: (2026)

Partial Inverse Design of High-Performance Concrete Using Cooperative Neural Networks for Constraint-Aware Mix Generation
by: Nugraha, Agung, et al.
Published: (2025)

Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective
by: Cha, Sungmin, et al.
Published: (2022)

Evaluation of Neural Surrogates for Physical Modelling Synthesis of Nonlinear Elastic Plates
by: Martin, Carlos De La Vega, et al.
Published: (2025)

Detecting Pretraining Data from Large Language Models
by: Shi, Weijia, et al.
Published: (2023)

When Should We Introduce Safety Interventions During Pretraining?
by: Sam, Dylan, et al.
Published: (2026)

BPP: Long-Context Robot Imitation Learning by Focusing on Key History Frames
by: Mark, Max Sobol, et al.
Published: (2026)

ARCLE: The Abstraction and Reasoning Corpus Learning Environment for Reinforcement Learning
by: Lee, Hosung, et al.
Published: (2024)

A Granular Study of Safety Pretraining under Model Abliteration
by: Agnihotri, Shashank, et al.
Published: (2025)

Robust and Consistent Ski Rental with Distributional Advice
by: Kim, Jihwan, et al.
Published: (2026)

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting
by: Watts, Ishaan, et al.
Published: (2026)

Benchmarking Optimizers for Large Language Model Pretraining
by: Semenov, Andrei, et al.
Published: (2025)