| Main Authors: | Kristiansen, Gus; Sandler, Mark; Zhmoginov, Andrey; Miller, Nolan; Goyal, Anirudh; Lee, Jihwan; Vladymyrov, Max |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.09310 |
Similar Items
Contextually Guided Transformers via Low-Rank Adaptation
by: Zhmoginov, Andrey, et al.
Published: (2025)
Continual HyperTransformer: A Meta-Learner for Continual Few-Shot Learning
by: Vladymyrov, Max, et al.
Published: (2023)
Long Context In-Context Compression by Getting to the Gist of Gisting
by: Petrov, Aleksandar, et al.
Published: (2025)
Projectable Models: One-Shot Generation of Small Specialized Transformers from Large Ones
by: Zhmoginov, Andrey, et al.
Published: (2025)
Learning and Unlearning of Fabricated Knowledge in Language Models
by: Sun, Chen, et al.
Published: (2024)
Linear Transformers are Versatile In-Context Learners
by: Vladymyrov, Max, et al.
Published: (2024)
How new data permeates LLM knowledge and how to dilute it
by: Sun, Chen, et al.
Published: (2025)
Uncovering mesa-optimization algorithms in Transformers
by: von Oswald, Johannes, et al.
Published: (2023)
Scaling Laws Revisited: Modeling the Role of Data Quality in Language Model Pretraining
by: Subramanyam, Anirudh, et al.
Published: (2025)
MELODI: Exploring Memory Compression for Long Contexts
by: Chen, Yinpeng, et al.
Published: (2024)
Non-Convex Optimization with Spectral Radius Regularization
by: Sandler, Adam, et al.
Published: (2021)
DP-Muon: Differentially Private Optimization via Matrix-Orthogonalized Momentum
by: Kim, Jihwan, et al.
Published: (2026)
Can Models Learn Skill Composition from Examples?
by: Zhao, Haoyu, et al.
Published: (2024)
Fast Differentiable Modal Simulation of Non-linear Strings, Membranes, and Plates
by: Diaz, Rodrigo, et al.
Published: (2025)
Distilled Pretraining: A modern lens of Data, In-Context Learning and Test-Time Scaling
by: Goyal, Sachin, et al.
Published: (2025)
On the Impossibility of Retrain Equivalence in Machine Unlearning
by: Yu, Jiatong, et al.
Published: (2025)
LLM Agents for Bargaining with Utility-based Feedback
by: Oh, Jihwan
Published: (2025)
Heterogeneous Federated Learning with Prototype Alignment and Upscaling
by: Lee, Gyuejeong, et al.
Published: (2025)
Towards Efficient Modelling of String Dynamics: A Comparison of State Space and Koopman based Deep Learning Methods
by: Diaz, Rodrigo, et al.
Published: (2024)
Faraday: Synthetic Smart Meter Generator for the smart grid
by: Chai, Sheng, et al.
Published: (2024)
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
by: Kaur, Simran, et al.
Published: (2024)
$α$-TCVAE: On the relationship between Disentanglement and Diversity
by: Meo, Cristian, et al.
Published: (2024)
Improving Sparse Memory Finetuning
by: Goyal, Satyam, et al.
Published: (2026)
Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors
by: Didolkar, Aniket, et al.
Published: (2025)
Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning
by: Gupta, Prakhar, et al.
Published: (2026)
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
by: Prabhudesai, Mihir, et al.
Published: (2023)
Latent Diffusion Pretraining for Crystal Property Prediction
by: Mukherjee, Shrimon, et al.
Published: (2026)
Escaping the Cognitive Well: Efficient Competition Math with Off-the-Shelf Models
by: Dang, Xingyu, et al.
Published: (2026)
ContextFocus: Activation Steering for Contextual Faithfulness in Large Language Models
by: Anand, Nikhil, et al.
Published: (2026)
Partial Inverse Design of High-Performance Concrete Using Cooperative Neural Networks for Constraint-Aware Mix Generation
by: Nugraha, Agung, et al.
Published: (2025)
Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective
by: Cha, Sungmin, et al.
Published: (2022)
Evaluation of Neural Surrogates for Physical Modelling Synthesis of Nonlinear Elastic Plates
by: Martin, Carlos De La Vega, et al.
Published: (2025)
Detecting Pretraining Data from Large Language Models
by: Shi, Weijia, et al.
Published: (2023)
When Should We Introduce Safety Interventions During Pretraining?
by: Sam, Dylan, et al.
Published: (2026)
BPP: Long-Context Robot Imitation Learning by Focusing on Key History Frames
by: Mark, Max Sobol, et al.
Published: (2026)
ARCLE: The Abstraction and Reasoning Corpus Learning Environment for Reinforcement Learning
by: Lee, Hosung, et al.
Published: (2024)
A Granular Study of Safety Pretraining under Model Abliteration
by: Agnihotri, Shashank, et al.
Published: (2025)
Robust and Consistent Ski Rental with Distributional Advice
by: Kim, Jihwan, et al.
Published: (2026)
Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting
by: Watts, Ishaan, et al.
Published: (2026)
Benchmarking Optimizers for Large Language Model Pretraining
by: Semenov, Andrei, et al.
Published: (2025)