Library Catalog

Detailed bibliography
Main authors:	Vyas, Nikhil; Morwani, Depen; Zhao, Rosie; Kwun, Mujin; Shapira, Itai; Brandfonbrener, David; Janson, Lucas; Kakade, Sham
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence
Online access:	https://arxiv.org/abs/2409.11321

Similar items

Deconstructing What Makes a Good Optimizer for Language Models
Author: Zhao, Rosie, et al.
Published: (2024)

A New Perspective on Shampoo's Preconditioner
Author: Morwani, Depen, et al.
Published: (2024)

Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD Variants
Author: Morwani, Depen, et al.
Published: (2025)

The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton
Author: Abreu, Natalie, et al.
Published: (2025)

LOTION: Smoothing the Optimization Landscape for Quantized Training
Author: Kwun, Mujin, et al.
Published: (2025)

Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Author: Vyas, Nikhil, et al.
Published: (2023)

Loss-to-Loss Prediction: Scaling Laws for All Datasets
Author: Brandfonbrener, David, et al.
Published: (2024)

The Recurrent Transformer: Greater Effective Depth and Efficient Decoding
Author: Oncescu, Costin-Andrei, et al.
Published: (2026)

How Does Critical Batch Size Scale in Pre-training?
Author: Zhang, Hanlin, et al.
Published: (2024)

Adam or Gauss-Newton? A Comparative Study In Terms of Basis Alignment and SGD Noise
Author: Liu, Bingbin, et al.
Published: (2025)

Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging
Author: Meterez, Alexandru, et al.
Published: (2026)

Seesaw: Accelerating Training by Balancing Learning Rate and Batch Size Scheduling
Author: Meterez, Alexandru, et al.
Published: (2025)

Feature emergence via margin maximization: case studies in algebraic tasks
Author: Morwani, Depen, et al.
Published: (2023)

Repeat After Me: Transformers are Better than State Space Models at Copying
Author: Jelassi, Samy, et al.
Published: (2024)

Characterization and Mitigation of Training Instabilities in Microscaling Formats
Author: Su, Huangyuan, et al.
Published: (2025)

CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training
Author: Brandfonbrener, David, et al.
Published: (2024)

The Role of Sparsity for Length Generalization in Transformers
Author: Golowich, Noah, et al.
Published: (2025)

A Simplified Analysis of SGD for Linear Regression with Weight Averaging
Author: Meterez, Alexandru, et al.
Published: (2025)

Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models
Author: Jelassi, Samy, et al.
Published: (2026)

GQ-VAE: A gated quantized VAE for learning variable length tokens
Author: Datta, Theo, et al.
Published: (2025)

Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
Author: Qin, Tian, et al.
Published: (2025)

Learning Hidden Markov Models Using Conditional Samples
Author: Kakade, Sham M., et al.
Published: (2023)

Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond
Author: Oncescu, Costin-Andrei, et al.
Published: (2024)

Prescriptive Scaling Reveals the Evolution of Language Model Capabilities
Author: Zhang, Hanlin, et al.
Published: (2026)

Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning
Author: Jin, Jikai, et al.
Published: (2025)

Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner
Author: Eschenhagen, Runa, et al.
Published: (2025)

Universal Length Generalization with Turing Programs
Author: Hou, Kaiying, et al.
Published: (2024)

Mixture of Parrots: Experts improve memorization more than reasoning
Author: Jelassi, Samy, et al.
Published: (2024)

Understanding and Improving Shampoo and SOAP via Kullback-Leibler Minimization
Author: Lin, Wu, et al.
Published: (2025)

Soup to go: mitigating forgetting during continual learning with model averaging
Author: Kleiman, Anat, et al.
Published: (2025)

Pairwise Calibrated Rewards for Pluralistic Alignment
Author: Halpern, Daniel, et al.
Published: (2025)

Scaling Laws for Imitation Learning in Single-Agent Games
Author: Tuyls, Jens, et al.
Published: (2023)

Cognitive models can reveal interpretable value trade-offs in language models
Author: Murthy, Sonia K., et al.
Published: (2025)

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
Author: Qi, Zhenting, et al.
Published: (2024)

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Author: Li, Kenneth, et al.
Published: (2024)

Skin-SOAP: A Weakly Supervised Framework for Generating Structured SOAP Notes
Author: Kamal, Sadia, et al.
Published: (2025)

Scaling Laws in Linear Regression: Compute, Parameters, and Data
Author: Lin, Licong, et al.
Published: (2024)

4-bit Shampoo for Memory-Efficient Network Training
Author: Wang, Sike, et al.
Published: (2024)

Clarifying Shampoo: Adapting Spectral Descent to Stochasticity and the Parameter Trajectory
Author: Eschenhagen, Runa, et al.
Published: (2026)

Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization
Author: Sun, Ruotong, et al.
Published: (2026)