Library Catalog

Bibliographic Details
Main Authors:	Harma, Simla Burcu; Chakraborty, Ayan; Sperry, Nicholas; Falsafi, Babak; Jaggi, Martin; Oh, Yunho
Format:	Preprint
Published:	2022
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2211.10737

Similar Items

Effective Interplay between Sparsity and Quantization: From Theory to Practice
by: Harma, Simla Burcu, et al.
Published: (2024)

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
by: Hägele, Alexander, et al.
Published: (2024)

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
by: Kosson, Atli, et al.
Published: (2024)

MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators
by: Leon, Vasileios, et al.
Published: (2025)

MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training
by: Yoon, Daegun, et al.
Published: (2023)

Towards Fully FP8 GEMM LLM Training at Scale
by: Hernández-Cano, Alejandro, et al.
Published: (2025)

Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler
by: Dremov, Aleksandr, et al.
Published: (2025)

'1'-bit Count-based Sorting Unit to Reduce Link Power in DNN Accelerators
by: Han, Ruichi, et al.
Published: (2026)

Empirical Capacity Model for Self-Attention Neural Networks
by: Härmä, Aki, et al.
Published: (2024)

4-bit Shampoo for Memory-Efficient Network Training
by: Wang, Sike, et al.
Published: (2024)

Iterative Assessment and Improvement of DNN Operational Accuracy
by: Guerriero, Antonio, et al.
Published: (2023)

KL for a KL: On-Policy Distillation with Control Variate Baseline
by: Oh, Minjae, et al.
Published: (2026)

Transfer Learning for Temporal Link Prediction
by: Chatterjee, Ayan, et al.
Published: (2025)

Stochastic Difference-of-Convex Optimization with Momentum
by: Chayti, El Mahdi, et al.
Published: (2025)

A Split-Client Approach to Second-Order Optimization
by: Chayti, El Mahdi, et al.
Published: (2025)

A New First-Order Meta-Learning Algorithm with Convergence Guarantees
by: Chayti, El Mahdi, et al.
Published: (2024)

Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
by: Chmiel, Brian, et al.
Published: (2021)

HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation
by: Zhou, Xinyu, et al.
Published: (2024)

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
by: Kosson, Atli, et al.
Published: (2023)

Benchmarking Optimizers for Large Language Model Pretraining
by: Semenov, Andrei, et al.
Published: (2025)

Deep Grokking: Would Deep Neural Networks Generalize Better?
by: Fan, Simin, et al.
Published: (2024)

DNN Modularization via Activation-Driven Training
by: Ngo, Tuan, et al.
Published: (2024)

How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
by: Sengupta, Ayan, et al.
Published: (2025)

EONSim: An NPU Simulator for On-Chip Memory and Embedding Vector Operations
by: Choi, Sangun, et al.
Published: (2025)

Personalized Collaborative Fine-Tuning for On-Device Large Language Models
by: Wagner, Nicolas, et al.
Published: (2024)

Enhancing Multilingual LLM Pretraining with Model-Based Data Selection
by: Messmer, Bettina, et al.
Published: (2025)

Gradient-Normalized Smoothness for Optimization with Approximate Hessians
by: Semenov, Andrei, et al.
Published: (2025)

CoTFormer: A Chain-of-Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference
by: Mohtashami, Amirkeivan, et al.
Published: (2023)

On Expressive Power of Quantized Neural Networks under Fixed-Point Arithmetic
by: Park, Yeachan, et al.
Published: (2024)

Reconcile Certified Robustness and Accuracy for DNN-based Smoothed Majority Vote Classifier
by: Jin, Gaojie, et al.
Published: (2025)

Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States
by: Choi, Yunho, et al.
Published: (2026)

Rethinking the Potential of Layer Freezing for Efficient DNN Training
by: Yang, Chence, et al.
Published: (2025)

GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining
by: Fan, Simin, et al.
Published: (2025)

MLTCP: Congestion Control for DNN Training
by: Rajasekaran, Sudarsanan, et al.
Published: (2024)

Float8@2bits: Entropy Coding Enables Data-Free Model Compression
by: Putzky, Patrick, et al.
Published: (2026)

DoGE: Domain Reweighting with Generalization Estimation
by: Fan, Simin, et al.
Published: (2023)

CoBo: Collaborative Learning via Bilevel Optimization
by: Hashemi, Diba, et al.
Published: (2024)

Towards an empirical understanding of MoE design choices
by: Fan, Dongyang, et al.
Published: (2024)

Using Machine Learning for move sequence visualization and generation in climbing
by: Rimbot, Thomas, et al.
Published: (2025)

Persona-aware Generative Model for Code-mixed Language
by: Sengupta, Ayan, et al.
Published: (2023)