| Main Authors: | Harma, Simla Burcu, Chakraborty, Ayan, Sperry, Nicholas, Falsafi, Babak, Jaggi, Martin, Oh, Yunho |
|---|---|
| Format: | Preprint |
| Published: | 2022 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2211.10737 |
Similar Items
Effective Interplay between Sparsity and Quantization: From Theory to Practice
by: Harma, Simla Burcu, et al.
Published: (2024)
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
by: Hägele, Alexander, et al.
Published: (2024)
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
by: Kosson, Atli, et al.
Published: (2024)
MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators
by: Leon, Vasileios, et al.
Published: (2025)
MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training
by: Yoon, Daegun, et al.
Published: (2023)
Towards Fully FP8 GEMM LLM Training at Scale
by: Hernández-Cano, Alejandro, et al.
Published: (2025)
Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler
by: Dremov, Aleksandr, et al.
Published: (2025)
'1'-bit Count-based Sorting Unit to Reduce Link Power in DNN Accelerators
by: Han, Ruichi, et al.
Published: (2026)
Empirical Capacity Model for Self-Attention Neural Networks
by: Härmä, Aki, et al.
Published: (2024)
4-bit Shampoo for Memory-Efficient Network Training
by: Wang, Sike, et al.
Published: (2024)
Iterative Assessment and Improvement of DNN Operational Accuracy
by: Guerriero, Antonio, et al.
Published: (2023)
KL for a KL: On-Policy Distillation with Control Variate Baseline
by: Oh, Minjae, et al.
Published: (2026)
Transfer Learning for Temporal Link Prediction
by: Chatterjee, Ayan, et al.
Published: (2025)
Stochastic Difference-of-Convex Optimization with Momentum
by: Chayti, El Mahdi, et al.
Published: (2025)
A Split-Client Approach to Second-Order Optimization
by: Chayti, El Mahdi, et al.
Published: (2025)
A New First-Order Meta-Learning Algorithm with Convergence Guarantees
by: Chayti, El Mahdi, et al.
Published: (2024)
Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
by: Chmiel, Brian, et al.
Published: (2021)
HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation
by: Zhou, Xinyu, et al.
Published: (2024)
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
by: Kosson, Atli, et al.
Published: (2023)
Benchmarking Optimizers for Large Language Model Pretraining
by: Semenov, Andrei, et al.
Published: (2025)
Deep Grokking: Would Deep Neural Networks Generalize Better?
by: Fan, Simin, et al.
Published: (2024)
DNN Modularization via Activation-Driven Training
by: Ngo, Tuan, et al.
Published: (2024)
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
by: Sengupta, Ayan, et al.
Published: (2025)
EONSim: An NPU Simulator for On-Chip Memory and Embedding Vector Operations
by: Choi, Sangun, et al.
Published: (2025)
Personalized Collaborative Fine-Tuning for On-Device Large Language Models
by: Wagner, Nicolas, et al.
Published: (2024)
Enhancing Multilingual LLM Pretraining with Model-Based Data Selection
by: Messmer, Bettina, et al.
Published: (2025)
Gradient-Normalized Smoothness for Optimization with Approximate Hessians
by: Semenov, Andrei, et al.
Published: (2025)
CoTFormer: A Chain-of-Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference
by: Mohtashami, Amirkeivan, et al.
Published: (2023)
On Expressive Power of Quantized Neural Networks under Fixed-Point Arithmetic
by: Park, Yeachan, et al.
Published: (2024)
Reconcile Certified Robustness and Accuracy for DNN-based Smoothed Majority Vote Classifier
by: Jin, Gaojie, et al.
Published: (2025)
Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States
by: Choi, Yunho, et al.
Published: (2026)
Rethinking the Potential of Layer Freezing for Efficient DNN Training
by: Yang, Chence, et al.
Published: (2025)
GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining
by: Fan, Simin, et al.
Published: (2025)
MLTCP: Congestion Control for DNN Training
by: Rajasekaran, Sudarsanan, et al.
Published: (2024)
Float8@2bits: Entropy Coding Enables Data-Free Model Compression
by: Putzky, Patrick, et al.
Published: (2026)
DoGE: Domain Reweighting with Generalization Estimation
by: Fan, Simin, et al.
Published: (2023)
CoBo: Collaborative Learning via Bilevel Optimization
by: Hashemi, Diba, et al.
Published: (2024)
Towards an empirical understanding of MoE design choices
by: Fan, Dongyang, et al.
Published: (2024)
Using Machine Learning for move sequence visualization and generation in climbing
by: Rimbot, Thomas, et al.
Published: (2025)
Persona-aware Generative Model for Code-mixed Language
by: Sengupta, Ayan, et al.
Published: (2023)