Library Catalog

Bibliographic Details
Main Authors:	Niu, Xueyan; Bai, Bo; Deng, Lei; Han, Wei
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2405.08707

Similar Items

On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training
by: Niu, Xueyan, et al.
Published: (2026)

High Perceptual Quality Wireless Image Delivery with Denoising Diffusion Models
by: Yilmaz, Selim F., et al.
Published: (2023)

Muon in Associative Memory Learning: Training Dynamics and Scaling Laws
by: Li, Binghui, et al.
Published: (2026)

Understanding Transformer from the Perspective of Associative Memory
by: Zhong, Shu, et al.
Published: (2025)

Scaling Laws for Associative Memories
by: Cabannes, Vivien, et al.
Published: (2023)

Understanding Factual Recall in Transformers via Associative Memories
by: Nichani, Eshaan, et al.
Published: (2024)

Tensor Cache: Eviction-conditioned Associative Memory for Transformers
by: Swain, Kabir, et al.
Published: (2026)

Configuration-to-Performance Scaling Law with Neural Ansatz
by: Zhang, Huaqing, et al.
Published: (2026)

Scaling Laws for Data-Efficient Visual Transfer Learning
by: Yang, Wenxuan, et al.
Published: (2025)

A Mean Field Ansatz for Zero-Shot Weight Transfer
by: Chen, Xingyuan, et al.
Published: (2024)

Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding
by: Fei, Weizhi, et al.
Published: (2024)

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
by: Hägele, Alexander, et al.
Published: (2024)

Determinant Estimation under Memory Constraints and Neural Scaling Laws
by: Ameli, Siavash, et al.
Published: (2025)

On the Invariance and Generality of Neural Scaling Laws
by: Han, Xing, et al.
Published: (2026)

Understanding and Mitigating the Bias in Sample Selection for Learning with Noisy Labels
by: Wei, Qi, et al.
Published: (2024)

Scaling Law for Time Series Forecasting
by: Shi, Jingzhe, et al.
Published: (2024)

From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction
by: Yan, Bencheng, et al.
Published: (2025)

Joint Memory Frequency and Computing Frequency Scaling for Energy-efficient DNN Inference
by: Han, Yunchu, et al.
Published: (2025)

Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data
by: Havrilla, Alex, et al.
Published: (2024)

Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics
by: Kendiukhov, Ihor
Published: (2026)

Generalization and Scaling Laws for Mixture-of-Experts Transformers
by: Mayaki, Mansour Zoubeirou a
Published: (2026)

Empowering LLMs in Decision Games through Algorithmic Data Synthesis
by: Wang, Haolin, et al.
Published: (2025)

From Perception to Punchline: Empowering VLM with the Art of In-the-wild Meme
by: Li, Xueyan, et al.
Published: (2025)

Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
by: Sardana, Nikhil, et al.
Published: (2023)

Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning
by: Wu, Xiaojun, et al.
Published: (2025)

Zero-Shot Performance Prediction for Probabilistic Scaling Laws
by: Schram, Viktoria, et al.
Published: (2025)

POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation
by: Qiu, Zeju, et al.
Published: (2026)

Perplexity-Aware Data Scaling Law: Perplexity Landscapes Predict Performance for Continual Pre-training
by: Liu, Lei, et al.
Published: (2025)

Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory
by: Kim, Juno, et al.
Published: (2026)

xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity
by: Beck, Maximilian, et al.
Published: (2025)

Scaling Laws for Predicting Downstream Performance in LLMs
by: Chen, Yangyi, et al.
Published: (2024)

Associative Recurrent Memory Transformer
by: Rodkin, Ivan, et al.
Published: (2024)

GatedFWA: Linear Flash Windowed Attention with Gated Associative Memory
by: Liu, Jiaxu, et al.
Published: (2025)

Beyond Disorder: Unveiling Cooperativeness in Multidirectional Associative Memories
by: Alessandrelli, Andrea, et al.
Published: (2025)

Provably Optimal Memory Capacity for Modern Hopfield Models: Transformer-Compatible Dense Associative Memories as Spherical Codes
by: Hu, Jerry Yao-Chieh, et al.
Published: (2024)

Holistic Scaling Laws for Optimal Mixture-of-Experts Architecture Optimization
by: Wan, Weilin, et al.
Published: (2026)

Scaling Laws are Redundancy Laws
by: Bi, Yuda, et al.
Published: (2025)

Unifying Learning Dynamics and Generalization in Transformers Scaling Law
by: Yang, Chiwun
Published: (2025)

Scaling Laws for Downstream Task Performance of Large Language Models
by: Isik, Berivan, et al.
Published: (2024)

Variational Linear Attention: Stable Associative Memory for Long-Context Transformers
by: Pandey, Vishal, et al.
Published: (2026)