| Main Authors: | Niu, Xueyan; Bai, Bo; Deng, Lei; Han, Wei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.08707 |
Similar Items
On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training
by: Niu, Xueyan, et al.
Published: (2026)
High Perceptual Quality Wireless Image Delivery with Denoising Diffusion Models
by: Yilmaz, Selim F., et al.
Published: (2023)
Muon in Associative Memory Learning: Training Dynamics and Scaling Laws
by: Li, Binghui, et al.
Published: (2026)
Understanding Transformer from the Perspective of Associative Memory
by: Zhong, Shu, et al.
Published: (2025)
Scaling Laws for Associative Memories
by: Cabannes, Vivien, et al.
Published: (2023)
Understanding Factual Recall in Transformers via Associative Memories
by: Nichani, Eshaan, et al.
Published: (2024)
Tensor Cache: Eviction-conditioned Associative Memory for Transformers
by: Swain, Kabir, et al.
Published: (2026)
Configuration-to-Performance Scaling Law with Neural Ansatz
by: Zhang, Huaqing, et al.
Published: (2026)
Scaling Laws for Data-Efficient Visual Transfer Learning
by: Yang, Wenxuan, et al.
Published: (2025)
A Mean Field Ansatz for Zero-Shot Weight Transfer
by: Chen, Xingyuan, et al.
Published: (2024)
Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding
by: Fei, Weizhi, et al.
Published: (2024)
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
by: Hägele, Alexander, et al.
Published: (2024)
Determinant Estimation under Memory Constraints and Neural Scaling Laws
by: Ameli, Siavash, et al.
Published: (2025)
On the Invariance and Generality of Neural Scaling Laws
by: Han, Xing, et al.
Published: (2026)
Understanding and Mitigating the Bias in Sample Selection for Learning with Noisy Labels
by: Wei, Qi, et al.
Published: (2024)
Scaling Law for Time Series Forecasting
by: Shi, Jingzhe, et al.
Published: (2024)
From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction
by: Yan, Bencheng, et al.
Published: (2025)
Joint Memory Frequency and Computing Frequency Scaling for Energy-efficient DNN Inference
by: Han, Yunchu, et al.
Published: (2025)
Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data
by: Havrilla, Alex, et al.
Published: (2024)
Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics
by: Kendiukhov, Ihor
Published: (2026)
Generalization and Scaling Laws for Mixture-of-Experts Transformers
by: Mayaki, Mansour Zoubeirou a
Published: (2026)
Empowering LLMs in Decision Games through Algorithmic Data Synthesis
by: Wang, Haolin, et al.
Published: (2025)
From Perception to Punchline: Empowering VLM with the Art of In-the-wild Meme
by: Li, Xueyan, et al.
Published: (2025)
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
by: Sardana, Nikhil, et al.
Published: (2023)
Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning
by: Wu, Xiaojun, et al.
Published: (2025)
Zero-Shot Performance Prediction for Probabilistic Scaling Laws
by: Schram, Viktoria, et al.
Published: (2025)
POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation
by: Qiu, Zeju, et al.
Published: (2026)
Perplexity-Aware Data Scaling Law: Perplexity Landscapes Predict Performance for Continual Pre-training
by: Liu, Lei, et al.
Published: (2025)
Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory
by: Kim, Juno, et al.
Published: (2026)
xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity
by: Beck, Maximilian, et al.
Published: (2025)
Scaling Laws for Predicting Downstream Performance in LLMs
by: Chen, Yangyi, et al.
Published: (2024)
Associative Recurrent Memory Transformer
by: Rodkin, Ivan, et al.
Published: (2024)
GatedFWA: Linear Flash Windowed Attention with Gated Associative Memory
by: Liu, Jiaxu, et al.
Published: (2025)
Beyond Disorder: Unveiling Cooperativeness in Multidirectional Associative Memories
by: Alessandrelli, Andrea, et al.
Published: (2025)
Provably Optimal Memory Capacity for Modern Hopfield Models: Transformer-Compatible Dense Associative Memories as Spherical Codes
by: Hu, Jerry Yao-Chieh, et al.
Published: (2024)
Holistic Scaling Laws for Optimal Mixture-of-Experts Architecture Optimization
by: Wan, Weilin, et al.
Published: (2026)
Scaling Laws are Redundancy Laws
by: Bi, Yuda, et al.
Published: (2025)
Unifying Learning Dynamics and Generalization in Transformers Scaling Law
by: Yang, Chiwun
Published: (2025)
Scaling Laws for Downstream Task Performance of Large Language Models
by: Isik, Berivan, et al.
Published: (2024)
Variational Linear Attention: Stable Associative Memory for Long-Context Transformers
by: Pandey, Vishal, et al.
Published: (2026)