Library Catalog

Bibliographic Details
Main Authors:	Sadhukhan, Ranajoy; Cao, Sheng; Dong, Harry; Zhao, Changsheng; Purpura-Pontoniere, Attiano; Tian, Yuandong; Liu, Zechun; Chen, Beidi
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2601.10639

Similar Items

Kinetics: Rethinking Test-Time Scaling Laws
by: Sadhukhan, Ranajoy, et al.
Published: (2025)

Memory Mosaics
by: Zhang, Jianyu, et al.
Published: (2024)

MagicPIG: LSH Sampling for Efficient LLM Generation
by: Chen, Zhuoming, et al.
Published: (2024)

ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
by: Cao, Sheng, et al.
Published: (2025)

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
by: Tian, Yuandong, et al.
Published: (2023)

LoCoCo: Dropping In Convolutions for Long Context Compression
by: Cai, Ruisi, et al.
Published: (2024)

On the Surprising Effectiveness of Attention Transfer for Vision Transformers
by: Li, Alexander C., et al.
Published: (2024)

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
by: Zhao, Jiawei, et al.
Published: (2024)

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
by: Sun, Hanshi, et al.
Published: (2024)

R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
by: Zhang, Zhenyu, et al.
Published: (2025)

Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation
by: Dong, Harry, et al.
Published: (2024)

SpinQuant: LLM quantization with learned rotations
by: Liu, Zechun, et al.
Published: (2024)

Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking
by: Tian, Yuandong
Published: (2025)

Scalable LLM Reasoning Acceleration with Low-rank Distillation
by: Dong, Harry, et al.
Published: (2025)

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
by: Liu, Zechun, et al.
Published: (2024)

ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
by: Liu, Zechun, et al.
Published: (2025)

Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training
by: Luo, Cheng, et al.
Published: (2024)

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
by: Dong, Harry, et al.
Published: (2024)

Spectral Journey: How Transformers Predict the Shortest Path
by: Cohen, Andrew, et al.
Published: (2025)

Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
by: Zheng, Haizhong, et al.
Published: (2025)

Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
by: Tian, Yuandong
Published: (2024)

GaLore 2: Large-Scale LLM Pre-Training by Gradient Low-Rank Projection
by: Su, DiJia, et al.
Published: (2025)

Deep Think with Confidence
by: Fu, Yichao, et al.
Published: (2025)

LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
by: Deng, Wenlong, et al.
Published: (2024)

Learnable Community-Aware Transformer for Brain Connectome Analysis with Token Clustering
by: Yang, Yanting, et al.
Published: (2024)

Neural Computers
by: Zhuge, Mingchen, et al.
Published: (2026)

The Path Not Taken: RLVR Provably Learns Off the Principals
by: Zhu, Hanqing, et al.
Published: (2025)

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
by: Yang, Xinyu, et al.
Published: (2025)

APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
by: Yang, Xinyu, et al.
Published: (2025)

Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking
by: Rashidinejad, Paria, et al.
Published: (2024)

STEM: Unleashing the Power of Embeddings for Multi-task Recommendation
by: Su, Liangcai, et al.
Published: (2023)

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
by: Sun, Hanshi, et al.
Published: (2024)

InRank: Incremental Low-Rank Learning
by: Zhao, Jiawei, et al.
Published: (2023)

Few-shot Neural Architecture Search
by: Zhao, Yiyang, et al.
Published: (2020)

WinQ: Accelerating Quantization-Aware Training of Language Models Around Saddle Points
by: Li, Dongyue, et al.
Published: (2026)

A Principled Loss Function for Direct Language Model Alignment
by: Tan, Yuandong
Published: (2025)

Quantized Reasoning Models Think They Need to Think Longer, but They Do Not
by: Lotfi, Sanae, et al.
Published: (2026)

TimeFormer: Transformer with Attention Modulation Empowered by Temporal Characteristics for Time Series Forecasting
by: Liu, Zhipeng, et al.
Published: (2025)

Golden Ratio Search: A Low-Power Adversarial Attack for Deep Learning based Modulation Classification
by: Sadhukhan, Deepsayan, et al.
Published: (2024)

Multi-objective Optimization by Learning Space Partitions
by: Zhao, Yiyang, et al.
Published: (2021)