| Main Authors: | Sadhukhan, Ranajoy, Cao, Sheng, Dong, Harry, Zhao, Changsheng, Purpura-Pontoniere, Attiano, Tian, Yuandong, Liu, Zechun, Chen, Beidi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.10639 |
Similar Items
Kinetics: Rethinking Test-Time Scaling Laws
by: Sadhukhan, Ranajoy, et al.
Published: (2025)
Memory Mosaics
by: Zhang, Jianyu, et al.
Published: (2024)
MagicPIG: LSH Sampling for Efficient LLM Generation
by: Chen, Zhuoming, et al.
Published: (2024)
Param$Δ$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
by: Cao, Sheng, et al.
Published: (2025)
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
by: Tian, Yuandong, et al.
Published: (2023)
LoCoCo: Dropping In Convolutions for Long Context Compression
by: Cai, Ruisi, et al.
Published: (2024)
On the Surprising Effectiveness of Attention Transfer for Vision Transformers
by: Li, Alexander C., et al.
Published: (2024)
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
by: Zhao, Jiawei, et al.
Published: (2024)
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
by: Sun, Hanshi, et al.
Published: (2024)
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
by: Zhang, Zhenyu, et al.
Published: (2025)
Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation
by: Dong, Harry, et al.
Published: (2024)
SpinQuant: LLM quantization with learned rotations
by: Liu, Zechun, et al.
Published: (2024)
Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking
by: Tian, Yuandong
Published: (2025)
Scalable LLM Reasoning Acceleration with Low-rank Distillation
by: Dong, Harry, et al.
Published: (2025)
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
by: Liu, Zechun, et al.
Published: (2024)
ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
by: Liu, Zechun, et al.
Published: (2025)
Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training
by: Luo, Cheng, et al.
Published: (2024)
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
by: Dong, Harry, et al.
Published: (2024)
Spectral Journey: How Transformers Predict the Shortest Path
by: Cohen, Andrew, et al.
Published: (2025)
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
by: Zheng, Haizhong, et al.
Published: (2025)
Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
by: Tian, Yuandong
Published: (2024)
GaLore 2: Large-Scale LLM Pre-Training by Gradient Low-Rank Projection
by: Su, DiJia, et al.
Published: (2025)
Deep Think with Confidence
by: Fu, Yichao, et al.
Published: (2025)
LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
by: Deng, Wenlong, et al.
Published: (2024)
Learnable Community-Aware Transformer for Brain Connectome Analysis with Token Clustering
by: Yang, Yanting, et al.
Published: (2024)
Neural Computers
by: Zhuge, Mingchen, et al.
Published: (2026)
The Path Not Taken: RLVR Provably Learns Off the Principals
by: Zhu, Hanqing, et al.
Published: (2025)
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
by: Yang, Xinyu, et al.
Published: (2025)
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
by: Yang, Xinyu, et al.
Published: (2025)
Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking
by: Rashidinejad, Paria, et al.
Published: (2024)
STEM: Unleashing the Power of Embeddings for Multi-task Recommendation
by: Su, Liangcai, et al.
Published: (2023)
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
by: Sun, Hanshi, et al.
Published: (2024)
InRank: Incremental Low-Rank Learning
by: Zhao, Jiawei, et al.
Published: (2023)
Few-shot Neural Architecture Search
by: Zhao, Yiyang, et al.
Published: (2020)
WinQ: Accelerating Quantization-Aware Training of Language Models Around Saddle Points
by: Li, Dongyue, et al.
Published: (2026)
A Principled Loss Function for Direct Language Model Alignment
by: Tan, Yuandong
Published: (2025)
Quantized Reasoning Models Think They Need to Think Longer, but They Do Not
by: Lotfi, Sanae, et al.
Published: (2026)
TimeFormer: Transformer with Attention Modulation Empowered by Temporal Characteristics for Time Series Forecasting
by: Liu, Zhipeng, et al.
Published: (2025)
Golden Ratio Search: A Low-Power Adversarial Attack for Deep Learning based Modulation Classification
by: Sadhukhan, Deepsayan, et al.
Published: (2024)
Multi-objective Optimization by Learning Space Partitions
by: Zhao, Yiyang, et al.
Published: (2021)