:: Library Catalog

Bibliographic Details
Main Authors:	Huang, Ruizhe; Zhang, Kexuan; Fang, Yihao; Yu, Baifeng
Format:	Preprint
Published:	2025
Subjects:	Machine Learning, Artificial Intelligence, Computation and Language
Online Access:	https://arxiv.org/abs/2512.23862

Similar Items

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
by: Munkhdalai, Tsendsuren, et al.
Published: (2024)

Causal Abstraction in Model Interpretability: A Compact Survey
by: Zhang, Yihao
Published: (2024)

DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models
by: Zhang, Yuxuan, et al.
Published: (2024)

Generating Pretraining Tokens from Organic Data for Data-Bound Scaling
by: Yu, Zichun, et al.
Published: (2026)

To Memorize or to Retrieve: Scaling Laws for RAG-Considerate Pretraining
by: Singh, Karan, et al.
Published: (2026)

SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression
by: S, Santhosh G, et al.
Published: (2025)

MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification
by: Zheng, Bo, et al.
Published: (2026)

Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification
by: Mamtani, Sumit, et al.
Published: (2025)

Small Vocabularies, Big Gains: Pretraining and Tokenization in Time Series Models
by: Roger, Alexis, et al.
Published: (2025)

Limitations of Normalization in Attention Mechanism
by: Mudarisov, Timur, et al.
Published: (2025)

Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions
by: Li, Ruizhe, et al.
Published: (2024)

MathPile: A Billion-Token-Scale Pretraining Corpus for Math
by: Wang, Zengzhi, et al.
Published: (2023)

Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
by: Saxena, Utkarsh, et al.
Published: (2024)

Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
by: Tan, Shawn, et al.
Published: (2024)

HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs
by: Yang, Dongquan, et al.
Published: (2025)

Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks
by: Zheng, Yicong, et al.
Published: (2025)

Scaling Reasoning without Attention
by: Zhao, Xueliang, et al.
Published: (2025)

Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
by: Li, Ruizhe, et al.
Published: (2025)

Clustering-driven Memory Compression for On-device Large Language Models
by: Bohdal, Ondrej, et al.
Published: (2026)

Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level
by: Feng, Zhaopeng, et al.
Published: (2024)

Time and Memory Trade-off of KV-Cache Compression in Tensor Transformer Decoding
by: Chen, Yifang, et al.
Published: (2025)

Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
by: Mahabadi, Rabeeh Karimi, et al.
Published: (2025)

MemRerank: Preference Memory for Personalized Product Reranking
by: Peng, Zhiyuan, et al.
Published: (2026)

Training-free Ultra Small Model for Universal Sparse Reconstruction in Compressed Sensing
by: Tang, Chaoqing, et al.
Published: (2025)

VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models
by: Zhang, Hanling, et al.
Published: (2025)

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
by: Chen, Zhixun, et al.
Published: (2025)

SibylSense: Adaptive Rubric Learning via Memory Tuning and Adversarial Probing
by: Xu, Yifei, et al.
Published: (2026)

Small Language Models for Application Interactions: A Case Study
by: Li, Beibin, et al.
Published: (2024)

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
by: Itzhak, Itay, et al.
Published: (2025)

Mixture of Chapters: Scaling Learnt Memory in Transformers
by: Tibrewal, Tasmay Pankaj, et al.
Published: (2026)

Pre-training Limited Memory Language Models with Internal and External Knowledge
by: Zhao, Linxi, et al.
Published: (2025)

Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models
by: Ok, Hyunjong, et al.
Published: (2026)

Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
by: Zhang, Jun, et al.
Published: (2025)

FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression
by: Mittu, Fazal, et al.
Published: (2024)

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
by: Lin, Bill Yuchen, et al.
Published: (2025)

LLM in a flash: Efficient Large Language Model Inference with Limited Memory
by: Alizadeh, Keivan, et al.
Published: (2023)

TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling
by: Lin, Weizhe, et al.
Published: (2025)

A Framework for Inference Inspired by Human Memory Mechanisms
by: Zeng, Xiangyu, et al.
Published: (2023)

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
by: Xiaomi, LLM-Core, et al.
Published: (2025)

Secure LLM Fine-Tuning via Safety-Aware Probing
by: Wu, Chengcan, et al.
Published: (2025)