Library Catalog

Bibliographic Details
Main Authors:	Tang, Chaoqing; Zhuang, Huanze; Tian, Guiyun; Zeng, Zhenli; Ding, Yi; Liu, Wenzhong; Bai, Xiang
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2501.11592

Similar Items

UNIQUE: Universal Top-k Sparse Attention for Training-free Inference and Sparsity-aware Training
by: Deng, Keqi, et al.
Published: (2026)

SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models
by: Liang, Xun, et al.
Published: (2025)

HiSAC: Hierarchical Sparse Activation Compression for Ultra-long Sequence Modeling in Recommenders
by: Yuan, Kun, et al.
Published: (2026)

Ultra Memory-Efficient On-FPGA Training of Transformers via Tensor-Compressed Optimization
by: Tian, Jiayi, et al.
Published: (2025)

Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
by: Shen, Guobin, et al.
Published: (2024)

KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs
by: Tang, Yixuan, et al.
Published: (2026)

Saten: Sparse Augmented Tensor Networks for Post-Training Compression of Large Language Models
by: Solgi, Ryan, et al.
Published: (2025)

One-for-All Pruning: A Universal Model for Customized Compression of Large Language Models
by: Ye, Rongguang, et al.
Published: (2025)

One Small and One Large for Document-level Event Argument Extraction
by: Peng, Jiaren, et al.
Published: (2024)

Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models
by: Upasani, Shubhangi, et al.
Published: (2026)

TaDA: Training-free recipe for Decoding with Adaptive KV Cache Compression and Mean-centering
by: Joshi, Vinay, et al.
Published: (2025)

KVReviver: Reversible KV Cache Compression with Sketch-Based Token Reconstruction
by: Yuan, Aomufei, et al.
Published: (2025)

LACONIC: Dense-Level Effectiveness for Scalable Sparse Retrieval via a Two-Phase Training Curriculum
by: Xu, Zhichao, et al.
Published: (2026)

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
by: Hu, Shengding, et al.
Published: (2024)

SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models
by: Feng, Kehua, et al.
Published: (2024)

White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?
by: Yu, Yaodong, et al.
Published: (2023)

BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection
by: Hu, Zhengpei, et al.
Published: (2026)

Efficient Long CoT Reasoning in Small Language Models
by: Wang, Zhaoyang, et al.
Published: (2025)

CSPLADE: Learned Sparse Retrieval with Causal Language Models
by: Xu, Zhichao, et al.
Published: (2025)

RATE: Reviewer Profiling and Annotation-free Training for Expertise Ranking in Peer Review Systems
by: Liu, Weicong, et al.
Published: (2026)

Compressing Lengthy Context With UltraGist
by: Zhang, Peitian, et al.
Published: (2024)

GCoder: Improving Large Language Model for Generalized Graph Problem Solving
by: Zhang, Qifan, et al.
Published: (2024)

Training-free Context-adaptive Attention for Efficient Long Context Modeling
by: You, Zeng, et al.
Published: (2025)

PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
by: Xie, Xudong, et al.
Published: (2024)

Achieving Sparse Activation in Small Language Models
by: Song, Jifeng, et al.
Published: (2024)

MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
by: Ding, Ning, et al.
Published: (2024)

Improving Large Models with Small models: Lower Costs and Better Performance
by: Chen, Dong, et al.
Published: (2024)

A Unified Sparse Attention via Multi-Granularity Compression
by: Liu, Siran, et al.
Published: (2025)

Compressed Sensing for Capability Localization in Large Language Models
by: Bair, Anna, et al.
Published: (2026)

Sparse Rewards Can Self-Train Dialogue Agents
by: Lattimer, Barrett Martin, et al.
Published: (2024)

From Similarity to Structure: Training-free LLM Context Compression with Hybrid Graph Priors
by: Zhou, Yitian, et al.
Published: (2026)

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse
by: Bai, Yushi, et al.
Published: (2026)

SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression
by: Wen, Haoming, et al.
Published: (2025)

Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data
by: Wang, Yudong, et al.
Published: (2025)

Data-free Weight Compress and Denoise for Large Language Models
by: Peng, Runyu, et al.
Published: (2024)

Learning to Seek Help: Dynamic Collaboration Between Small and Large Language Models
by: Zeng, Hang, et al.
Published: (2026)

Wave Network: An Ultra-Small Language Model
by: Zhang, Xin, et al.
Published: (2024)

PocketLLM: Ultimate Compression of Large Language Models via Meta Networks
by: Tian, Ye, et al.
Published: (2025)

Training Superior Sparse Autoencoders for Instruct Models
by: Li, Jiaming, et al.
Published: (2025)

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
by: Pan, Bowen, et al.
Published: (2024)