| Main Authors: | Tang, Chaoqing, Zhuang, Huanze, Tian, Guiyun, Zeng, Zhenli, Ding, Yi, Liu, Wenzhong, Bai, Xiang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.11592 |
Similar Items
UNIQUE: Universal Top-k Sparse Attention for Training-free Inference and Sparsity-aware Training
by: Deng, Keqi, et al.
Published: (2026)
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models
by: Liang, Xun, et al.
Published: (2025)
HiSAC: Hierarchical Sparse Activation Compression for Ultra-long Sequence Modeling in Recommenders
by: Yuan, Kun, et al.
Published: (2026)
Ultra Memory-Efficient On-FPGA Training of Transformers via Tensor-Compressed Optimization
by: Tian, Jiayi, et al.
Published: (2025)
Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
by: Shen, Guobin, et al.
Published: (2024)
KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs
by: Tang, Yixuan, et al.
Published: (2026)
Saten: Sparse Augmented Tensor Networks for Post-Training Compression of Large Language Models
by: Solgi, Ryan, et al.
Published: (2025)
One-for-All Pruning: A Universal Model for Customized Compression of Large Language Models
by: Ye, Rongguang, et al.
Published: (2025)
One Small and One Large for Document-level Event Argument Extraction
by: Peng, Jiaren, et al.
Published: (2024)
Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models
by: Upasani, Shubhangi, et al.
Published: (2026)
TaDA: Training-free recipe for Decoding with Adaptive KV Cache Compression and Mean-centering
by: Joshi, Vinay, et al.
Published: (2025)
KVReviver: Reversible KV Cache Compression with Sketch-Based Token Reconstruction
by: Yuan, Aomufei, et al.
Published: (2025)
LACONIC: Dense-Level Effectiveness for Scalable Sparse Retrieval via a Two-Phase Training Curriculum
by: Xu, Zhichao, et al.
Published: (2026)
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
by: Hu, Shengding, et al.
Published: (2024)
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models
by: Feng, Kehua, et al.
Published: (2024)
White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?
by: Yu, Yaodong, et al.
Published: (2023)
BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection
by: Hu, Zhengpei, et al.
Published: (2026)
Efficient Long CoT Reasoning in Small Language Models
by: Wang, Zhaoyang, et al.
Published: (2025)
CSPLADE: Learned Sparse Retrieval with Causal Language Models
by: Xu, Zhichao, et al.
Published: (2025)
RATE: Reviewer Profiling and Annotation-free Training for Expertise Ranking in Peer Review Systems
by: Liu, Weicong, et al.
Published: (2026)
Compressing Lengthy Context With UltraGist
by: Zhang, Peitian, et al.
Published: (2024)
GCoder: Improving Large Language Model for Generalized Graph Problem Solving
by: Zhang, Qifan, et al.
Published: (2024)
Training-free Context-adaptive Attention for Efficient Long Context Modeling
by: You, Zeng, et al.
Published: (2025)
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
by: Xie, Xudong, et al.
Published: (2024)
Achieving Sparse Activation in Small Language Models
by: Song, Jifeng, et al.
Published: (2024)
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
by: Ding, Ning, et al.
Published: (2024)
Improving Large Models with Small models: Lower Costs and Better Performance
by: Chen, Dong, et al.
Published: (2024)
A Unified Sparse Attention via Multi-Granularity Compression
by: Liu, Siran, et al.
Published: (2025)
Compressed Sensing for Capability Localization in Large Language Models
by: Bair, Anna, et al.
Published: (2026)
Sparse Rewards Can Self-Train Dialogue Agents
by: Lattimer, Barrett Martin, et al.
Published: (2024)
From Similarity to Structure: Training-free LLM Context Compression with Hybrid Graph Priors
by: Zhou, Yitian, et al.
Published: (2026)
IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse
by: Bai, Yushi, et al.
Published: (2026)
SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression
by: Wen, Haoming, et al.
Published: (2025)
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data
by: Wang, Yudong, et al.
Published: (2025)
Data-free Weight Compress and Denoise for Large Language Models
by: Peng, Runyu, et al.
Published: (2024)
Learning to Seek Help: Dynamic Collaboration Between Small and Large Language Models
by: Zeng, Hang, et al.
Published: (2026)
Wave Network: An Ultra-Small Language Model
by: Zhang, Xin, et al.
Published: (2024)
PocketLLM: Ultimate Compression of Large Language Models via Meta Networks
by: Tian, Ye, et al.
Published: (2025)
Training Superior Sparse Autoencoders for Instruct Models
by: Li, Jiaming, et al.
Published: (2025)
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
by: Pan, Bowen, et al.
Published: (2024)