| Main Authors: | Wang, Yumeng, Xiao, Zhenyang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.09486 |
Similar Items
Llamazip: Leveraging LLaMA for Lossless Text Compression and Training Dataset Detection
by: Dréano, Sören, et al.
Published: (2025)
Lossless Token Sequence Compression via Meta-Tokens
by: Harvill, John, et al.
Published: (2025)
Trellis: Learning to Compress Key-Value Memory in Attention Models
by: Karami, Mahdi, et al.
Published: (2025)
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
by: Zhu, Qianchao, et al.
Published: (2024)
Lossless Compression of Large Language Model-Generated Text via Next-Token Prediction
by: Mao, Yu, et al.
Published: (2025)
KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
by: Tian, Yuxuan, et al.
Published: (2025)
LoCoCo: Dropping In Convolutions for Long Context Compression
by: Cai, Ruisi, et al.
Published: (2024)
FineZip : Pushing the Limits of Large Language Models for Practical Lossless Text Compression
by: Mittu, Fazal, et al.
Published: (2024)
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
by: Kang, Hao, et al.
Published: (2024)
Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation
by: Gui, Lujun, et al.
Published: (2024)
RAM-Net: Expressive Linear Attention with Selectively Addressable Memory
by: Xiao, Kaicheng, et al.
Published: (2026)
PMKLC: Parallel Multi-Knowledge Learning-based Lossless Compression for Large-Scale Genomics Database
by: Sun, Hui, et al.
Published: (2025)
Probing the Limits of Compressive Memory: A Study of Infini-Attention in Small-Scale Pretraining
by: Huang, Ruizhe, et al.
Published: (2025)
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
by: Tian, Yuandong, et al.
Published: (2023)
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
by: Liu, Fangcheng, et al.
Published: (2024)
LoLA: Low-Rank Linear Attention With Sparse Caching
by: McDermott, Luke, et al.
Published: (2025)
S$^3$-Attention:Attention-Aligned Endogenous Retrieval for Memory-Bounded Long-Context Inference
by: Ma, Qingsen, et al.
Published: (2026)
AttentionPredictor: Temporal Patterns Matter for KV Cache Compression
by: Yang, Qingyue, et al.
Published: (2025)
SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression
by: S, Santhosh G, et al.
Published: (2025)
Compressed Context Memory For Online Language Model Interaction
by: Kim, Jang-Hyun, et al.
Published: (2023)
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
by: Tang, Hanlin, et al.
Published: (2024)
LogLLaMA: Transformer-based log anomaly detection with LLaMA
by: Yang, Zhuoyi, et al.
Published: (2025)
ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training
by: Dialameh, Maryam, et al.
Published: (2025)
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
by: Lin, Feng, et al.
Published: (2024)
CompAct: Compressed Activations for Memory-Efficient LLM Training
by: Shamshoum, Yara, et al.
Published: (2024)
HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs
by: Yang, Dongquan, et al.
Published: (2025)
LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy
by: Zhang, Rongzhi, et al.
Published: (2024)
Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning
by: Gupta, Prakhar, et al.
Published: (2026)
LoSA: Locality Aware Sparse Attention for Block-Wise Diffusion Language Models
by: Xi, Haocheng, et al.
Published: (2026)
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
by: Sun, Hanshi, et al.
Published: (2024)
Divide & Bind Your Attention for Improved Generative Semantic Nursing
by: Li, Yumeng, et al.
Published: (2023)
Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
by: Saxena, Utkarsh, et al.
Published: (2024)
Lossless Vocabulary Reduction for Auto-Regressive Language Models
by: Chijiwa, Daiki, et al.
Published: (2025)
Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
by: Ou, Jie, et al.
Published: (2024)
HyperAdaLoRA: Accelerating LoRA Rank Allocation During Training via Hypernetworks without Sacrificing Performance
by: Zhang, Hao, et al.
Published: (2025)
TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs
by: Gu, Yuxuan, et al.
Published: (2025)
Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher
by: Uzunoglu, Arda, et al.
Published: (2026)
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
by: Yang, Penghui, et al.
Published: (2025)
Proximity to Losslessly Compressible Parameters
by: Farrugia-Roberts, Matthew
Published: (2023)
2-Tier SimCSE: Elevating BERT for Robust Sentence Embeddings
by: Wang, Yumeng, et al.
Published: (2025)