| Main Author: | Kumar, Ankur |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.15704 |
Similar Items
An experimental study of KV cache reuse strategies in chunk-level caching systems
by: Cestola, Samuel, et al.
Published: (2026)
QET: Enhancing Quantized LLM Parameters and KV cache Compression through Element Substitution and Residual Clustering
by: Wang, Yanshu, et al.
Published: (2024)
SALS: Sparse Attention in Latent Space for KV cache Compression
by: Mu, Junlin, et al.
Published: (2025)
Layer-wise dynamic rank for compressing large language models
by: Mi, Zhendong, et al.
Published: (2025)
NIRVANA: Structured pruning reimagined for large language models compression
by: Ai, Mengting, et al.
Published: (2025)
BaKlaVa -- Budgeted Allocation of KV cache for Long-context Inference
by: Gulhan, Ahmed Burak, et al.
Published: (2025)
IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs
by: Mao, Yuzhen, et al.
Published: (2026)
KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
by: Ye, Hancheng, et al.
Published: (2025)
A general tensor-structured compression scheme for efficient large language models
by: Lu, Ying, et al.
Published: (2026)
Query-efficient model evaluation using cached responses
by: Helm, Hayden, et al.
Published: (2026)
Residual-Mass Accounting for Partial-KV Decoding
by: Hoshi, Yasuto, et al.
Published: (2026)
Individualized non-uniform quantization for vector search
by: Tepper, Mariano, et al.
Published: (2025)
Visual cognition in multimodal large language models
by: Buschoff, Luca M. Schulze, et al.
Published: (2023)
Hypothesis generation and updating in large language models
by: Xiong, Hua-Dong
Published: (2026)
Representation in large language models
by: Yetman, Cameron
Published: (2025)
Price of universality in vector quantization is at most 0.11 bit
by: Harbuzova, Alina, et al.
Published: (2026)
Amortizing intractable inference in large language models
by: Hu, Edward J., et al.
Published: (2023)
Alignment faking in large language models
by: Greenblatt, Ryan, et al.
Published: (2024)
AI-AI Bias: large language models favor communications generated by large language models
by: Laurito, Walter, et al.
Published: (2024)
Training microrobots to swim by a large language model
by: Xu, Zhuoqun, et al.
Published: (2024)
Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering
by: Gupta, Manan, et al.
Published: (2026)
OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization
by: Boss, Mark, et al.
Published: (2026)
Quantifying construct validity in large language model evaluations
by: Kearns, Ryan Othniel
Published: (2026)
The Residual Stream Is All You Need: On the Redundancy of the KV Cache in Transformer Inference
by: Qasim, Kaleem Ullah, et al.
Published: (2026)
Long-form factuality in large language models
by: Wei, Jerry, et al.
Published: (2024)
Can large language models explore in-context?
by: Krishnamurthy, Akshay, et al.
Published: (2024)
Quantifying perturbation impacts for large language models
by: Rauba, Paulius, et al.
Published: (2024)
Alignment of large language models with constrained learning
by: Zhang, Botong, et al.
Published: (2025)
Are large language models superhuman chemists?
by: Mirza, Adrian, et al.
Published: (2024)
Variational quantization for state space models
by: David, Etienne, et al.
Published: (2024)
Less can be more for predicting properties with large language models
by: Alampara, Nawaf, et al.
Published: (2024)
Harnessing large-language models to generate private synthetic text
by: Kurakin, Alexey, et al.
Published: (2023)
Prompt reinforcing for long-term planning of large language models
by: Lin, Hsien-Chin, et al.
Published: (2025)
Relational reasoning and inductive bias in transformers and large language models
by: Geerts, Jesse, et al.
Published: (2025)
Insights into a radiology-specialised multimodal large language model with sparse autoencoders
by: Bouzid, Kenza, et al.
Published: (2025)
ARETE: an R package for Automated REtrieval from TExt with large language models
by: Branco, Vasco V., et al.
Published: (2025)
Sentiment trading with large language models
by: Kirtac, Kemal, et al.
Published: (2024)
Uniform error bounds for quantized dynamical models
by: Metakalard, Abdelkader, et al.
Published: (2026)
Linguistic properties and model scale in brain encoding: from small to compressed language models
by: Oota, Subba Reddy, et al.
Published: (2026)
Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks
by: Mei, Taiyuan, et al.
Published: (2024)