:: Library Catalog

Bibliographic Details
Main Author:	Kumar, Ankur
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2410.15704

Similar Items

An experimental study of KV cache reuse strategies in chunk-level caching systems
by: Cestola, Samuel, et al.
Published: (2026)

QET: Enhancing Quantized LLM Parameters and KV cache Compression through Element Substitution and Residual Clustering
by: Wang, Yanshu, et al.
Published: (2024)

SALS: Sparse Attention in Latent Space for KV cache Compression
by: Mu, Junlin, et al.
Published: (2025)

Layer-wise dynamic rank for compressing large language models
by: Mi, Zhendong, et al.
Published: (2025)

NIRVANA: Structured pruning reimagined for large language models compression
by: Ai, Mengting, et al.
Published: (2025)

BaKlaVa -- Budgeted Allocation of KV cache for Long-context Inference
by: Gulhan, Ahmed Burak, et al.
Published: (2025)

IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs
by: Mao, Yuzhen, et al.
Published: (2026)

KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
by: Ye, Hancheng, et al.
Published: (2025)

A general tensor-structured compression scheme for efficient large language models
by: Lu, Ying, et al.
Published: (2026)

Query-efficient model evaluation using cached responses
by: Helm, Hayden, et al.
Published: (2026)

Residual-Mass Accounting for Partial-KV Decoding
by: Hoshi, Yasuto, et al.
Published: (2026)

Individualized non-uniform quantization for vector search
by: Tepper, Mariano, et al.
Published: (2025)

Visual cognition in multimodal large language models
by: Buschoff, Luca M. Schulze, et al.
Published: (2023)

Hypothesis generation and updating in large language models
by: Xiong, Hua-Dong
Published: (2026)

Representation in large language models
by: Yetman, Cameron
Published: (2025)

Price of universality in vector quantization is at most 0.11 bit
by: Harbuzova, Alina, et al.
Published: (2026)

Amortizing intractable inference in large language models
by: Hu, Edward J., et al.
Published: (2023)

Alignment faking in large language models
by: Greenblatt, Ryan, et al.
Published: (2024)

AI-AI Bias: large language models favor communications generated by large language models
by: Laurito, Walter, et al.
Published: (2024)

Training microrobots to swim by a large language model
by: Xu, Zhuoqun, et al.
Published: (2024)

Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering
by: Gupta, Manan, et al.
Published: (2026)

OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization
by: Boss, Mark, et al.
Published: (2026)

Quantifying construct validity in large language model evaluations
by: Kearns, Ryan Othniel
Published: (2026)

The Residual Stream Is All You Need: On the Redundancy of the KV Cache in Transformer Inference
by: Qasim, Kaleem Ullah, et al.
Published: (2026)

Long-form factuality in large language models
by: Wei, Jerry, et al.
Published: (2024)

Can large language models explore in-context?
by: Krishnamurthy, Akshay, et al.
Published: (2024)

Quantifying perturbation impacts for large language models
by: Rauba, Paulius, et al.
Published: (2024)

Alignment of large language models with constrained learning
by: Zhang, Botong, et al.
Published: (2025)

Are large language models superhuman chemists?
by: Mirza, Adrian, et al.
Published: (2024)

Variational quantization for state space models
by: David, Etienne, et al.
Published: (2024)

Less can be more for predicting properties with large language models
by: Alampara, Nawaf, et al.
Published: (2024)

Harnessing large-language models to generate private synthetic text
by: Kurakin, Alexey, et al.
Published: (2023)

Prompt reinforcing for long-term planning of large language models
by: Lin, Hsien-Chin, et al.
Published: (2025)

Relational reasoning and inductive bias in transformers and large language models
by: Geerts, Jesse, et al.
Published: (2025)

Insights into a radiology-specialised multimodal large language model with sparse autoencoders
by: Bouzid, Kenza, et al.
Published: (2025)

ARETE: an R package for Automated REtrieval from TExt with large language models
by: Branco, Vasco V., et al.
Published: (2025)

Sentiment trading with large language models
by: Kirtac, Kemal, et al.
Published: (2024)

Uniform error bounds for quantized dynamical models
by: Metakalard, Abdelkader, et al.
Published: (2026)

Linguistic properties and model scale in brain encoding: from small to compressed language models
by: Oota, Subba Reddy, et al.
Published: (2026)

Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks
by: Mei, Taiyuan, et al.
Published: (2024)