Library Catalog

Bibliographic Details
Main Authors:	Xia, Haojun; Wu, Xiaoxia; Li, Jisen; Wu, Robert; Wang, Junxiong; Wang, Jue; Li, Chenxi; Singhal, Aman; Shah, Alay Dilipbhai; Ariyak, Alpay; Zhuang, Donglin; Zhou, Zhongzhu; Athiwaratkun, Ben; Zheng, Zhen; Song, Shuaiwen Leon
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2511.18643

Similar Items

OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
by: Zhou, Zhongzhu, et al.
Published: (2026)

SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving
by: Jia, Jinda, et al.
Published: (2026)

CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
by: Zhou, Zhongzhu, et al.
Published: (2026)

Imitate Optimal Policy: Prevail and Induce Action Collapse in Policy Gradient
by: Zhou, Zhongzhu, et al.
Published: (2025)

Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time
by: Zhang, Zhenyu, et al.
Published: (2025)

Beat the long tail: Distribution-Aware Speculative Decoding for RL Training
by: Shao, Zelei, et al.
Published: (2025)

When RL Meets Adaptive Speculative Training: A Unified Training-Serving System
by: Wang, Junxiong, et al.
Published: (2026)

Introspective Diffusion Language Models
by: Yu, Yifan, et al.
Published: (2026)

Accurate KV Cache Quantization with Outlier Tokens Tracing
by: Su, Yi, et al.
Published: (2025)

$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners
by: Singh, Harman, et al.
Published: (2026)

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
by: He, Yefei, et al.
Published: (2024)

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
by: Xia, Haojun, et al.
Published: (2024)

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
by: Liu, Zirui, et al.
Published: (2024)

Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping
by: Zhang, Muru, et al.
Published: (2025)

Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression
by: Liu, Peiyu, et al.
Published: (2024)

RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations
by: Su, Zunhai, et al.
Published: (2025)

LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management
by: Xiong, Yi, et al.
Published: (2024)

Data Diversification Methods In Alignment Enhance Math Performance In LLMs
by: Dokmeci, Berkan, et al.
Published: (2025)

KVmix: Gradient-Based Layer Importance-Aware Mixed-Precision Quantization for KV Cache
by: Li, Fei, et al.
Published: (2025)

ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing
by: An, Yongqi, et al.
Published: (2026)

TriAxialKV: Toward Extreme Low-Precision KV-Cache Quantization for Agentic Inference Tasks
by: Shen, Hanzhang, et al.
Published: (2026)

CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning
by: Yang, Yibo, et al.
Published: (2024)

MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration
by: Wang, Jinguang, et al.
Published: (2025)

Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution
by: Maheswaran, Monishwaran, et al.
Published: (2026)

SQuat: Subspace-orthogonal KV Cache Quantization
by: Wang, Hao, et al.
Published: (2025)

Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
by: Wang, Junlin, et al.
Published: (2025)

CommVQ: Commutative Vector Quantization for KV Cache Compression
by: Li, Junyan, et al.
Published: (2025)

Kitti_to_hdmapping_seq00
by: Chudziński
Published: (2026)

Improving Model Alignment Through Collective Intelligence of Open-Source LLMS
by: Wang, Junlin, et al.
Published: (2025)

AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization
by: Tan, Yifan, et al.
Published: (2024)

AsymKV: Enabling 1-Bit Quantization of KV Cache with Layer-Wise Asymmetric Quantization Configurations
by: Tao, Qian, et al.
Published: (2024)

LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
by: Shen, Yiqun, et al.
Published: (2025)

Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression
by: Swain, Kabir, et al.
Published: (2026)

QAQ: Quality Adaptive Quantization for LLM KV Cache
by: Dong, Shichen, et al.
Published: (2024)

PolarQuant: Quantizing KV Caches with Polar Transformation
by: Han, Insu, et al.
Published: (2025)

Quantization Dominates Rank Reduction for KV-Cache Compression
by: Salfati, Samuel
Published: (2026)

ContiguousKV: Accelerating LLM Prefill with Granularity-Aligned KV Cache Management
by: Zou, Jing, et al.
Published: (2026)

OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization
by: Li, Zhikai, et al.
Published: (2026)

RDKV: Rate-Distortion Bit Allocation for Joint Eviction and Quantization of the KV Cache
by: Zhang, Junkai, et al.
Published: (2026)

LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
by: Ahn, Jinwoo, et al.
Published: (2026)