:: Library Catalog

Bibliographic Details
Main Authors:	He, Junhui; Wu, Shangyu; Wen, Weidong; Xue, Chun Jason; Li, Qingan
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2409.01366

Similar Items

Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference
by: Taniguchi, Rei, et al.
Published: (2026)

EvoP: Robust LLM Inference via Evolutionary Pruning
by: Wu, Shangyu, et al.
Published: (2025)

GLASS: Global-Local Aggregation for Inference-time Sparsification of LLMs
by: Sattarifard, Amirmohsen, et al.
Published: (2025)

KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing
by: Yang, Yifei, et al.
Published: (2024)

POP: Prefill-Only Pruning for Efficient Large Model Inference
by: He, Junhui, et al.
Published: (2026)

KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
by: Li, Xing, et al.
Published: (2025)

SmoothRot: Combining Channel-Wise Scaling and Rotation for Quantization-Friendly LLMs
by: Czakó, Patrik, et al.
Published: (2025)

ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization
by: Yadav, Prateek, et al.
Published: (2023)

The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering
by: Zhou, Yefan, et al.
Published: (2026)

Language Model Prompt Selection via Simulation Optimization
by: Zhang, Haoting, et al.
Published: (2024)

TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
by: Dzikanyanga, Gradwell, et al.
Published: (2026)

Tree Matching Networks for Natural Language Inference: Parameter-Efficient Semantic Understanding via Dependency Parse Trees
by: Lunder, Jason
Published: (2025)

Less is More: Improving LLM Alignment via Preference Data Selection
by: Deng, Xun, et al.
Published: (2025)

Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
by: Wen, Zhuofan, et al.
Published: (2024)

Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization
by: Bohne, Jason, et al.
Published: (2025)

TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
by: Wu, Wei, et al.
Published: (2024)

On the Compressibility of Quantized Large Language Models
by: Mao, Yu, et al.
Published: (2024)

EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs
by: Guo, Song, et al.
Published: (2024)

CHESS: Contextual Harnessing for Efficient SQL Synthesis
by: Talaei, Shayan, et al.
Published: (2024)

Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry
by: Sim, Woo Seob, et al.
Published: (2026)

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
by: Li, Ming, et al.
Published: (2024)

Membership Inference Attacks on LLM-based Recommender Systems
by: He, Jiajie, et al.
Published: (2025)

Larger or Smaller Reward Margins to Select Preferences for Alignment?
by: Huang, Kexin, et al.
Published: (2025)

Diagnosing Training Inference Mismatch in LLM Reinforcement Learning
by: Zhong, Tianle, et al.
Published: (2026)

Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
by: Zhang, Xichen, et al.
Published: (2025)

One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache
by: Lu, Liming, et al.
Published: (2026)

Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
by: Huang, Audrey, et al.
Published: (2024)

SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation
by: Qiao, Aurick, et al.
Published: (2024)

AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
by: Wu, Junkang, et al.
Published: (2024)

Athena: Efficient Block-Wise Post-Training Quantization for Large Language Models Using Second-Order Matrix Derivative Information
by: Wang, Yanshu, et al.
Published: (2024)

Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration
by: Chen, Zhipeng, et al.
Published: (2026)

Fibration Policy Optimization
by: Li, Chang, et al.
Published: (2026)

One Size Does Not Fit All: A Distribution-Aware Sparsification for More Precise Model Merging
by: Luo, Yingfeng, et al.
Published: (2025)

SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
by: Zhu, Yuxuan, et al.
Published: (2025)

Filter-then-Weight: Online Data Selection and Reweighting for LLM Fine-Tuning
by: Wang, Fangxin, et al.
Published: (2026)

LLM-Select: Feature Selection with Large Language Models
by: Jeong, Daniel P., et al.
Published: (2024)

FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference
by: Du, Hongchao, et al.
Published: (2025)

LPCD: Unified Framework from Layer-Wise to Submodule Quantization
by: Ichikawa, Yuma, et al.
Published: (2025)

Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging
by: Ju, Yiming, et al.
Published: (2024)

FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts
by: Zou, Heming, et al.
Published: (2025)