| Main Authors: | He, Junhui, Wu, Shangyu, Wen, Weidong, Xue, Chun Jason, Li, Qingan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.01366 |
Similar Items
Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference
by: Taniguchi, Rei, et al.
Published: (2026)
EvoP: Robust LLM Inference via Evolutionary Pruning
by: Wu, Shangyu, et al.
Published: (2025)
GLASS: Global-Local Aggregation for Inference-time Sparsification of LLMs
by: Sattarifard, Amirmohsen, et al.
Published: (2025)
KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing
by: Yang, Yifei, et al.
Published: (2024)
POP: Prefill-Only Pruning for Efficient Large Model Inference
by: He, Junhui, et al.
Published: (2026)
KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
by: Li, Xing, et al.
Published: (2025)
SmoothRot: Combining Channel-Wise Scaling and Rotation for Quantization-Friendly LLMs
by: Czakó, Patrik, et al.
Published: (2025)
ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization
by: Yadav, Prateek, et al.
Published: (2023)
The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering
by: Zhou, Yefan, et al.
Published: (2026)
Language Model Prompt Selection via Simulation Optimization
by: Zhang, Haoting, et al.
Published: (2024)
TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
by: Dzikanyanga, Gradwell, et al.
Published: (2026)
Tree Matching Networks for Natural Language Inference: Parameter-Efficient Semantic Understanding via Dependency Parse Trees
by: Lunder, Jason
Published: (2025)
Less is More: Improving LLM Alignment via Preference Data Selection
by: Deng, Xun, et al.
Published: (2025)
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
by: Wen, Zhuofan, et al.
Published: (2024)
Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization
by: Bohne, Jason, et al.
Published: (2025)
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
by: Wu, Wei, et al.
Published: (2024)
On the Compressibility of Quantized Large Language Models
by: Mao, Yu, et al.
Published: (2024)
EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs
by: Guo, Song, et al.
Published: (2024)
CHESS: Contextual Harnessing for Efficient SQL Synthesis
by: Talaei, Shayan, et al.
Published: (2024)
Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry
by: Sim, Woo Seob, et al.
Published: (2026)
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
by: Li, Ming, et al.
Published: (2024)
Membership Inference Attacks on LLM-based Recommender Systems
by: He, Jiajie, et al.
Published: (2025)
Larger or Smaller Reward Margins to Select Preferences for Alignment?
by: Huang, Kexin, et al.
Published: (2025)
Diagnosing Training Inference Mismatch in LLM Reinforcement Learning
by: Zhong, Tianle, et al.
Published: (2026)
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
by: Zhang, Xichen, et al.
Published: (2025)
One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache
by: Lu, Liming, et al.
Published: (2026)
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
by: Huang, Audrey, et al.
Published: (2024)
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation
by: Qiao, Aurick, et al.
Published: (2024)
AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
by: Wu, Junkang, et al.
Published: (2024)
Athena: Efficient Block-Wise Post-Training Quantization for Large Language Models Using Second-Order Matrix Derivative Information
by: Wang, Yanshu, et al.
Published: (2024)
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration
by: Chen, Zhipeng, et al.
Published: (2026)
Fibration Policy Optimization
by: Li, Chang, et al.
Published: (2026)
One Size Does Not Fit All: A Distribution-Aware Sparsification for More Precise Model Merging
by: Luo, Yingfeng, et al.
Published: (2025)
SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
by: Zhu, Yuxuan, et al.
Published: (2025)
Filter-then-Weight: Online Data Selection and Reweighting for LLM Fine-Tuning
by: Wang, Fangxin, et al.
Published: (2026)
LLM-Select: Feature Selection with Large Language Models
by: Jeong, Daniel P., et al.
Published: (2024)
FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference
by: Du, Hongchao, et al.
Published: (2025)
LPCD: Unified Framework from Layer-Wise to Submodule Quantization
by: Ichikawa, Yuma, et al.
Published: (2025)
Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging
by: Ju, Yiming, et al.
Published: (2024)
FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts
by: Zou, Heming, et al.
Published: (2025)