| Main Authors: | Lu, Jun, Xu, Tianyi, Ding, Bill, Li, David, Kang, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.17101 |
Similar Items
Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores
by: Lu, Jun, et al.
Published: (2024)
CURing Large Models: Compression via CUR Decomposition
by: Park, Sanghyeon, et al.
Published: (2025)
Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints
by: Kang, Zilin, et al.
Published: (2025)
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
by: Wang, Xin, et al.
Published: (2024)
Activation Sparsity Opportunities for Compressing General Large Language Models
by: Dhar, Nobel, et al.
Published: (2024)
SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models
by: Li, Ziwei, et al.
Published: (2026)
TensorGPT: Efficient Compression of Large Language Models based on Tensor-Train Decomposition
by: Xu, Mingxue, et al.
Published: (2023)
BALF: Budgeted Activation-Aware Low-Rank Factorization for Fine-Tuning-Free Model Compression
by: González-Martínez, David
Published: (2025)
Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models
by: Leask, Patrick, et al.
Published: (2025)
Sparse Gradient Compression for Fine-Tuning Large Language Models
by: Yang, David H., et al.
Published: (2025)
Compressing Large Language Models using Low Rank and Low Precision Decomposition
by: Saha, Rajarshi, et al.
Published: (2024)
AgentCompress: Task-Aware Compression for Affordable Large Language Model Agents
by: Taha, Zuhair Ahmed Khan, et al.
Published: (2026)
ELAS: Efficient Pre-Training of Low-Rank Large Language Models via 2:4 Activation Sparsity
by: Li, Jiaxi, et al.
Published: (2026)
Deep Hierarchical Learning with Nested Subspace Networks for Large Language Models
by: Rauba, Paulius, et al.
Published: (2025)
CALR: Corrective Adaptive Low-Rank Decomposition for Efficient Large Language Model Layer Compression
by: Kautsar, Muchammad Daniyal, et al.
Published: (2025)
Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation
by: Wang, Fei, et al.
Published: (2025)
FlexRank: Nested Low-Rank Knowledge Decomposition for Adaptive Model Deployment
by: Zaccone, Riccardo, et al.
Published: (2026)
NestQuant: Post-Training Integer-Nesting Quantization for On-Device DNN
by: Xie, Jianhang, et al.
Published: (2025)
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference
by: Chen, Sihan, et al.
Published: (2025)
Activation Map Compression through Tensor Decomposition for Deep Learning
by: Nguyen, Le-Trung, et al.
Published: (2024)
HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning
by: Gupta, Nikunj, et al.
Published: (2025)
To Compress or Not? Pushing the Frontier of Lossless GenAI Model Weights Compression with Exponent Concentration
by: Yang, Zeyu, et al.
Published: (2025)
MoDeGPT: Modular Decomposition for Large Language Model Compression
by: Lin, Chi-Heng, et al.
Published: (2024)
Matrix Decomposition and Applications
by: Lu, Jun
Published: (2022)
Evo: Autoregressive-Diffusion Large Language Models with Evolving Balance
by: Wu, Junde, et al.
Published: (2026)
On the Compressibility of Quantized Large Language Models
by: Mao, Yu, et al.
Published: (2024)
Lossless Compression of Large Language Model-Generated Text via Next-Token Prediction
by: Mao, Yu, et al.
Published: (2025)
FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning
by: Xia, Guoyang, et al.
Published: (2025)
Extreme Compression of Large Language Models via Additive Quantization
by: Egiazarian, Vage, et al.
Published: (2024)
Shuttle Between the Instructions and the Parameters of Large Language Models
by: Sun, Wangtao, et al.
Published: (2025)
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
by: Zhang, Tianyi, et al.
Published: (2024)
Sink-Aware Pruning for Diffusion Language Models
by: Myrzakhan, Aidar, et al.
Published: (2026)
Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression
by: Tang, Yuntian, et al.
Published: (2026)
Capability-Guided Compression: Toward Interpretability-Aware Budget Allocation for Large Language Models
by: Gupta, Rishaank
Published: (2026)
ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
by: Zhao, Youpeng, et al.
Published: (2024)
LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid
by: Zhang, Tianyi, et al.
Published: (2024)
MadEvolve: Evolutionary Optimization of Cosmological Algorithms with Large Language Models
by: Li, Tianyi, et al.
Published: (2026)
Distribution-Aware Tensor Decomposition for Compression of Convolutional Neural Networks
by: Kalle, Alper, et al.
Published: (2025)
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
by: Wang, Jingcun, et al.
Published: (2024)
Large Language Models for Anomaly and Out-of-Distribution Detection: A Survey
by: Xu, Ruiyao, et al.
Published: (2024)