| Main Authors: | Zhang, Cheng; Cheng, Jianyi; Constantinides, George A.; Zhao, Yiren |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.02446 |
Similar Items
Unlocking the Global Synergies in Low-Rank Adapters
by: Zhang, Zixi, et al.
Published: (2024)
QERA: an Analytical Framework for Quantization Error Reconstruction
by: Zhang, Cheng, et al.
Published: (2024)
A3: an Analytical Low-Rank Approximation Framework for Attention
by: Wong, Jeffrey T. H., et al.
Published: (2025)
Scaling Laws For Mixed Quantization
by: Cao, Zeyu, et al.
Published: (2024)
Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
by: Zhang, Cheng, et al.
Published: (2023)
Low-Rank Quantization-Aware Training for LLMs
by: Bondarenko, Yelysei, et al.
Published: (2024)
LoQT: Low-Rank Adapters for Quantized Pretraining
by: Loeschcke, Sebastian, et al.
Published: (2024)
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
by: Ouyang, Xu, et al.
Published: (2024)
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
by: Yang, Jaewoo, et al.
Published: (2024)
LCQ: Low-Rank Codebook based Quantization for Large Language Models
by: Cai, Wen-Pu, et al.
Published: (2024)
AMPLE: Event-Driven Accelerator for Mixed-Precision Inference of Graph Neural Networks
by: Gimenes, Pedro, et al.
Published: (2025)
Training Acceleration of Low-Rank Decomposed Networks using Sequential Freezing and Rank Quantization
by: Hajimolahoseini, Habib, et al.
Published: (2023)
LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy
by: Zhang, Rongzhi, et al.
Published: (2024)
ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals
by: Saxena, Utkarsh, et al.
Published: (2024)
Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization
by: Song, Guanghui, et al.
Published: (2025)
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
by: Cheng, Wenhua, et al.
Published: (2023)
Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
by: Xiong, Boya, et al.
Published: (2025)
On the Existence and Behavior of Secondary Attention Sinks
by: Wong, Jeffrey T. H., et al.
Published: (2025)
QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning
by: Rajabzadeh, Hossein, et al.
Published: (2024)
LoRMA: Low-Rank Multiplicative Adaptation for LLMs
by: Bihany, Harsh, et al.
Published: (2025)
Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation
by: Sengupta, Ayan, et al.
Published: (2024)
Optimised Grouped-Query Attention Mechanism for Transformers
by: Chen, Yuang, et al.
Published: (2024)
Low-Rank Adaptation for Multilingual Summarization: An Empirical Study
by: Whitehouse, Chenxi, et al.
Published: (2023)
Assigning Distinct Roles to Quantized and Low-Rank Matrices Toward Optimal Weight Decomposition
by: Cho, Yoonjun, et al.
Published: (2025)
An Efficient Sparse Fine-Tuning with Low Quantization Error via Neural Network Pruning
by: Li, Cen-Jhih, et al.
Published: (2025)
GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling
by: Dadgarnia, Alireza, et al.
Published: (2026)
Understanding and Mitigating Errors of LLM-Generated RTL Code
by: Zhang, Jiazheng, et al.
Published: (2025)
Preserve-Then-Quantize: Balancing Rank Budgets for Quantization Error Reconstruction in LLMs
by: Cho, Yoonjun, et al.
Published: (2026)
RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
by: Huang, Xijie, et al.
Published: (2024)
FrameQuant: Flexible Low-Bit Quantization for Transformers
by: Adepu, Harshavardhan, et al.
Published: (2024)
LoRe: Personalizing LLMs via Low-Rank Reward Modeling
by: Bose, Avinandan, et al.
Published: (2025)
QEFT: Quantization for Efficient Fine-Tuning of LLMs
by: Lee, Changhun, et al.
Published: (2024)
How Does Quantization Affect Multilingual LLMs?
by: Marchisio, Kelly, et al.
Published: (2024)
Quantization-Robust LLM Unlearning via Low-Rank Adaptation
by: Abitante, João Vitor Boer, et al.
Published: (2026)
Contextual Drag: How Errors in the Context Affect LLM Reasoning
by: Cheng, Yun, et al.
Published: (2026)
BiSup: Bidirectional Quantization Error Suppression for Large Language Models
by: Zou, Minghui, et al.
Published: (2024)
Optimizing Large Language Model Training Using FP4 Quantization
by: Wang, Ruizhe, et al.
Published: (2025)
FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts
by: Wang, Xinyi, et al.
Published: (2025)
MoR: Mixture of Ranks for Low-Rank Adaptation Tuning
by: Tang, Chuanyu, et al.
Published: (2024)
GaLore$+$: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection
by: Liao, Xutao, et al.
Published: (2024)