Library Catalog

Bibliographic Details
Main Authors:	Zhang, Cheng; Cheng, Jianyi; Constantinides, George A.; Zhao, Yiren
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2402.02446

Similar Items

Unlocking the Global Synergies in Low-Rank Adapters
by: Zhang, Zixi, et al.
Published: (2024)

QERA: an Analytical Framework for Quantization Error Reconstruction
by: Zhang, Cheng, et al.
Published: (2024)

A3: an Analytical Low-Rank Approximation Framework for Attention
by: Wong, Jeffrey T. H., et al.
Published: (2025)

Scaling Laws For Mixed Quantization
by: Cao, Zeyu, et al.
Published: (2024)

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
by: Zhang, Cheng, et al.
Published: (2023)

Low-Rank Quantization-Aware Training for LLMs
by: Bondarenko, Yelysei, et al.
Published: (2024)

LoQT: Low-Rank Adapters for Quantized Pretraining
by: Loeschcke, Sebastian, et al.
Published: (2024)

Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
by: Ouyang, Xu, et al.
Published: (2024)

Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
by: Yang, Jaewoo, et al.
Published: (2024)

LCQ: Low-Rank Codebook based Quantization for Large Language Models
by: Cai, Wen-Pu, et al.
Published: (2024)

AMPLE: Event-Driven Accelerator for Mixed-Precision Inference of Graph Neural Networks
by: Gimenes, Pedro, et al.
Published: (2025)

Training Acceleration of Low-Rank Decomposed Networks using Sequential Freezing and Rank Quantization
by: Hajimolahoseini, Habib, et al.
Published: (2023)

LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy
by: Zhang, Rongzhi, et al.
Published: (2024)

ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals
by: Saxena, Utkarsh, et al.
Published: (2024)

Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization
by: Song, Guanghui, et al.
Published: (2025)

Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
by: Cheng, Wenhua, et al.
Published: (2023)

Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
by: Xiong, Boya, et al.
Published: (2025)

On the Existence and Behavior of Secondary Attention Sinks
by: Wong, Jeffrey T. H., et al.
Published: (2025)

QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning
by: Rajabzadeh, Hossein, et al.
Published: (2024)

LoRMA: Low-Rank Multiplicative Adaptation for LLMs
by: Bihany, Harsh, et al.
Published: (2025)

Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation
by: Sengupta, Ayan, et al.
Published: (2024)

Optimised Grouped-Query Attention Mechanism for Transformers
by: Chen, Yuang, et al.
Published: (2024)

Low-Rank Adaptation for Multilingual Summarization: An Empirical Study
by: Whitehouse, Chenxi, et al.
Published: (2023)

Assigning Distinct Roles to Quantized and Low-Rank Matrices Toward Optimal Weight Decomposition
by: Cho, Yoonjun, et al.
Published: (2025)

An Efficient Sparse Fine-Tuning with Low Quantization Error via Neural Network Pruning
by: Li, Cen-Jhih, et al.
Published: (2025)

GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling
by: Dadgarnia, Alireza, et al.
Published: (2026)

Understanding and Mitigating Errors of LLM-Generated RTL Code
by: Zhang, Jiazheng, et al.
Published: (2025)

Preserve-Then-Quantize: Balancing Rank Budgets for Quantization Error Reconstruction in LLMs
by: Cho, Yoonjun, et al.
Published: (2026)

RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
by: Huang, Xijie, et al.
Published: (2024)

FrameQuant: Flexible Low-Bit Quantization for Transformers
by: Adepu, Harshavardhan, et al.
Published: (2024)

LoRe: Personalizing LLMs via Low-Rank Reward Modeling
by: Bose, Avinandan, et al.
Published: (2025)

QEFT: Quantization for Efficient Fine-Tuning of LLMs
by: Lee, Changhun, et al.
Published: (2024)

How Does Quantization Affect Multilingual LLMs?
by: Marchisio, Kelly, et al.
Published: (2024)

Quantization-Robust LLM Unlearning via Low-Rank Adaptation
by: Abitante, João Vitor Boer, et al.
Published: (2026)

Contextual Drag: How Errors in the Context Affect LLM Reasoning
by: Cheng, Yun, et al.
Published: (2026)

BiSup: Bidirectional Quantization Error Suppression for Large Language Models
by: Zou, Minghui, et al.
Published: (2024)

Optimizing Large Language Model Training Using FP4 Quantization
by: Wang, Ruizhe, et al.
Published: (2025)

FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts
by: Wang, Xinyi, et al.
Published: (2025)

MoR: Mixture of Ranks for Low-Rank Adaptation Tuning
by: Tang, Chuanyu, et al.
Published: (2024)

GaLore+: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection
by: Liao, Xutao, et al.
Published: (2024)