| Main Authors: | Yang, Jaewoo, Kim, Hayun, Kim, Younghoon |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.14428 |
Similar Items
ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns
by: Zhao, Ziyu, et al.
Published: (2026)
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
by: Nrusimha, Aniruddha, et al.
Published: (2024)
Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
by: Son, Seungwoo, et al.
Published: (2024)
GLU Attention Improve Transformer
by: Wang, Zehao
Published: (2025)
LQER: Low-Rank Quantization Error Reconstruction for LLMs
by: Zhang, Cheng, et al.
Published: (2024)
GLUScope: A Tool for Analyzing GLU Neurons in Transformer Language Models
by: Gerstner, Sebastian, et al.
Published: (2026)
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates
by: Ahn, Jaewoo, et al.
Published: (2025)
Dual Path Attribution: Efficient Attribution for SwiGLU-Transformers through Layer-Wise Target Propagation
by: Jantsch, Lasse Marten, et al.
Published: (2026)
Dependency-Aware Semi-Structured Sparsity of GLU Variants in Large Language Models
by: Guo, Zhiyu, et al.
Published: (2024)
Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
by: Shen, Xuan, et al.
Published: (2023)
Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
by: Xiong, Boya, et al.
Published: (2025)
ZClip: Adaptive Spike Mitigation for LLM Pre-Training
by: Kumar, Abhay, et al.
Published: (2025)
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
by: Guo, Han, et al.
Published: (2024)
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
by: Park, Jungwoo, et al.
Published: (2025)
RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
by: Huang, Xijie, et al.
Published: (2024)
L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models
by: Jeon, Hyesung, et al.
Published: (2024)
Learning to Correct for QA Reasoning with Black-box LLMs
by: Kim, Jaehyung, et al.
Published: (2024)
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
by: Ouyang, Xu, et al.
Published: (2024)
Depth Registers Unlock W4A4 on SwiGLU: A Reader/Generator Decomposition
by: Liu, Ziyang
Published: (2026)
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?
by: Kim, Jeonghye, et al.
Published: (2026)
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
by: Song, Jiwon, et al.
Published: (2024)
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
by: Kim, Taesu, et al.
Published: (2024)
A More Accurate Approximation of Activation Function with Few Spikes Neurons
by: Jeong, Dayena, et al.
Published: (2024)
On the Importance of a Multi-Scale Calibration for Quantization
by: Son, Seungwoo, et al.
Published: (2026)
QEFT: Quantization for Efficient Fine-Tuning of LLMs
by: Lee, Changhun, et al.
Published: (2024)
How Does Quantization Affect Multilingual LLMs?
by: Marchisio, Kelly, et al.
Published: (2024)
LittleBit: Ultra Low-Bit Quantization via Latent Factorization
by: Lee, Banseok, et al.
Published: (2025)
Turning LLM Activations Quantization-Friendly
by: Czakó, Patrik, et al.
Published: (2025)
Few-shot Personalization of LLMs with Mis-aligned Responses
by: Kim, Jaehyung, et al.
Published: (2024)
Mitigating Bias in RAG: Controlling the Embedder
by: Kim, Taeyoun, et al.
Published: (2025)
Culinary Class Wars: Evaluating LLMs using ASH in Cuisine Transfer Task
by: Lee, Hoonick, et al.
Published: (2024)
Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
by: Qin, Haotong, et al.
Published: (2024)
Subgraph-level Universal Prompt Tuning
by: Lee, Junhyun, et al.
Published: (2024)
Interpreting and Mitigating Unwanted Uncertainty in LLMs
by: Roy, Tiasa Singha, et al.
Published: (2025)
Interpreting the Effects of Quantization on LLMs
by: Singh, Manpreet, et al.
Published: (2025)
GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs
by: Deng, Jianing, et al.
Published: (2026)
Rethinking State Tracking in Recurrent Models Through Error Control Dynamics
by: Chung, Jiwan, et al.
Published: (2026)
SqueezeLLM: Dense-and-Sparse Quantization
by: Kim, Sehoon, et al.
Published: (2023)
Breaking the Mirror: Activation-Based Mitigation of Self-Preference in LLM Evaluators
by: Roytburg, Dani, et al.
Published: (2025)
AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization
by: IslamBouli, Beshr, et al.
Published: (2026)