Saved in:
| Main Authors: | Wang, Yanshu, He, Wenyang, Yang, Tong |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.17470 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Interactions Across Blocks in Post-Training Quantization of Large Language Models
by: Shabanovi, Khasmamad, et al.
Published: (2024)
by: Shabanovi, Khasmamad, et al.
Published: (2024)
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
by: Xiao, Guangxuan, et al.
Published: (2022)
by: Xiao, Guangxuan, et al.
Published: (2022)
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
by: Chen, Mengzhao, et al.
Published: (2024)
by: Chen, Mengzhao, et al.
Published: (2024)
Task-Stratified Knowledge Scaling Laws for Post-Training Quantized Large Language Models
by: Zhou, Chenxi, et al.
Published: (2025)
by: Zhou, Chenxi, et al.
Published: (2025)
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models
by: Guan, Ziyi, et al.
Published: (2024)
by: Guan, Ziyi, et al.
Published: (2024)
Binary Autoencoder for Mechanistic Interpretability of Large Language Models
by: Cho, Hakaze, et al.
Published: (2025)
by: Cho, Hakaze, et al.
Published: (2025)
Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models
by: Gu, Kang, et al.
Published: (2024)
by: Gu, Kang, et al.
Published: (2024)
The Role of Diversity in In-Context Learning for Large Language Models
by: Xiao, Wenyang, et al.
Published: (2025)
by: Xiao, Wenyang, et al.
Published: (2025)
Efficient Post-training Quantization with FP8 Formats
by: Shen, Haihao, et al.
Published: (2023)
by: Shen, Haihao, et al.
Published: (2023)
PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models
by: Xiao, He, et al.
Published: (2025)
by: Xiao, He, et al.
Published: (2025)
Channel-Wise Mixed-Precision Quantization for Large Language Models
by: Chen, Zihan, et al.
Published: (2024)
by: Chen, Zihan, et al.
Published: (2024)
Sliding Window Attention Training for Efficient Large Language Models
by: Fu, Zichuan, et al.
Published: (2025)
by: Fu, Zichuan, et al.
Published: (2025)
SiLQ: Simple Large Language Model Quantization-Aware Training
by: Esser, Steven K., et al.
Published: (2025)
by: Esser, Steven K., et al.
Published: (2025)
AdaZeta: Adaptive Zeroth-Order Tensor-Train Adaption for Memory-Efficient Large Language Models Fine-Tuning
by: Yang, Yifan, et al.
Published: (2024)
by: Yang, Yifan, et al.
Published: (2024)
An Efficient Inference Framework for Early-exit Large Language Models
by: Miao, Ruijie, et al.
Published: (2024)
by: Miao, Ruijie, et al.
Published: (2024)
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
by: Liu, Jing, et al.
Published: (2023)
by: Liu, Jing, et al.
Published: (2023)
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
by: Zhang, Jun, et al.
Published: (2025)
by: Zhang, Jun, et al.
Published: (2025)
Towards a Unified View of Large Language Model Post-Training
by: Lv, Xingtai, et al.
Published: (2025)
by: Lv, Xingtai, et al.
Published: (2025)
Advancing Multimodal In-Context Learning in Large Vision-Language Models with Task-aware Demonstrations
by: Li, Yanshu
Published: (2025)
by: Li, Yanshu
Published: (2025)
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
by: Park, Jungwoo, et al.
Published: (2025)
by: Park, Jungwoo, et al.
Published: (2025)
On the Compressibility of Quantized Large Language Models
by: Mao, Yu, et al.
Published: (2024)
by: Mao, Yu, et al.
Published: (2024)
EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs
by: Guo, Song, et al.
Published: (2024)
by: Guo, Song, et al.
Published: (2024)
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
by: Huang, Wei, et al.
Published: (2024)
by: Huang, Wei, et al.
Published: (2024)
LPCD: Unified Framework from Layer-Wise to Submodule Quantization
by: Ichikawa, Yuma, et al.
Published: (2025)
by: Ichikawa, Yuma, et al.
Published: (2025)
Athena: Retrieval-augmented Legal Judgment Prediction with Large Language Models
by: Peng, Xiao, et al.
Published: (2024)
by: Peng, Xiao, et al.
Published: (2024)
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
by: Hu, Xing, et al.
Published: (2024)
by: Hu, Xing, et al.
Published: (2024)
Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview
by: Wang, Yanshu, et al.
Published: (2024)
by: Wang, Yanshu, et al.
Published: (2024)
BASE-Q: Bias and Asymmetric Scaling Enhanced Rotational Quantization for Large Language Models
by: He, Liulu, et al.
Published: (2025)
by: He, Liulu, et al.
Published: (2025)
Pushing the Limits of Block Rotations in Post-Training Quantization
by: Sanjeet, Sai, et al.
Published: (2026)
by: Sanjeet, Sai, et al.
Published: (2026)
Post Training Quantization of Large Language Models with Microscaling Formats
by: Sharify, Sayeh, et al.
Published: (2024)
by: Sharify, Sayeh, et al.
Published: (2024)
KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
by: Li, Xing, et al.
Published: (2025)
by: Li, Xing, et al.
Published: (2025)
Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
by: Xiao, He, et al.
Published: (2025)
by: Xiao, He, et al.
Published: (2025)
QuantMoE-Bench: Examining Post-Training Quantization for Mixture-of-Experts
by: Li, Pingzhi, et al.
Published: (2024)
by: Li, Pingzhi, et al.
Published: (2024)
SmoothRot: Combining Channel-Wise Scaling and Rotation for Quantization-Friendly LLMs
by: Czakó, Patrik, et al.
Published: (2025)
by: Czakó, Patrik, et al.
Published: (2025)
Mapping Post-Training Forgetting in Language Models at Scale
by: Harmon, Jackson, et al.
Published: (2025)
by: Harmon, Jackson, et al.
Published: (2025)
First-Order Error Matters: Accurate Compensation for Quantized Large Language Models
by: Zheng, Xingyu, et al.
Published: (2025)
by: Zheng, Xingyu, et al.
Published: (2025)
Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning
by: Zhao, Shiwan, et al.
Published: (2026)
by: Zhao, Shiwan, et al.
Published: (2026)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
by: Sun, Hao, et al.
Published: (2025)
by: Sun, Hao, et al.
Published: (2025)
Param$Δ$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
by: Cao, Sheng, et al.
Published: (2025)
by: Cao, Sheng, et al.
Published: (2025)
Regurgitative Training: The Value of Real Data in Training Large Language Models
by: Zhang, Jinghui, et al.
Published: (2024)
by: Zhang, Jinghui, et al.
Published: (2024)
Similar Items
-
Interactions Across Blocks in Post-Training Quantization of Large Language Models
by: Shabanovi, Khasmamad, et al.
Published: (2024) -
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
by: Xiao, Guangxuan, et al.
Published: (2022) -
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
by: Chen, Mengzhao, et al.
Published: (2024) -
Task-Stratified Knowledge Scaling Laws for Post-Training Quantized Large Language Models
by: Zhou, Chenxi, et al.
Published: (2025) -
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models
by: Guan, Ziyi, et al.
Published: (2024)