| Main Authors: | Zheng, Zhen, Song, Xiaonan, Liu, Chuanjie |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2412.14590 |
Similar Items
MixLLM: Dynamic Routing in Mixed Large Language Models
by: Wang, Xinyuan, et al.
Published: (2025)
BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching
by: Zheng, Zhen, et al.
Published: (2024)
MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
by: Zhang, Yu, et al.
Published: (2024)
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
by: Duanmu, Haojie, et al.
Published: (2025)
MetaMix: Meta-state Precision Searcher for Mixed-precision Activation Quantization
by: Kim, Han-Byul, et al.
Published: (2023)
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
by: Huang, Wei, et al.
Published: (2024)
GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs
by: Deng, Jianing, et al.
Published: (2026)
RAMP: Reinforcement Adaptive Mixed Precision Quantization for Efficient On Device LLM Inference
by: Gautam, Arpit Singh, et al.
Published: (2026)
OMPQ: Orthogonal Mixed Precision Quantization
by: Ma, Yuexiao, et al.
Published: (2021)
KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
by: Li, Xing, et al.
Published: (2025)
InfoQ: Mixed-Precision Quantization via Global Information Flow
by: Akbulut, Mehmet Emre, et al.
Published: (2025)
Resource-aware Mixed-precision Quantization for Enhancing Deployability of Transformers for Time-series Forecasting on Embedded FPGAs
by: Ling, Tianheng, et al.
Published: (2024)
FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
by: Hooper, Coleman, et al.
Published: (2025)
MixLM: High-Throughput and Effective LLM Ranking via Text-Embedding Mix-Interaction
by: Li, Guoyao, et al.
Published: (2025)
MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs
by: Gong, Junfeng, et al.
Published: (2024)
AutoQRA: Joint Optimization of Mixed-Precision Quantization and Low-rank Adapters for Efficient LLM Fine-Tuning
by: Zhou, Changhai, et al.
Published: (2026)
BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference
by: Jang, Wonsuk, et al.
Published: (2025)
Scaling Laws For Mixed Quantization
by: Cao, Zeyu, et al.
Published: (2024)
AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models
by: Lee, Sangjun, et al.
Published: (2025)
Squeeze10-LLM: Squeezing LLMs' Weights by 10 Times via a Staged Mixed-Precision Quantization Method
by: Zhu, Qingcheng, et al.
Published: (2025)
MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
by: Liu, Wenyuan, et al.
Published: (2025)
Scalify: scale propagation for efficient low-precision LLM training
by: Balança, Paul, et al.
Published: (2024)
OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization
by: Li, Zhikai, et al.
Published: (2026)
Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost
by: Xu, Yinggan, et al.
Published: (2026)
Q-realign: Piggybacking Realignment on Quantization for Safe and Efficient LLM Deployment
by: Tan, Qitao, et al.
Published: (2026)
PQCache: Product Quantization-based KVCache for Long Context LLM Inference
by: Zhang, Hailin, et al.
Published: (2024)
SqueezeLLM: Dense-and-Sparse Quantization
by: Kim, Sehoon, et al.
Published: (2023)
OTLP: Output Thresholding Using Mixed Integer Linear Programming
by: Koseoglu, Baran, et al.
Published: (2024)
LLM-based AI Agent for Sizing of Analog and Mixed Signal Circuit
by: Liu, Chang, et al.
Published: (2025)
The Impact of Language Mixing on Bilingual LLM Reasoning
by: Li, Yihao, et al.
Published: (2025)
On LLM-Enhanced Mixed-Type Data Imputation with High-Order Message Passing
by: Wang, Jianwei, et al.
Published: (2025)
Progressive Mixed-Precision Decoding for Efficient LLM Inference
by: Chen, Hao Mark, et al.
Published: (2024)
Sample-efficient LLM Optimization with Reset Replay
by: Liu, Zichuan, et al.
Published: (2025)
Efficient Mixed Precision Quantization in Graph Neural Networks
by: Moustafa, Samir, et al.
Published: (2025)
AMED: Automatic Mixed-Precision Quantization for Edge Devices
by: Kimhi, Moshe, et al.
Published: (2022)
MoPEQ: Mixture of Mixed Precision Quantized Experts
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)
DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation
by: Li, Pingzhi, et al.
Published: (2025)
EEsizer: LLM-Based AI Agent for Sizing of Analog and Mixed Signal Circuit
by: Liu, Chang, et al.
Published: (2025)
MixDiT: Accelerating Image Diffusion Transformer Inference with Mixed-Precision MX Quantization
by: Kim, Daeun, et al.
Published: (2025)
FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs
by: Xie, Xilong, et al.
Published: (2025)