Library Catalog

Bibliographic Details
Main Authors:	Zheng, Zhen; Song, Xiaonan; Liu, Chuanjie
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2412.14590

Similar Items

MixLLM: Dynamic Routing in Mixed Large Language Models
by: Wang, Xinyuan, et al.
Published: (2025)

BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching
by: Zheng, Zhen, et al.
Published: (2024)

MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
by: Zhang, Yu, et al.
Published: (2024)

MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
by: Duanmu, Haojie, et al.
Published: (2025)

MetaMix: Meta-state Precision Searcher for Mixed-precision Activation Quantization
by: Kim, Han-Byul, et al.
Published: (2023)

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
by: Huang, Wei, et al.
Published: (2024)

GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs
by: Deng, Jianing, et al.
Published: (2026)

RAMP: Reinforcement Adaptive Mixed Precision Quantization for Efficient On Device LLM Inference
by: Gautam, Arpit Singh, et al.
Published: (2026)

OMPQ: Orthogonal Mixed Precision Quantization
by: Ma, Yuexiao, et al.
Published: (2021)

KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
by: Li, Xing, et al.
Published: (2025)

InfoQ: Mixed-Precision Quantization via Global Information Flow
by: Akbulut, Mehmet Emre, et al.
Published: (2025)

Resource-aware Mixed-precision Quantization for Enhancing Deployability of Transformers for Time-series Forecasting on Embedded FPGAs
by: Ling, Tianheng, et al.
Published: (2024)

FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
by: Hooper, Coleman, et al.
Published: (2025)

MixLM: High-Throughput and Effective LLM Ranking via Text-Embedding Mix-Interaction
by: Li, Guoyao, et al.
Published: (2025)

MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs
by: Gong, Junfeng, et al.
Published: (2024)

AutoQRA: Joint Optimization of Mixed-Precision Quantization and Low-rank Adapters for Efficient LLM Fine-Tuning
by: Zhou, Changhai, et al.
Published: (2026)

BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference
by: Jang, Wonsuk, et al.
Published: (2025)

Scaling Laws For Mixed Quantization
by: Cao, Zeyu, et al.
Published: (2024)

AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models
by: Lee, Sangjun, et al.
Published: (2025)

Squeeze10-LLM: Squeezing LLMs' Weights by 10 Times via a Staged Mixed-Precision Quantization Method
by: Zhu, Qingcheng, et al.
Published: (2025)

MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
by: Liu, Wenyuan, et al.
Published: (2025)

Scalify: scale propagation for efficient low-precision LLM training
by: Balança, Paul, et al.
Published: (2024)

OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization
by: Li, Zhikai, et al.
Published: (2026)

Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost
by: Xu, Yinggan, et al.
Published: (2026)

Q-realign: Piggybacking Realignment on Quantization for Safe and Efficient LLM Deployment
by: Tan, Qitao, et al.
Published: (2026)

PQCache: Product Quantization-based KVCache for Long Context LLM Inference
by: Zhang, Hailin, et al.
Published: (2024)

SqueezeLLM: Dense-and-Sparse Quantization
by: Kim, Sehoon, et al.
Published: (2023)

OTLP: Output Thresholding Using Mixed Integer Linear Programming
by: Koseoglu, Baran, et al.
Published: (2024)

LLM-based AI Agent for Sizing of Analog and Mixed Signal Circuit
by: Liu, Chang, et al.
Published: (2025)

The Impact of Language Mixing on Bilingual LLM Reasoning
by: Li, Yihao, et al.
Published: (2025)

On LLM-Enhanced Mixed-Type Data Imputation with High-Order Message Passing
by: Wang, Jianwei, et al.
Published: (2025)

Progressive Mixed-Precision Decoding for Efficient LLM Inference
by: Chen, Hao Mark, et al.
Published: (2024)

Sample-efficient LLM Optimization with Reset Replay
by: Liu, Zichuan, et al.
Published: (2025)

Efficient Mixed Precision Quantization in Graph Neural Networks
by: Moustafa, Samir, et al.
Published: (2025)

AMED: Automatic Mixed-Precision Quantization for Edge Devices
by: Kimhi, Moshe, et al.
Published: (2022)

MoPEQ: Mixture of Mixed Precision Quantized Experts
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)

DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation
by: Li, Pingzhi, et al.
Published: (2025)

EEsizer: LLM-Based AI Agent for Sizing of Analog and Mixed Signal Circuit
by: Liu, Chang, et al.
Published: (2025)

MixDiT: Accelerating Image Diffusion Transformer Inference with Mixed-Precision MX Quantization
by: Kim, Daeun, et al.
Published: (2025)

FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs
by: Xie, Xilong, et al.
Published: (2025)