:: Library Catalog

Bibliographic Details
Main Authors:	Gong, Ruihao, Ding, Yifu, Wang, Zining, Lv, Chengtao, Zheng, Xingyu, Du, Jinyang, Qin, Haotong, Guo, Jinyang, Magno, Michele, Liu, Xianglong
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2409.16694

Similar Items

PTQ4SAM: Post-Training Quantization for Segment Anything
by: Lv, Chengtao, et al.
Published: (2024)

MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping
by: Huang, Yushi, et al.
Published: (2025)

First-Order Error Matters: Accurate Compensation for Quantized Large Language Models
by: Zheng, Xingyu, et al.
Published: (2025)

QVGen: Pushing the Limit of Quantized Video Generative Models
by: Huang, Yushi, et al.
Published: (2025)

DB-LLM: Accurate Dual-Binarization for Efficient LLMs
by: Chen, Hong, et al.
Published: (2024)

QuantSR+: Pushing the Limit of Quantized Image Super-Resolution Networks
by: Qin, Haotong, et al.
Published: (2026)

PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models
by: Wang, Zining, et al.
Published: (2024)

Collaborative Few-Step Distillation and Low-Bit Quantization for Wan2.2 Dual-Expert Video Diffusion Models
by: Du, Jinyang, et al.
Published: (2026)

LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment
by: Yang, Ge, et al.
Published: (2024)

Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes
by: Gong, Ruihao, et al.
Published: (2024)

BWTA: Accurate and Efficient Binarized Transformer by Algorithm-Hardware Co-design
by: Ding, Yifu, et al.
Published: (2026)

An Empirical Study of Qwen3 Quantization
by: Zheng, Xingyu, et al.
Published: (2025)

BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models
by: Zheng, Xingyu, et al.
Published: (2024)

An empirical study of LLaMA3 quantization: from LLMs to MLLMs
by: Huang, Wei, et al.
Published: (2024)

Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference
by: Ding, Yifu, et al.
Published: (2026)

HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration
by: Huang, Yushi, et al.
Published: (2024)

BiDM: Pushing the Limit of Quantization for Diffusion Models
by: Zheng, Xingyu, et al.
Published: (2024)

Post-Training Quantization for Video Matting
by: Zhu, Tianrui, et al.
Published: (2025)

LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit
by: Lv, Chengtao, et al.
Published: (2025)

LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit
by: Gong, Ruihao, et al.
Published: (2024)

Low-bit Model Quantization for Deep Neural Networks: A Survey
by: Liu, Kai, et al.
Published: (2025)

Event-Priori-Based Vision-Language Model for Efficient Visual Understanding
by: Qin, Haotong, et al.
Published: (2025)

Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
by: Qin, Haotong, et al.
Published: (2024)

QVD: Post-training Quantization for Video Diffusion Models
by: Tian, Shilong, et al.
Published: (2024)

Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
by: Zhang, Tianao, et al.
Published: (2025)

Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models
by: Baumann, Nicolas, et al.
Published: (2025)

PicoSAM2: Low-Latency Segmentation In-Sensor for Edge Vision Applications
by: Bonazzi, Pietro, et al.
Published: (2025)

LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation
by: Huang, Yushi, et al.
Published: (2025)

BiVM: Accurate Binarized Neural Network for Efficient Video Matting
by: Qin, Haotong, et al.
Published: (2025)

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
by: Huang, Wei, et al.
Published: (2024)

SLMQuant: Benchmarking Small Language Model Quantization for Practical Deployment
by: Wang, Jiacheng, et al.
Published: (2025)

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
by: Huang, Wei, et al.
Published: (2024)

Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention
by: Lv, Chengtao, et al.
Published: (2026)

SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models
by: Ying, Zonghao, et al.
Published: (2024)

Nonsmooth Nonconvex-Concave Minimax Optimization: Convergence Criteria and Algorithms
by: Shi, Jinyang, et al.
Published: (2026)

2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution
by: Liu, Kai, et al.
Published: (2024)

Dynamic Parallel Tree Search for Efficient LLM Reasoning
by: Ding, Yifu, et al.
Published: (2025)

Stateful Large Language Model Serving with Pensieve
by: Yu, Lingfan, et al.
Published: (2023)

PicoSAM3: Real-Time In-Sensor Region-of-Interest Segmentation
by: Bonazzi, Pietro, et al.
Published: (2026)

Regularity of viscosity solutions of the $σ_k$-Yamabe-type Problem for $k>n/2$
by: Wu, Jinyang
Published: (2024)