| Main Authors: | Gong, Ruihao, Ding, Yifu, Wang, Zining, Lv, Chengtao, Zheng, Xingyu, Du, Jinyang, Qin, Haotong, Guo, Jinyang, Magno, Michele, Liu, Xianglong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.16694 |
Similar Items
PTQ4SAM: Post-Training Quantization for Segment Anything
by: Lv, Chengtao, et al.
Published: (2024)
MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping
by: Huang, Yushi, et al.
Published: (2025)
First-Order Error Matters: Accurate Compensation for Quantized Large Language Models
by: Zheng, Xingyu, et al.
Published: (2025)
QVGen: Pushing the Limit of Quantized Video Generative Models
by: Huang, Yushi, et al.
Published: (2025)
DB-LLM: Accurate Dual-Binarization for Efficient LLMs
by: Chen, Hong, et al.
Published: (2024)
QuantSR+: Pushing the Limit of Quantized Image Super-Resolution Networks
by: Qin, Haotong, et al.
Published: (2026)
PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models
by: Wang, Zining, et al.
Published: (2024)
Collaborative Few-Step Distillation and Low-Bit Quantization for Wan2.2 Dual-Expert Video Diffusion Models
by: Du, Jinyang, et al.
Published: (2026)
LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment
by: Yang, Ge, et al.
Published: (2024)
Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes
by: Gong, Ruihao, et al.
Published: (2024)
BWTA: Accurate and Efficient Binarized Transformer by Algorithm-Hardware Co-design
by: Ding, Yifu, et al.
Published: (2026)
An Empirical Study of Qwen3 Quantization
by: Zheng, Xingyu, et al.
Published: (2025)
BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models
by: Zheng, Xingyu, et al.
Published: (2024)
An empirical study of LLaMA3 quantization: from LLMs to MLLMs
by: Huang, Wei, et al.
Published: (2024)
Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference
by: Ding, Yifu, et al.
Published: (2026)
HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration
by: Huang, Yushi, et al.
Published: (2024)
BiDM: Pushing the Limit of Quantization for Diffusion Models
by: Zheng, Xingyu, et al.
Published: (2024)
Post-Training Quantization for Video Matting
by: Zhu, Tianrui, et al.
Published: (2025)
LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit
by: Lv, Chengtao, et al.
Published: (2025)
LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit
by: Gong, Ruihao, et al.
Published: (2024)
Low-bit Model Quantization for Deep Neural Networks: A Survey
by: Liu, Kai, et al.
Published: (2025)
Event-Priori-Based Vision-Language Model for Efficient Visual Understanding
by: Qin, Haotong, et al.
Published: (2025)
Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
by: Qin, Haotong, et al.
Published: (2024)
QVD: Post-training Quantization for Video Diffusion Models
by: Tian, Shilong, et al.
Published: (2024)
Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
by: Zhang, Tianao, et al.
Published: (2025)
Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models
by: Baumann, Nicolas, et al.
Published: (2025)
PicoSAM2: Low-Latency Segmentation In-Sensor for Edge Vision Applications
by: Bonazzi, Pietro, et al.
Published: (2025)
LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation
by: Huang, Yushi, et al.
Published: (2025)
BiVM: Accurate Binarized Neural Network for Efficient Video Matting
by: Qin, Haotong, et al.
Published: (2025)
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
by: Huang, Wei, et al.
Published: (2024)
SLMQuant:Benchmarking Small Language Model Quantization for Practical Deployment
by: Wang, Jiacheng, et al.
Published: (2025)
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
by: Huang, Wei, et al.
Published: (2024)
Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention
by: Lv, Chengtao, et al.
Published: (2026)
SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models
by: Ying, Zonghao, et al.
Published: (2024)
Nonsmooth Nonconvex-Concave Minimax Optimization: Convergence Criteria and Algorithms
by: Shi, Jinyang, et al.
Published: (2026)
2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution
by: Liu, Kai, et al.
Published: (2024)
Dynamic Parallel Tree Search for Efficient LLM Reasoning
by: Ding, Yifu, et al.
Published: (2025)
Stateful Large Language Model Serving with Pensieve
by: Yu, Lingfan, et al.
Published: (2023)
PicoSAM3: Real-Time In-Sensor Region-of-Interest Segmentation
by: Bonazzi, Pietro, et al.
Published: (2026)
Regularity of viscosity solutions of the $σ_k$-Yamabe-type Problem for $k>n/2$
by: Wu, Jinyang
Published: (2024)