| Main Authors: | Yu, JiangYong, Zhou, Sifan, Yang, Dawei, Wang, Shuo, Li, Shuoyu, Hu, Xing, Xu, Chen, Xu, Zukang, Shu, Changyong, Yuan, Zhihang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.00425 |
Similar Items
OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting
by: Hu, Xing, et al.
Published: (2025)
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
by: Hu, Xing, et al.
Published: (2025)
RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization
by: Xu, Chen, et al.
Published: (2025)
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
by: Hu, Xing, et al.
Published: (2024)
RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models
by: Xu, Zukang, et al.
Published: (2025)
MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods
by: Xu, Zukang, et al.
Published: (2025)
PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling
by: Yue, Yuxuan, et al.
Published: (2025)
TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization
by: Xu, Zukang, et al.
Published: (2026)
KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
by: Xu, Zukang, et al.
Published: (2026)
MGVQ: Synergizing Multi-dimensional Sensitivity-Aware and Gradient-Hessian Fusion for Vector Quantization
by: Wang, Zhong, et al.
Published: (2026)
MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization
by: Zhao, Zhixiong, et al.
Published: (2026)
FQ-PETR: Fully Quantized Position Embedding Transformation for Multi-View 3D Object Detection
by: Yu, Jiangyong, et al.
Published: (2025)
BWLA: Breaking the Barrier of W1AX Post-Training Quantization for LLMs
by: Zhao, Zhixiong, et al.
Published: (2026)
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
by: Zhou, Sifan, et al.
Published: (2025)
SAES-SVD: Self-Adaptive Suppression of Accumulated and Local Errors for SVD-based LLM Compression
by: Hu, Xing, et al.
Published: (2026)
Information Entropy Guided Height-aware Histogram for Quantization-friendly Pillar Feature Encoder
by: Zhou, Sifan, et al.
Published: (2024)
WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More
by: Yue, Yuxuan, et al.
Published: (2024)
MVCTrack: Boosting 3D Point Cloud Tracking via Multimodal-Guided Virtual Cues
by: Hu, Zhaofeng, et al.
Published: (2024)
DLLMQuant: Quantizing Diffusion-based Large Language Models
by: Xu, Chen, et al.
Published: (2025)
PillarTrack: Boosting Pillar Representation for Transformer-based 3D Single Object Tracking on Point Clouds
by: Xu, Weisheng, et al.
Published: (2024)
SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting
by: Li, Shuaiting, et al.
Published: (2025)
GSRender: Deduplicated Occupancy Prediction via Weakly Supervised 3D Gaussian Splatting
by: Sun, Qianpu, et al.
Published: (2024)
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
by: Lin, Zhihang, et al.
Published: (2024)
GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
by: Zeng, Chao, et al.
Published: (2024)
MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models
by: Hu, Lulu, et al.
Published: (2026)
Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models
by: Li, Hengzhuang, et al.
Published: (2025)
SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
by: Duanmu, Haojie, et al.
Published: (2024)
MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference
by: Tang, Xinru, et al.
Published: (2025)
Divide-and-Conquer Inference for Large-Scale Visual Recognition with Multimodal Large Language Models
by: Ye, Zhipeng, et al.
Published: (2026)
NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference
by: Yu, Jiangyong, et al.
Published: (2026)
ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models
by: Zeng, Chao, et al.
Published: (2024)
6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models
by: Su, Rundong, et al.
Published: (2026)
Unleashing the Potential of Text-attributed Graphs: Automatic Relation Decomposition via Large Language Models
by: Seo, Hyunjin, et al.
Published: (2024)
ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models
by: Yuan, Zhihang, et al.
Published: (2023)
Speculative Decoding Reimagined for Multimodal Large Language Models
by: Lin, Luxi, et al.
Published: (2025)
InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions
by: Wen, Liangjian, et al.
Published: (2025)
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
by: Zhao, Shitian, et al.
Published: (2024)
Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment
by: Xie, Xing, et al.
Published: (2025)
Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization
by: Zhang, Jinghe, et al.
Published: (2026)