Library Catalog

Bibliographic Details
Main Authors:	Yu, Jiangyong; Zhou, Sifan; Yang, Dawei; Wang, Shuo; Li, Shuoyu; Hu, Xing; Xu, Chen; Xu, Zukang; Shu, Changyong; Yuan, Zhihang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.00425

Similar Items

OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting
by: Hu, Xing, et al.
Published: (2025)

MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
by: Hu, Xing, et al.
Published: (2025)

RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization
by: Xu, Chen, et al.
Published: (2025)

I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
by: Hu, Xing, et al.
Published: (2024)

RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models
by: Xu, Zukang, et al.
Published: (2025)

MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods
by: Xu, Zukang, et al.
Published: (2025)

PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling
by: Yue, Yuxuan, et al.
Published: (2025)

TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization
by: Xu, Zukang, et al.
Published: (2026)

KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
by: Xu, Zukang, et al.
Published: (2026)

MGVQ: Synergizing Multi-dimensional Sensitivity-Aware and Gradient-Hessian Fusion for Vector Quantization
by: Wang, Zhong, et al.
Published: (2026)

MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization
by: Zhao, Zhixiong, et al.
Published: (2026)

FQ-PETR: Fully Quantized Position Embedding Transformation for Multi-View 3D Object Detection
by: Yu, Jiangyong, et al.
Published: (2025)

BWLA: Breaking the Barrier of W1AX Post-Training Quantization for LLMs
by: Zhao, Zhixiong, et al.
Published: (2026)

GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
by: Zhou, Sifan, et al.
Published: (2025)

SAES-SVD: Self-Adaptive Suppression of Accumulated and Local Errors for SVD-based LLM Compression
by: Hu, Xing, et al.
Published: (2026)

Information Entropy Guided Height-aware Histogram for Quantization-friendly Pillar Feature Encoder
by: Zhou, Sifan, et al.
Published: (2024)

WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More
by: Yue, Yuxuan, et al.
Published: (2024)

MVCTrack: Boosting 3D Point Cloud Tracking via Multimodal-Guided Virtual Cues
by: Hu, Zhaofeng, et al.
Published: (2024)

DLLMQuant: Quantizing Diffusion-based Large Language Models
by: Xu, Chen, et al.
Published: (2025)

PillarTrack: Boosting Pillar Representation for Transformer-based 3D Single Object Tracking on Point Clouds
by: Xu, Weisheng, et al.
Published: (2024)

SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting
by: Li, Shuaiting, et al.
Published: (2025)

GSRender: Deduplicated Occupancy Prediction via Weakly Supervised 3D Gaussian Splatting
by: Sun, Qianpu, et al.
Published: (2024)

Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
by: Lin, Zhihang, et al.
Published: (2024)

GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
by: Zeng, Chao, et al.
Published: (2024)

MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models
by: Hu, Lulu, et al.
Published: (2026)

Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models
by: Li, Hengzhuang, et al.
Published: (2025)

SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
by: Duanmu, Haojie, et al.
Published: (2024)

MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference
by: Tang, Xinru, et al.
Published: (2025)

Divide-and-Conquer Inference for Large-Scale Visual Recognition with Multimodal Large Language Models
by: Ye, Zhipeng, et al.
Published: (2026)

NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference
by: Yu, Jiangyong, et al.
Published: (2026)

ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models
by: Zeng, Chao, et al.
Published: (2024)

6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models
by: Su, Rundong, et al.
Published: (2026)

Unleashing the Potential of Text-attributed Graphs: Automatic Relation Decomposition via Large Language Models
by: Seo, Hyunjin, et al.
Published: (2024)

ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models
by: Yuan, Zhihang, et al.
Published: (2023)

Speculative Decoding Reimagined for Multimodal Large Language Models
by: Lin, Luxi, et al.
Published: (2025)

InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions
by: Wen, Liangjian, et al.
Published: (2025)

Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
by: Zhao, Shitian, et al.
Published: (2024)

Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment
by: Xie, Xing, et al.
Published: (2025)

Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization
by: Zhang, Jinghe, et al.
Published: (2026)