| Main Authors: | Zhang, Feng; Liu, Yanbin; Li, Weihua; Lv, Jie; Wang, Xiaodan; Bai, Quan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.06518 |
Similar Items
Multiscale Dual-path Feature Aggregation Network for Remaining Useful Life Prediction of Lithium-Ion Batteries
by: Lv, Zihao, et al.
Published: (2025)
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation
by: Chen, Han, et al.
Published: (2025)
Depth-Structured Music Recurrence: Budgeted Recurrent Attention for Full-Piece Symbolic Music Modeling
by: Yi, Yungang, et al.
Published: (2026)
From Similarity to Superiority: Channel Clustering for Time Series Forecasting
by: Chen, Jialin, et al.
Published: (2024)
AMS-QUANT: Adaptive Mantissa Sharing for Floating-point Quantization
by: Lv, Mengtao, et al.
Published: (2025)
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
by: Duanmu, Haojie, et al.
Published: (2025)
Optimizing Prompts for Large Language Models: A Causal Approach
by: Chen, Wei, et al.
Published: (2026)
Ideological Isolation in Online Social Networks: A Survey of Computational Definitions, Metrics, and Mitigation Strategies
by: Wang, Xiaodan, et al.
Published: (2026)
CALM: A CKA-Guided Adaptive Layer-Wise Modularization Framework for LLM Quantization
by: Zhang, Jinhao, et al.
Published: (2025)
Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
by: Liang, Hao, et al.
Published: (2022)
Kelp: A Streaming Safeguard for Large Models via Latent Dynamics-Guided Risk Detection
by: Li, Xiaodan, et al.
Published: (2025)
Global Stress Generation and Spatiotemporal Super-Resolution Physics-Informed Operator under Dynamic Loading for Two-Phase Random Materials
by: Xing, Tengfei, et al.
Published: (2025)
Predicting Stress in Two-phase Random Materials and Super-Resolution Method for Stress Images by Embedding Physical Information
by: Xing, Tengfei, et al.
Published: (2025)
Efficient Edge LLMs Deployment via HessianAware Quantization and CPU GPU Collaborative
by: Zhang, Tuo, et al.
Published: (2025)
RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models
by: Wei, Quan, et al.
Published: (2025)
OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension
by: Zhang, Zhiyuan, et al.
Published: (2026)
HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs
by: Wang, Guoan, et al.
Published: (2026)
CrossQuant: A Post-Training Quantization Method with Smaller Quantization Kernel for Precise Large Language Model Compression
by: Liu, Wenyuan, et al.
Published: (2024)
AsymKV: Enabling 1-Bit Quantization of KV Cache with Layer-Wise Asymmetric Quantization Configurations
by: Tao, Qian, et al.
Published: (2024)
Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
by: Xiao, He, et al.
Published: (2025)
Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models
by: Chen, Kejia, et al.
Published: (2025)
TempoGPT: Enhancing Time Series Reasoning via Quantizing Embedding
by: Zhang, Haochuan, et al.
Published: (2025)
Data Distribution as a Lever for Guiding Optimizers Toward Superior Generalization in LLMs
by: Gangavarapu, Tushaar, et al.
Published: (2026)
FAQ: Mitigating Quantization Error via Regenerating Calibration Data with Family-Aware Quantization
by: Xiao, Haiyang, et al.
Published: (2026)
PatternKV: Flattening KV Representation Expands Quantization Headroom
by: Zhang, Ji, et al.
Published: (2025)
Theory-optimal Quantization Based on Flatness
by: Huang, Xiusheng, et al.
Published: (2026)
What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study
by: Lv, Keyu, et al.
Published: (2026)
SplitQuant: Layer Splitting for Low-Bit Neural Network Quantization
by: Song, Jaewoo, et al.
Published: (2025)
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
by: Kurtic, Eldar, et al.
Published: (2024)
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats
by: Chen, Mengzhao, et al.
Published: (2025)
SMARTCAL: An Approach to Self-Aware Tool-Use Evaluation and Calibration
by: Shen, Yuanhao, et al.
Published: (2024)
Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size
by: Behtash, Alireza, et al.
Published: (2025)
Towards Data-Centric AI: A Comprehensive Survey of Traditional, Reinforcement, and Generative Approaches for Tabular Data Transformation
by: Wang, Dongjie, et al.
Published: (2025)
Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration
by: Sun, Wenju, et al.
Published: (2025)
DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression
by: Yu, Xiaoming, et al.
Published: (2026)
TTVS: Boosting Self-Exploring Reinforcement Learning via Test-time Variational Synthesis
by: Bai, Sikai, et al.
Published: (2026)
ADAPTive Input Training for Many-to-One Pre-Training on Time-Series Classification
by: Quinlan, Paul, et al.
Published: (2026)
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
by: Cheng, Wenhua, et al.
Published: (2023)
Enhancing Model Privacy in Federated Learning with Random Masking and Quantization
by: Xu, Zhibo, et al.
Published: (2025)
Low-bit Model Quantization for Deep Neural Networks: A Survey
by: Liu, Kai, et al.
Published: (2025)