| Main Authors: | Deng, Kaiyuan, Zheng, Hangyu, Qing, Minghai, Zhu, Kunxiong, Li, Gen, Xiao, Yang, Zhang, Lan Emily, Guo, Linke, Hui, Bo, Wang, Yanzhi, Yuan, Geng, Agrawal, Gagan, Niu, Wei, Ma, Xiaolong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.03484 |
Similar Items
Forget-It-All: Multi-Concept Machine Unlearning via Concept-Aware Neuron Masking
by: Deng, Kaiyuan, et al.
Published: (2026)
FlashMem: Supporting Modern DNN Workloads on Mobile with GPU Memory Hierarchy Optimizations
by: Shu, Zhihao, et al.
Published: (2026)
Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization
by: Li, Gen, et al.
Published: (2025)
LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation
by: Xiao, Yang, et al.
Published: (2025)
ZO-SAM: Zero-Order Sharpness-Aware Minimization for Efficient Sparse Training
by: Ji, Jie, et al.
Published: (2026)
Forget Many, Forget Right: Scalable and Precise Concept Unlearning in Diffusion Models
by: Deng, Kaiyuan, et al.
Published: (2026)
The Uniqueness of LLaMA3-70B Series with Per-Channel Quantization
by: Qin, Minghai
Published: (2024)
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks
by: Huang, Wei, et al.
Published: (2023)
Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design
by: Li, Gen, et al.
Published: (2024)
SoD$^2$: Statically Optimizing Dynamic Deep Neural Network
by: Niu, Wei, et al.
Published: (2024)
Q-realign: Piggybacking Realignment on Quantization for Safe and Efficient LLM Deployment
by: Tan, Qitao, et al.
Published: (2026)
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
by: Zhan, Zheng, et al.
Published: (2024)
Gaussians on a Diet: High-Quality Memory-Bounded 3D Gaussian Splatting Training
by: Zhang, Yangming, et al.
Published: (2026)
CUDAHercules: Benchmarking Hardware-Aware Expert-level CUDA Optimization for LLMs
by: Li, Shiyang, et al.
Published: (2026)
Outlier-Aware Training for Low-Bit Quantization of Structural Re-Parameterized Networks
by: Niu, Muqun, et al.
Published: (2024)
Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment
by: Lee, Deokjae, et al.
Published: (2025)
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
by: Li, Zhengang, et al.
Published: (2024)
Predictability‐Aware Subsequence Modeling for Sequential Recommendation
by: Hangyu Deng, et al.
Published: (2024)
FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs
by: Xie, Xilong, et al.
Published: (2025)
HBVLA: Pushing 1-Bit Post-Training Quantization for Vision-Language-Action Models
by: Yan, Xin, et al.
Published: (2026)
eXmY: A Data Type and Technique for Arbitrary Bit Precision Quantization
by: Agrawal, Aditya, et al.
Published: (2024)
What Lurks Within? Concept Auditing for Shared Diffusion Models at Scale
by: Yuan, Xiaoyong, et al.
Published: (2025)
QS4D: Quantization‐Aware Training for Efficient Hardware Deployment of Structured State‐Space Sequential Models
by: Sebastian Siegel, et al.
Published: (2026)
End-to-End On-Device Quantization-Aware Training for LLMs at Inference Cost
by: Tan, Qitao, et al.
Published: (2025)
Efficient Edge LLMs Deployment via HessianAware Quantization and CPU GPU Collaborative
by: Zhang, Tuo, et al.
Published: (2025)
Streamlined Transmission: A Semantic-Aware XR Deployment Framework Enhanced by Generative AI
by: Yang, Wanting, et al.
Published: (2024)
Laconic: Streamlined Load Balancers for SmartNICs
by: Cui, Tianyi, et al.
Published: (2024)
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
by: Ouyang, Xu, et al.
Published: (2024)
HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs
by: Wang, Guoan, et al.
Published: (2026)
What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study
by: Lv, Keyu, et al.
Published: (2026)
Q$^2$: Quantization-Aware Gradient Balancing and Attention Alignment for Low-Bit Quantization
by: Wang, Zhaoyang, et al.
Published: (2025)
AdaQAT: Adaptive Bit-Width Quantization-Aware Training
by: Gernigon, Cédric, et al.
Published: (2024)
Attn-QAT: 4-Bit Attention With Quantization-Aware Training
by: Zhang, Peiyuan, et al.
Published: (2026)
JAQ: Joint Efficient Architecture Design and Low-Bit Quantization with Hardware-Software Co-Exploration
by: Wang, Mingzi, et al.
Published: (2025)
Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification
by: Huang, Hong, et al.
Published: (2026)
Streamlining Industrial Contract Management with Retrieval-Augmented LLMs
by: Topollai, Kristi, et al.
Published: (2025)
CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
by: Wang, Haoyu, et al.
Published: (2024)
HBLLM: Wavelet-Enhanced High-Fidelity 1-Bit Quantization for LLMs
by: Chen, Ningning, et al.
Published: (2025)
CalibQuant: 1-Bit KV Cache Quantization for Multimodal LLMs
by: Han, Insu, et al.
Published: (2025)
BitRL: Reinforcement Learning with 1-bit Quantized Language Models for Resource-Constrained Edge Deployment
by: Sajid, Md. Ashiq Ul Islam, et al.
Published: (2026)