| Main Authors: | Li, Xinlin, Chou, Timothy, Fromm, Josh, Liu, Zichang, Pan, Yunjie, Fragouli, Christina |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.17698 |
Similar Items
ICQuant: Index Coding enables Low-bit LLM Quantization
by: Li, Xinlin, et al.
Published: (2025)
Bitwidth-Specific Logarithmic Arithmetic for Future Hardware-Accelerated Training
by: Hamad, Hassan, et al.
Published: (2025)
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks
by: Huang, Wei, et al.
Published: (2023)
StableQAT: Stable Quantization-Aware Training at Ultra-Low Bitwidths
by: Chen, Tianyi, et al.
Published: (2026)
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
by: Liu, Jing, et al.
Published: (2023)
Balancing Fidelity and Plasticity: Aligning Mixed-Precision Fine-Tuning with Linguistic Hierarchies
by: Zhou, Changhai, et al.
Published: (2025)
Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs
by: Miyamoto, Sora, et al.
Published: (2026)
PQS (Prune, Quantize, and Sort): Low-Bitwidth Accumulation of Dot Products in Neural Network Computations
by: Natesh, Vikas, et al.
Published: (2025)
AnyBCQ: Hardware Efficient Flexible Binary-Coded Quantization for Multi-Precision LLMs
by: Park, Gunho, et al.
Published: (2025)
APreQEL: Adaptive Mixed Precision Quantization For Edge LLMs
by: Bouzouad, Meriem, et al.
Published: (2026)
Quantifying the Capabilities of LLMs across Scale and Precision
by: Badshah, Sher, et al.
Published: (2024)
Revisiting Tree Search for LLMs: Gumbel and Sequential Halving for Budget-Scalable Reasoning
by: Ugadiarov, Leonid, et al.
Published: (2026)
Aligning CodeLLMs with Direct Preference Optimization
by: Miao, Yibo, et al.
Published: (2024)
Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training
by: Varshney, Ayush K., et al.
Published: (2026)
Multi-Objective Hardware Aware Neural Architecture Search using Hardware Cost Diversity
by: Sinha, Nilotpal, et al.
Published: (2024)
Every Bit Counts: A Theoretical Study of Precision-Expressivity Tradeoffs in Quantized Transformers
by: Chakrabarti, Sayak, et al.
Published: (2026)
AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs
by: Kang, Feiyang, et al.
Published: (2024)
MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
by: Liu, Wenyuan, et al.
Published: (2025)
Enhancing Binary Search via Overlapping Partitions
by: Buyukkalayci, Kaan, et al.
Published: (2025)
On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression
by: Ge, Zichang, et al.
Published: (2025)
STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization
by: Federici, Marco, et al.
Published: (2025)
Mixed-Precision Federated Learning via Multi-Precision Over-The-Air Aggregation
by: Yuan, Jinsheng, et al.
Published: (2024)
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
by: Yuan, Jingyang, et al.
Published: (2025)
MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
by: Zhang, Yu, et al.
Published: (2024)
Divide-Verify-Refine: Can LLMs Self-Align with Complex Instructions?
by: Zhang, Xianren, et al.
Published: (2024)
Scalable Generative Game Engine: Breaking the Resolution Wall via Hardware-Algorithm Co-Design
by: Zeng, Wei, et al.
Published: (2026)
A Unified Framework for Generative Data Augmentation: A Comprehensive Survey
by: Chen, Yunhao, et al.
Published: (2023)
A Metric Driven Approach to Mixed Precision Training
by: Rasquinha, Mitchelle, et al.
Published: (2024)
Mixed-Precision Quantization for Language Models: Techniques and Prospects
by: Rakka, Mariam, et al.
Published: (2025)
MoR: Mixture Of Representations For Mixed-Precision Training
by: Su, Bor-Yiing, et al.
Published: (2025)
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
by: Zhou, Zhanhui, et al.
Published: (2024)
HAWX: A Hardware-Aware FrameWork for Fast and Scalable ApproXimation of DNNs
by: Nazari, Samira, et al.
Published: (2026)
Comprehensive Description of Uncertainty in Measurement for Representation and Propagation with Scalable Precision
by: Darijani, Ali, et al.
Published: (2026)
HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference
by: Gong, Ping, et al.
Published: (2025)
MixKVQ: Query-Aware Mixed-Precision KV Cache Quantization for Long-Context Reasoning
by: Zhang, Tao, et al.
Published: (2025)
Scalable Meta-Learning via Mixed-Mode Differentiation
by: Kemaev, Iurii, et al.
Published: (2025)
Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning
by: Tang, Zhenchao, et al.
Published: (2025)
The Alignment Tax: Response Homogenization in Aligned LLMs and Its Implications for Uncertainty Estimation
by: Liu, Mingyi
Published: (2026)
Steering LLMs via Scalable Interactive Oversight
by: Zhou, Enyu, et al.
Published: (2026)
Deployment-Aligned Low-Precision Neural Architecture Search for Spaceborne Edge AI
by: Thind, Parampuneet Kaur, et al.
Published: (2026)