:: Library Catalog

Bibliographic Details
Main Authors:	Li, Xinlin; Chou, Timothy; Fromm, Josh; Liu, Zichang; Pan, Yunjie; Fragouli, Christina
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.17698

Similar Items

ICQuant: Index Coding enables Low-bit LLM Quantization
by: Li, Xinlin, et al.
Published: (2025)

Bitwidth-Specific Logarithmic Arithmetic for Future Hardware-Accelerated Training
by: Hamad, Hassan, et al.
Published: (2025)

On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks
by: Huang, Wei, et al.
Published: (2023)

StableQAT: Stable Quantization-Aware Training at Ultra-Low Bitwidths
by: Chen, Tianyi, et al.
Published: (2026)

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
by: Liu, Jing, et al.
Published: (2023)

Balancing Fidelity and Plasticity: Aligning Mixed-Precision Fine-Tuning with Linguistic Hierarchies
by: Zhou, Changhai, et al.
Published: (2025)

Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs
by: Miyamoto, Sora, et al.
Published: (2026)

PQS (Prune, Quantize, and Sort): Low-Bitwidth Accumulation of Dot Products in Neural Network Computations
by: Natesh, Vikas, et al.
Published: (2025)

AnyBCQ: Hardware Efficient Flexible Binary-Coded Quantization for Multi-Precision LLMs
by: Park, Gunho, et al.
Published: (2025)

APreQEL: Adaptive Mixed Precision Quantization For Edge LLMs
by: Bouzouad, Meriem, et al.
Published: (2026)

Quantifying the Capabilities of LLMs across Scale and Precision
by: Badshah, Sher, et al.
Published: (2024)

Revisiting Tree Search for LLMs: Gumbel and Sequential Halving for Budget-Scalable Reasoning
by: Ugadiarov, Leonid, et al.
Published: (2026)

Aligning CodeLLMs with Direct Preference Optimization
by: Miao, Yibo, et al.
Published: (2024)

Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training
by: Varshney, Ayush K., et al.
Published: (2026)

Multi-Objective Hardware Aware Neural Architecture Search using Hardware Cost Diversity
by: Sinha, Nilotpal, et al.
Published: (2024)

Every Bit Counts: A Theoretical Study of Precision-Expressivity Tradeoffs in Quantized Transformers
by: Chakrabarti, Sayak, et al.
Published: (2026)

AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs
by: Kang, Feiyang, et al.
Published: (2024)

MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
by: Liu, Wenyuan, et al.
Published: (2025)

Enhancing Binary Search via Overlapping Partitions
by: Buyukkalayci, Kaan, et al.
Published: (2025)

On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression
by: Ge, Zichang, et al.
Published: (2025)

STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization
by: Federici, Marco, et al.
Published: (2025)

Mixed-Precision Federated Learning via Multi-Precision Over-The-Air Aggregation
by: Yuan, Jinsheng, et al.
Published: (2024)

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
by: Yuan, Jingyang, et al.
Published: (2025)

MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
by: Zhang, Yu, et al.
Published: (2024)

Divide-Verify-Refine: Can LLMs Self-Align with Complex Instructions?
by: Zhang, Xianren, et al.
Published: (2024)

Scalable Generative Game Engine: Breaking the Resolution Wall via Hardware-Algorithm Co-Design
by: Zeng, Wei, et al.
Published: (2026)

A Unified Framework for Generative Data Augmentation: A Comprehensive Survey
by: Chen, Yunhao, et al.
Published: (2023)

A Metric Driven Approach to Mixed Precision Training
by: Rasquinha, Mitchelle, et al.
Published: (2024)

Mixed-Precision Quantization for Language Models: Techniques and Prospects
by: Rakka, Mariam, et al.
Published: (2025)

MoR: Mixture Of Representations For Mixed-Precision Training
by: Su, Bor-Yiing, et al.
Published: (2025)

Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
by: Zhou, Zhanhui, et al.
Published: (2024)

HAWX: A Hardware-Aware FrameWork for Fast and Scalable ApproXimation of DNNs
by: Nazari, Samira, et al.
Published: (2026)

Comprehensive Description of Uncertainty in Measurement for Representation and Propagation with Scalable Precision
by: Darijani, Ali, et al.
Published: (2026)

HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference
by: Gong, Ping, et al.
Published: (2025)

MixKVQ: Query-Aware Mixed-Precision KV Cache Quantization for Long-Context Reasoning
by: Zhang, Tao, et al.
Published: (2025)

Scalable Meta-Learning via Mixed-Mode Differentiation
by: Kemaev, Iurii, et al.
Published: (2025)

Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning
by: Tang, Zhenchao, et al.
Published: (2025)

The Alignment Tax: Response Homogenization in Aligned LLMs and Its Implications for Uncertainty Estimation
by: Liu, Mingyi
Published: (2026)

Steering LLMs via Scalable Interactive Oversight
by: Zhou, Enyu, et al.
Published: (2026)

Deployment-Aligned Low-Precision Neural Architecture Search for Spaceborne Edge AI
by: Thind, Parampuneet Kaur, et al.
Published: (2026)