:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Yizhi; Hemani, Ahmed
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; Hardware Architecture
Online Access:	https://arxiv.org/abs/2601.09451

Similar Items

'1'-bit Count-based Sorting Unit to Reduce Link Power in DNN Accelerators
by: Han, Ruichi, et al.
Published: (2026)

AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization
by: Matsushima, Kosuke, et al.
Published: (2026)

Improving Quantization with Post-Training Model Expansion
by: Franco, Giuseppe, et al.
Published: (2025)

HPD: Hybrid Projection Decomposition for Robust State Space Models on Analog CIM Hardware
by: Feng, Yuannuo, et al.
Published: (2025)

FASQ: Flexible Accelerated Subspace Quantization for Calibration-Free LLM Compression
by: Qiao, Ye, et al.
Published: (2026)

AutoHLS: Learning to Accelerate Design Space Exploration for HLS Designs
by: Ahmed, Md Rubel, et al.
Published: (2024)

MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization
by: Ramachandran, Akshat, et al.
Published: (2024)

Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction
by: Chhugani, Jatin, et al.
Published: (2026)

SmartQuant: CXL-based AI Model Store in Support of Runtime Configurable Weight Quantization
by: Xie, Rui, et al.
Published: (2024)

Tensor-Compressed and Fully-Quantized Training of Neural PDE Solvers
by: Lu, Jinming, et al.
Published: (2025)

On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks
by: Huang, Wei, et al.
Published: (2023)

MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
by: Zhang, Yu, et al.
Published: (2024)

Neural Network Quantization for Microcontrollers: A Comprehensive Survey of Methods, Platforms, and Applications
by: Abushahla, Hamza A., et al.
Published: (2025)

RESQ: A Unified Framework for REliability- and Security Enhancement of Quantized Deep Neural Networks
by: Mohammadi, Ali Soltan, et al.
Published: (2026)

Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators
by: Kim, Jiyoon, et al.
Published: (2025)

Late Breaking Results: Breaking Symmetry- Unconventional Placement of Analog Circuits using Multi-Level Multi-Agent Reinforcement Learning
by: Maji, Supriyo, et al.
Published: (2025)

Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM
by: Li, Bingbing, et al.
Published: (2024)

Characterizing State Space Model and Hybrid Language Model Performance with Long Context
by: Mitra, Saptarshi, et al.
Published: (2025)

Scaling Multi-Node Mixture-of-Experts Inference Using Expert Activation Patterns
by: Bambhaniya, Abhimanyu, et al.
Published: (2026)

Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
by: Fang, Chao, et al.
Published: (2024)

AIRCHITECT v2: Learning the Hardware Accelerator Design Space through Unified Representations
by: Seo, Jamin, et al.
Published: (2025)

Multi-objective Optimization in CPU Design Space Exploration: Attention is All You Need
by: Xue, Runzhen, et al.
Published: (2024)

GENIAL: Generative Design Space Exploration via Network Inversion for Low Power Algorithmic Logic Units
by: Bouvier, Maxence, et al.
Published: (2025)

LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation
by: Chang, Chen-Chia, et al.
Published: (2025)

LaMAGIC: Language-Model-based Topology Generation for Analog Integrated Circuits
by: Chang, Chen-Chia, et al.
Published: (2024)

MonoSparse-CAM: Efficient Tree Model Processing via Monotonicity and Sparsity in CAMs
by: Molom-Ochir, Tergel, et al.
Published: (2024)

HiFloat4 Format for Language Model Inference
by: Luo, Yuanyong, et al.
Published: (2026)

Chip Placement with Diffusion Models
by: Lee, Vint, et al.
Published: (2024)

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
by: Xia, Haojun, et al.
Published: (2024)

Challenges and Research Directions for Large Language Model Inference Hardware
by: Ma, Xiaoyu, et al.
Published: (2026)

eXmY: A Data Type and Technique for Arbitrary Bit Precision Quantization
by: Agrawal, Aditya, et al.
Published: (2024)

PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models
by: Lee, Yunjae, et al.
Published: (2024)

EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models
by: Heo, Jaehoon, et al.
Published: (2025)

AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing
by: Ahmadzadeh, Mohsen, et al.
Published: (2025)

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
by: Ma, Shaobo, et al.
Published: (2024)

MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models
by: Kim, Taehyun, et al.
Published: (2024)

Hybrid JIT-CUDA Graph Optimization for Low-Latency Large Language Model Inference
by: Yadav, Divakar Kumar, et al.
Published: (2026)

ChipExpert: The Open-Source Integrated-Circuit-Design-Specific Large Language Model
by: Xu, Ning, et al.
Published: (2024)

ML For Hardware Design Interpretability: Challenges and Opportunities
by: Baartmans, Raymond, et al.
Published: (2025)

rule4ml: An Open-Source Tool for Resource Utilization and Latency Estimation for ML Models on FPGA
by: Rahimifar, Mohammad Mehdi, et al.
Published: (2024)