| Main Authors: | Chen, Yizhi; Hemani, Ahmed |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.09451 |
Similar Items
'1'-bit Count-based Sorting Unit to Reduce Link Power in DNN Accelerators
by: Han, Ruichi, et al.
Published: (2026)
AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization
by: Matsushima, Kosuke, et al.
Published: (2026)
Improving Quantization with Post-Training Model Expansion
by: Franco, Giuseppe, et al.
Published: (2025)
HPD: Hybrid Projection Decomposition for Robust State Space Models on Analog CIM Hardware
by: Feng, Yuannuo, et al.
Published: (2025)
FASQ: Flexible Accelerated Subspace Quantization for Calibration-Free LLM Compression
by: Qiao, Ye, et al.
Published: (2026)
AutoHLS: Learning to Accelerate Design Space Exploration for HLS Designs
by: Ahmed, Md Rubel, et al.
Published: (2024)
MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization
by: Ramachandran, Akshat, et al.
Published: (2024)
Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction
by: Chhugani, Jatin, et al.
Published: (2026)
SmartQuant: CXL-based AI Model Store in Support of Runtime Configurable Weight Quantization
by: Xie, Rui, et al.
Published: (2024)
Tensor-Compressed and Fully-Quantized Training of Neural PDE Solvers
by: Lu, Jinming, et al.
Published: (2025)
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks
by: Huang, Wei, et al.
Published: (2023)
MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
by: Zhang, Yu, et al.
Published: (2024)
Neural Network Quantization for Microcontrollers: A Comprehensive Survey of Methods, Platforms, and Applications
by: Abushahla, Hamza A., et al.
Published: (2025)
RESQ: A Unified Framework for REliability- and Security Enhancement of Quantized Deep Neural Networks
by: Mohammadi, Ali Soltan, et al.
Published: (2026)
Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators
by: Kim, Jiyoon, et al.
Published: (2025)
Late Breaking Results: Breaking Symmetry- Unconventional Placement of Analog Circuits using Multi-Level Multi-Agent Reinforcement Learning
by: Maji, Supriyo, et al.
Published: (2025)
Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM
by: Li, Bingbing, et al.
Published: (2024)
Characterizing State Space Model and Hybrid Language Model Performance with Long Context
by: Mitra, Saptarshi, et al.
Published: (2025)
Scaling Multi-Node Mixture-of-Experts Inference Using Expert Activation Patterns
by: Bambhaniya, Abhimanyu, et al.
Published: (2026)
Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
by: Fang, Chao, et al.
Published: (2024)
AIRCHITECT v2: Learning the Hardware Accelerator Design Space through Unified Representations
by: Seo, Jamin, et al.
Published: (2025)
Multi-objective Optimization in CPU Design Space Exploration: Attention is All You Need
by: Xue, Runzhen, et al.
Published: (2024)
GENIAL: Generative Design Space Exploration via Network Inversion for Low Power Algorithmic Logic Units
by: Bouvier, Maxence, et al.
Published: (2025)
LaMAGIC2: Advanced Circuit Formulations for Language Model-Based Analog Topology Generation
by: Chang, Chen-Chia, et al.
Published: (2025)
LaMAGIC: Language-Model-based Topology Generation for Analog Integrated Circuits
by: Chang, Chen-Chia, et al.
Published: (2024)
MonoSparse-CAM: Efficient Tree Model Processing via Monotonicity and Sparsity in CAMs
by: Molom-Ochir, Tergel, et al.
Published: (2024)
HiFloat4 Format for Language Model Inference
by: Luo, Yuanyong, et al.
Published: (2026)
Chip Placement with Diffusion Models
by: Lee, Vint, et al.
Published: (2024)
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
by: Xia, Haojun, et al.
Published: (2024)
Challenges and Research Directions for Large Language Model Inference Hardware
by: Ma, Xiaoyu, et al.
Published: (2026)
eXmY: A Data Type and Technique for Arbitrary Bit Precision Quantization
by: Agrawal, Aditya, et al.
Published: (2024)
PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models
by: Lee, Yunjae, et al.
Published: (2024)
EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models
by: Heo, Jaehoon, et al.
Published: (2025)
AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing
by: Ahmadzadeh, Mohsen, et al.
Published: (2025)
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
by: Ma, Shaobo, et al.
Published: (2024)
MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models
by: Kim, Taehyun, et al.
Published: (2024)
Hybrid JIT-CUDA Graph Optimization for Low-Latency Large Language Model Inference
by: Yadav, Divakar Kumar, et al.
Published: (2026)
ChipExpert: The Open-Source Integrated-Circuit-Design-Specific Large Language Model
by: Xu, Ning, et al.
Published: (2024)
ML For Hardware Design Interpretability: Challenges and Opportunities
by: Baartmans, Raymond, et al.
Published: (2025)
rule4ml: An Open-Source Tool for Resource Utilization and Latency Estimation for ML Models on FPGA
by: Rahimifar, Mohammad Mehdi, et al.
Published: (2024)