| Main Authors: | Pan, Yunjie, Yang, Yongyi, Yang, Hanmei, Mahlke, Scott |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.01410 |
Similar Items
Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction
by: Chhugani, Jatin, et al.
Published: (2026)
MiCo: End-to-End Mixed Precision Neural Network Co-Exploration Framework for Edge AI
by: Jiang, Zijun, et al.
Published: (2025)
MixDiT: Accelerating Image Diffusion Transformer Inference with Mixed-Precision MX Quantization
by: Kim, Daeun, et al.
Published: (2025)
MaRVIn: A Cross-Layer Mixed-Precision RISC-V Framework for DNN Inference, from ISA Extension to Hardware Acceleration
by: Armeniakos, Giorgos, et al.
Published: (2025)
A Hardware-Aware, Per-Layer Methodology for Post-Training Quantization of Large Language Models
by: Killian, Earl
Published: (2026)
vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training
by: Bang, Jehyeon, et al.
Published: (2023)
Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference
by: Wolters, Christopher, et al.
Published: (2024)
FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
by: Hooper, Coleman, et al.
Published: (2025)
Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
by: Jang, Hongsun, et al.
Published: (2024)
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
by: Li, Jinhao, et al.
Published: (2024)
Leveraging Stochastic Depth Training for Adaptive Inference
by: Korol, Guilherme, et al.
Published: (2025)
FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs
by: Xie, Xilong, et al.
Published: (2025)
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
by: Ma, Shaobo, et al.
Published: (2024)
SeVeDo: A Heterogeneous Transformer Accelerator for Low-Bit Inference via Hierarchical Group Quantization and SVD-Guided Mixed Precision
by: Choi, Yuseon, et al.
Published: (2025)
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks
by: Huang, Wei, et al.
Published: (2023)
LLM-USO: Large Language Model-based Universal Sizing Optimizer
by: S, Karthik Somayaji N., et al.
Published: (2025)
GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors
by: Zhang, Chengming, et al.
Published: (2024)
AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies
by: Sharma, Amit
Published: (2025)
Accelerating Neural Networks for Large Language Models and Graph Processing with Silicon Photonics
by: Afifi, Salma, et al.
Published: (2024)
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
by: Lee, Jungi, et al.
Published: (2024)
MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving
by: Lee, Jungi, et al.
Published: (2025)
Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts
by: Banasik, Spencer
Published: (2025)
Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving
by: Kim, Wonung, et al.
Published: (2025)
LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation
by: Zhang, Zixi, et al.
Published: (2023)
AMPLE: Event-Driven Accelerator for Mixed-Precision Inference of Graph Neural Networks
by: Gimenes, Pedro, et al.
Published: (2025)
PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models
by: Cho, Eunyeong, et al.
Published: (2026)
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
by: Yun, Sungmin, et al.
Published: (2024)
Energy Efficient Software Hardware CoDesign for Machine Learning: From TinyML to Large Language Models
by: Vahdatpour, Mohammad Saleh, et al.
Published: (2026)
GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models
by: Fu, Yonggan, et al.
Published: (2023)
LLM-VeriPPA: Power, Performance, and Area Optimization aware Verilog Code Generation with Large Language Models
by: Thorat, Kiran, et al.
Published: (2025)
AttentionLego: An Open-Source Building Block For Spatially-Scalable Large Language Model Accelerator With Processing-In-Memory Technology
by: Cong, Rongqing, et al.
Published: (2024)
Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications
by: Li, Yang, et al.
Published: (2024)
Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models
by: Xu, Lei, et al.
Published: (2025)
Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification
by: Pinckney, Nathaniel, et al.
Published: (2025)
CIMFlow: An Integrated Framework for Systematic Design and Evaluation of Digital CIM Architectures
by: Qi, Yingjie, et al.
Published: (2025)
ChipExpert: The Open-Source Integrated-Circuit-Design-Specific Large Language Model
by: Xu, Ning, et al.
Published: (2024)
Rescaling-Aware Training for Efficient Deployment of Deep Learning Models on Full-Integer Hardware
by: Mueller, Lion, et al.
Published: (2025)
HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases
by: Zheng, Pingqing, et al.
Published: (2025)
SONIQ: System-Optimized Noise-Injected Ultra-Low-Precision Quantization with Full-Precision Parity
by: Zhou, Cyrus, et al.
Published: (2023)
Efficient VQ-QAT and Mixed Vector/Linear quantized Neural Networks
by: Gou, Terry, et al.
Published: (2026)