| Main Authors: | Zhang, Chengming, Ding, Xinheng, Sun, Baixi, Yu, Xiaodong, Zheng, Weijian, Xie, Zhen, Tao, Dingwen |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2412.19829 |
Similar Items
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
by: Li, Jinhao, et al.
Published: (2024)
Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving
by: Kim, Wonung, et al.
Published: (2025)
Large Processor Chip Model
by: Chang, Kaiyan, et al.
Published: (2025)
Faster Inference of LLMs using FP8 on the Intel Gaudi
by: Lee, Joonhyung, et al.
Published: (2025)
LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference
by: Moon, Seungjae, et al.
Published: (2024)
LLM-VeriPPA: Power, Performance, and Area Optimization aware Verilog Code Generation with Large Language Models
by: Thorat, Kiran, et al.
Published: (2025)
MACO: Exploring GEMM Acceleration on a Loosely-Coupled Multi-core Processor
by: Sui, Bingcai, et al.
Published: (2024)
AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies
by: Sharma, Amit
Published: (2025)
Accelerating Neural Networks for Large Language Models and Graph Processing with Silicon Photonics
by: Afifi, Salma, et al.
Published: (2024)
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
by: Lee, Jungi, et al.
Published: (2024)
GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models
by: Fu, Yonggan, et al.
Published: (2023)
Low Power Vision Transformer Accelerator with Hardware-Aware Pruning and Optimized Dataflow
by: Hsiung, Ching-Lin, et al.
Published: (2025)
Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference
by: Wolters, Christopher, et al.
Published: (2024)
LLM-USO: Large Language Model-based Universal Sizing Optimizer
by: S, Karthik Somayaji N., et al.
Published: (2025)
SkipOPU: An FPGA-based Overlay Processor for Large Language Models with Dynamically Allocated Computation
by: He, Zicheng, et al.
Published: (2026)
FPGA-Optimized Hardware Accelerator for Fast Fourier Transform and Singular Value Decomposition in AI
by: Ding, Hong, et al.
Published: (2025)
EPIM: Efficient Processing-In-Memory Accelerators based on Epitome
by: Wang, Chenyu, et al.
Published: (2023)
TurboFuzz: FPGA Accelerated Hardware Fuzzing for Processor Agile Verification
by: Zhong, Yang, et al.
Published: (2025)
QiMeng: Fully Automated Hardware and Software Design for Processor Chip
by: Zhang, Rui, et al.
Published: (2025)
Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing
by: Huo, Juncheng, et al.
Published: (2025)
End-to-End Transformer Acceleration Through Processing-in-Memory Architectures
by: Yang, Xiaoxuan, et al.
Published: (2025)
ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers
by: İslamoğlu, Gamze, et al.
Published: (2023)
Sustainable Transformer Neural Network Acceleration with Stochastic Photonic Computing
by: Afifi, S., et al.
Published: (2026)
TrainDeeploy: Hardware-Accelerated Parameter-Efficient Fine-Tuning of Small Transformer Models at the Extreme Edge
by: Wang, Run, et al.
Published: (2026)
SimulatorCoder: DNN Accelerator Simulator Code Generation and Optimization via Large Language Models
by: Xia, Yuhuan, et al.
Published: (2026)
HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases
by: Zheng, Pingqing, et al.
Published: (2025)
CATransformers: Carbon Aware Transformers Through Joint Model-Hardware Optimization
by: Wang, Irene, et al.
Published: (2025)
An Efficient Data Reuse with Tile-Based Adaptive Stationary for Transformer Accelerators
by: Li, Tseng-Jen, et al.
Published: (2025)
An FPGA-Based Reconfigurable Accelerator for Convolution-Transformer Hybrid EfficientViT
by: Shao, Haikuo, et al.
Published: (2024)
A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge
by: Huang, Longwei, et al.
Published: (2023)
FiCABU: A Fisher-Based, Context-Adaptive Machine Unlearning Processor for Edge AI
by: Cho, Eun-Su, et al.
Published: (2025)
OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models
by: Koo, Jahyun, et al.
Published: (2024)
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
by: Ma, Shaobo, et al.
Published: (2024)
Accelerating Sparse Graph Neural Networks with Tensor Core Optimization
by: Wu, Ka Wai
Published: (2024)
COBRA: Algorithm-Architecture Co-optimized Binary Transformer Accelerator for Edge Inference
by: Qiao, Ye, et al.
Published: (2025)
An Analog and Digital Hybrid Attention Accelerator for Transformers with Charge-based In-memory Computing
by: Moradifirouzabadi, Ashkan, et al.
Published: (2024)
Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures
by: Dhingra, Pratyush, et al.
Published: (2025)
ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks
by: Afifi, Salma, et al.
Published: (2024)
Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models
by: Xu, Lei, et al.
Published: (2025)
Optimized Spatial Architecture Mapping Flow for Transformer Accelerators
by: Xu, Haocheng, et al.
Published: (2024)