| Main Authors: | Peng, Tianfan, Qin, Jiajun, Xia, Tianhua, Zhang, Sai Qian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.11832 |
Similar Items
Hyft: A Reconfigurable Softmax Accelerator with Hybrid Numeric Format for both Training and Inference
by: Xia, Tianhua, et al.
Published: (2023)
Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing
by: Xia, Tianhua, et al.
Published: (2025)
Holistic Optimization Framework for FPGA Accelerators
by: Pouget, Stéphane, et al.
Published: (2025)
Hummingbird: A Smaller and Faster Large Language Model Accelerator on Embedded FPGA
by: Li, Jindong, et al.
Published: (2025)
SimulatorCoder: DNN Accelerator Simulator Code Generation and Optimization via Large Language Models
by: Xia, Yuhuan, et al.
Published: (2026)
EdgeLLM: A Highly Efficient CPU-FPGA Heterogeneous Edge Accelerator for Large Language Models
by: Huang, Mingqiang, et al.
Published: (2024)
SpeedLLM: An FPGA Co-design of Large Language Model Inference Accelerator
by: Wang, Peipei, et al.
Published: (2025)
SA-DS: A Dataset for Large Language Model-Driven AI Accelerator Design Generation
by: Vungarala, Deepak, et al.
Published: (2024)
A Review on Proprietary Accelerators for Large Language Models
by: Park, Sihyeong, et al.
Published: (2025)
BBAL: A Bidirectional Block Floating Point-Based Quantisation Accelerator for Large Language Models
by: Han, Xiaomeng, et al.
Published: (2025)
Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference
by: Yubeaton, Patrick, et al.
Published: (2025)
HPIM: Heterogeneous Processing-In-Memory-based Accelerator for Large Language Models Inference
by: Duan, Cenlin, et al.
Published: (2025)
Memory-efficient Sketch Acceleration for Handling Large Network Flows on FPGAs
by: Han, Zhaoyang, et al.
Published: (2025)
Managing Hybrid Solid-State Drives Using Large Language Models
by: Wei, Qian, et al.
Published: (2025)
A3D-MoE: Acceleration of Large Language Models with Mixture of Experts via 3D Heterogeneous Integration
by: Huang, Wei-Hsing, et al.
Published: (2025)
GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors
by: Zhang, Chengming, et al.
Published: (2024)
AnalogMaster: Large Language Model-based Automated Analog IC Design Framework from Image to Layout
by: Qin, Xian Rong, et al.
Published: (2026)
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
by: Li, Jinhao, et al.
Published: (2024)
TATAA: Programmable Mixed-Precision Transformer Acceleration with a Transformable Arithmetic Architecture
by: Wu, Jiajun, et al.
Published: (2024)
SuperUROP: An FPGA-Based Spatial Accelerator for Sparse Matrix Operations
by: Parthasarathy, Rishab
Published: (2025)
Leveraging Application-Specific Knowledge for Energy-Efficient Deep Learning Accelerators on Resource-Constrained FPGAs
by: Qian, Chao
Published: (2025)
Chiplet Cloud: Building AI Supercomputers for Serving Large Generative Language Models
by: Peng, Huwan, et al.
Published: (2023)
31.1 A 14.08-to-135.69Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding
by: Dong, Pingcheng, et al.
Published: (2026)
AccelSync: Verifying Synchronization Coverage in Accelerator Pipeline Programs
by: An, Hangcheng, et al.
Published: (2026)
Mixed Structural Choice Operator: Enhancing Technology Mapping with Heterogeneous Representations
by: Hu, Zhang, et al.
Published: (2025)
RAS: A Bit-Exact rANS Accelerator For High-Performance Neural Lossless Compression
by: Qin, Yuchao, et al.
Published: (2025)
FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Large Attention-Based Model Inference on Tile-Based Accelerators
by: Zhang, Chi, et al.
Published: (2026)
Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving
by: Kim, Wonung, et al.
Published: (2025)
Chiplets on Wheels: Review Paper on Holistic Chiplet Solutions for Autonomous Vehicles
by: Narashiman, Swathi, et al.
Published: (2024)
Energy Efficient LSTM Accelerators for Embedded FPGAs through Parameterised Architecture Design
by: Qian, Chao, et al.
Published: (2026)
Graphitron: A Domain Specific Language for FPGA-based Graph Processing Accelerator Generation
by: Zhang, Xinmiao, et al.
Published: (2024)
AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies
by: Sharma, Amit
Published: (2025)
Accelerating Neural Networks for Large Language Models and Graph Processing with Silicon Photonics
by: Afifi, Salma, et al.
Published: (2024)
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
by: Lee, Jungi, et al.
Published: (2024)
Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs
by: Taka, Endri, et al.
Published: (2024)
GSIM: Accelerating RTL Simulation for Large-Scale Designs
by: Chen, Lu, et al.
Published: (2025)
FireFly-P: FPGA-Accelerated Spiking Neural Network Plasticity for Robust Adaptive Control
by: Li, Tenglong, et al.
Published: (2026)
FireFly-T: High-Throughput Sparsity Exploitation for Spiking Transformer Acceleration with Dual-Engine Overlay Architecture
by: Li, Tenglong, et al.
Published: (2025)
FireFly-S: Exploiting Dual-Side Sparsity for Spiking Neural Networks Acceleration with Reconfigurable Spatial Architecture
by: Li, Tenglong, et al.
Published: (2024)
Subitizing-Inspired Large Language Models for Floorplanning
by: Lu, Shao-Chien, et al.
Published: (2025)