:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Peipei; Guan, Wu; Liang, Liping; Wang, Zhijun; Luo, Hanqing; Zhang, Zhibin
Format:	Preprint
Published:	2025
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2507.14139

Similar Items

EdgeLLM: A Highly Efficient CPU-FPGA Heterogeneous Edge Accelerator for Large Language Models
by: Huang, Mingqiang, et al.
Published: (2024)

Hummingbird: A Smaller and Faster Large Language Model Accelerator on Embedded FPGA
by: Li, Jindong, et al.
Published: (2025)

SpecMamba: Accelerating Mamba Inference on FPGA with Speculative Decoding
by: Zhong, Linfeng, et al.
Published: (2025)

ISAAC: Intelligent, Scalable, Agile, and Accelerated CPU Verification via LLM-aided FPGA Parallelism
by: Sun, Jialin, et al.
Published: (2025)

FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization
by: Wang, Aotao, et al.
Published: (2025)

STI-SNN: A 0.14 GOPS/W/PE Single-Timestep Inference FPGA-based SNN Accelerator with Algorithm and Hardware Co-Design
by: Wang, Kainan, et al.
Published: (2025)

Hardware-Software Co-design for 3D-DRAM-based LLM Serving Accelerator
by: Li, Cong, et al.
Published: (2026)

HPIM: Heterogeneous Processing-In-Memory-based Accelerator for Large Language Models Inference
by: Duan, Cenlin, et al.
Published: (2025)

Graphitron: A Domain Specific Language for FPGA-based Graph Processing Accelerator Generation
by: Zhang, Xinmiao, et al.
Published: (2024)

Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
by: Chen, Hongzheng, et al.
Published: (2023)

TerEffic: Highly Efficient Ternary LLM Inference on FPGA
by: Yin, Chenyang, et al.
Published: (2025)

DRACO: Co-design for DSP-Efficient Rigid Body Dynamics Accelerator
by: Liu, Xingyu, et al.
Published: (2025)

FPGA Co-Design for Efficient N:M Sparse and Quantized Model Inference
by: Hsieh, Fen-Yu, et al.
Published: (2025)

TurboFuzz: FPGA Accelerated Hardware Fuzzing for Processor Agile Verification
by: Zhong, Yang, et al.
Published: (2025)

Late Breaking Result: FPGA-Based Emulation and Fault Injection for CNN Inference Accelerators
by: Masar, Filip, et al.
Published: (2025)

FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill
by: Jayanth, Rakshith, et al.
Published: (2026)

CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA
by: Dong, Jiale, et al.
Published: (2025)

PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator
by: Chong, Yue Jiet, et al.
Published: (2026)

XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA
by: Yu, Feng, et al.
Published: (2026)

Holistic Optimization Framework for FPGA Accelerators
by: Pouget, Stéphane, et al.
Published: (2025)

An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation
by: Zhang, Weichuang, et al.
Published: (2024)

SkipOPU: An FPGA-based Overlay Processor for Large Language Models with Dynamically Allocated Computation
by: He, Zicheng, et al.
Published: (2026)

ZynqParrot: A Scale-Down Approach to Cycle-Accurate, FPGA-Accelerated Co-Emulation
by: Ruelas-Petrisko, Daniel, et al.
Published: (2025)

AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design
by: Liang, Yanbiao, et al.
Published: (2025)

PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration
by: Chong, Yue Jiet, et al.
Published: (2025)

FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Large Attention-Based Model Inference on Tile-Based Accelerators
by: Zhang, Chi, et al.
Published: (2026)

An FPGA-Based Accelerator Enabling Efficient Support for CNNs with Arbitrary Kernel Sizes
by: Wang, Miaoxin, et al.
Published: (2024)

RHS-TRNG: A Resilient High-Speed True Random Number Generator Based on STT-MTJ Device
by: Fu, Siqing, et al.
Published: (2023)

Bombyx: OpenCilk Compilation for FPGA Hardware Acceleration
by: Shahawy, Mohamed, et al.
Published: (2025)

Implementation and Analysis of Thermometer Encoding in DWN FPGA Accelerators
by: Mecik, Michael, et al.
Published: (2025)

SnipSnap: A Joint Compression Format and Dataflow Co-Optimization Framework for Efficient Sparse LLM Accelerator Design
by: Wu, Junyi, et al.
Published: (2025)

DCI: A Coordinated Allocation and Filling Workload-Aware Dual-Cache Allocation GNN Inference Acceleration System
by: Luo, Yi, et al.
Published: (2025)

Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
by: Li, Jinhao, et al.
Published: (2024)

MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
by: Zhang, Yu, et al.
Published: (2024)

A Reconfigurable Framework for AI-FPGA Agent Integration and Acceleration
by: Yunusoglu, Aybars, et al.
Published: (2026)

Analyzing the capabilities of HLS and RTL tools in the design of an FPGA Montgomery Multiplier
by: Ifrim, Rares, et al.
Published: (2025)

Systolic Sparse Tensor Slices: FPGA Building Blocks for Sparse and Dense AI Acceleration
by: Taka, Endri, et al.
Published: (2025)

Embedded FPGA Acceleration of Brain-Like Neural Networks: Online Learning to Scalable Inference
by: Hafiz, Muhammad Ihsan Al, et al.
Published: (2025)

Exploring FPGA designs for MX and beyond
by: Samson, Ebby, et al.
Published: (2024)

Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference
by: Liu, Yiqi, et al.
Published: (2026)