| Main Authors: | Wang, Peipei, Guan, Wu, Liang, Liping, Wang, Zhijun, Luo, Hanqing, Zhang, Zhibin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.14139 |
Similar Items
EdgeLLM: A Highly Efficient CPU-FPGA Heterogeneous Edge Accelerator for Large Language Models
by: Huang, Mingqiang, et al.
Published: (2024)
Hummingbird: A Smaller and Faster Large Language Model Accelerator on Embedded FPGA
by: Li, Jindong, et al.
Published: (2025)
SpecMamba: Accelerating Mamba Inference on FPGA with Speculative Decoding
by: Zhong, Linfeng, et al.
Published: (2025)
ISAAC: Intelligent, Scalable, Agile, and Accelerated CPU Verification via LLM-aided FPGA Parallelism
by: Sun, Jialin, et al.
Published: (2025)
FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization
by: Wang, Aotao, et al.
Published: (2025)
STI-SNN: A 0.14 GOPS/W/PE Single-Timestep Inference FPGA-based SNN Accelerator with Algorithm and Hardware Co-Design
by: Wang, Kainan, et al.
Published: (2025)
Hardware-Software Co-design for 3D-DRAM-based LLM Serving Accelerator
by: Li, Cong, et al.
Published: (2026)
HPIM: Heterogeneous Processing-In-Memory-based Accelerator for Large Language Models Inference
by: Duan, Cenlin, et al.
Published: (2025)
Graphitron: A Domain Specific Language for FPGA-based Graph Processing Accelerator Generation
by: Zhang, Xinmiao, et al.
Published: (2024)
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
by: Chen, Hongzheng, et al.
Published: (2023)
TerEffic: Highly Efficient Ternary LLM Inference on FPGA
by: Yin, Chenyang, et al.
Published: (2025)
DRACO: Co-design for DSP-Efficient Rigid Body Dynamics Accelerator
by: Liu, Xingyu, et al.
Published: (2025)
FPGA Co-Design for Efficient N:M Sparse and Quantized Model Inference
by: Hsieh, Fen-Yu, et al.
Published: (2025)
TurboFuzz: FPGA Accelerated Hardware Fuzzing for Processor Agile Verification
by: Zhong, Yang, et al.
Published: (2025)
Late Breaking Result: FPGA-Based Emulation and Fault Injection for CNN Inference Accelerators
by: Masar, Filip, et al.
Published: (2025)
FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill
by: Jayanth, Rakshith, et al.
Published: (2026)
CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA
by: Dong, Jiale, et al.
Published: (2025)
PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator
by: Chong, Yue Jiet, et al.
Published: (2026)
XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA
by: Yu, Feng, et al.
Published: (2026)
Holistic Optimization Framework for FPGA Accelerators
by: Pouget, Stéphane, et al.
Published: (2025)
An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation
by: Zhang, Weichuang, et al.
Published: (2024)
SkipOPU: An FPGA-based Overlay Processor for Large Language Models with Dynamically Allocated Computation
by: He, Zicheng, et al.
Published: (2026)
ZynqParrot: A Scale-Down Approach to Cycle-Accurate, FPGA-Accelerated Co-Emulation
by: Ruelas-Petrisko, Daniel, et al.
Published: (2025)
AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design
by: Liang, Yanbiao, et al.
Published: (2025)
PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration
by: Chong, Yue Jiet, et al.
Published: (2025)
FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Large Attention-Based Model Inference on Tile-Based Accelerators
by: Zhang, Chi, et al.
Published: (2026)
An FPGA-Based Accelerator Enabling Efficient Support for CNNs with Arbitrary Kernel Sizes
by: Wang, Miaoxin, et al.
Published: (2024)
RHS-TRNG: A Resilient High-Speed True Random Number Generator Based on STT-MTJ Device
by: Fu, Siqing, et al.
Published: (2023)
Bombyx: OpenCilk Compilation for FPGA Hardware Acceleration
by: Shahawy, Mohamed, et al.
Published: (2025)
Implementation and Analysis of Thermometer Encoding in DWN FPGA Accelerators
by: Mecik, Michael, et al.
Published: (2025)
SnipSnap: A Joint Compression Format and Dataflow Co-Optimization Framework for Efficient Sparse LLM Accelerator Design
by: Wu, Junyi, et al.
Published: (2025)
DCI: A Coordinated Allocation and Filling Workload-Aware Dual-Cache Allocation GNN Inference Acceleration System
by: Luo, Yi, et al.
Published: (2025)
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
by: Li, Jinhao, et al.
Published: (2024)
MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
by: Zhang, Yu, et al.
Published: (2024)
A Reconfigurable Framework for AI-FPGA Agent Integration and Acceleration
by: Yunusoglu, Aybars, et al.
Published: (2026)
Analyzing the capabilities of HLS and RTL tools in the design of an FPGA Montgomery Multiplier
by: Ifrim, Rares, et al.
Published: (2025)
Systolic Sparse Tensor Slices: FPGA Building Blocks for Sparse and Dense AI Acceleration
by: Taka, Endri, et al.
Published: (2025)
Embedded FPGA Acceleration of Brain-Like Neural Networks: Online Learning to Scalable Inference
by: Hafiz, Muhammad Ihsan Al, et al.
Published: (2025)
Exploring FPGA designs for MX and beyond
by: Samson, Ebby, et al.
Published: (2024)
Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference
by: Liu, Yiqi, et al.
Published: (2026)