| Main Authors: | Sui, Bingcai, Shen, Junzhong, Sun, Caixia, Wang, Junhui, Zheng, Zhong, Guo, Wei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.19180 |
Similar Items
FASE: FPGA-Assisted Syscall Emulation for Rapid End-to-End Processor Performance Validation
by: Meng, Chengzhen, et al.
Published: (2025)
Transitive Array: An Efficient GEMM Accelerator with Result Reuse
by: Guo, Cong, et al.
Published: (2025)
GEMM-GS: Accelerating 3D Gaussian Splatting on Tensor Cores with GEMM-Compatible Blending
by: Li, Haomin, et al.
Published: (2026)
Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs
by: Taka, Endri, et al.
Published: (2024)
TurboFuzz: FPGA Accelerated Hardware Fuzzing for Processor Agile Verification
by: Zhong, Yang, et al.
Published: (2025)
GAMA: High-Performance GEMM Acceleration on AMD Versal ML-Optimized AI Engines
by: Mhatre, Kaustubh, et al.
Published: (2025)
FIGLUT: An Energy-Efficient Accelerator Design for FP-INT GEMM Using Look-Up Tables
by: Park, Gunho, et al.
Published: (2025)
Optimizing GEMM for Energy and Performance on Versal ACAP Architectures
by: Papalamprou, Ilias, et al.
Published: (2025)
GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors
by: Zhang, Chengming, et al.
Published: (2024)
Design in Tiles: Automating GEMM Deployment on Tile-Based Many-PE Accelerators
by: Shen, Aofeng, et al.
Published: (2025)
Loop Control Management in Tightly Coupled Processor Arrays (TCPAs)
by: Walter, Dominik, et al.
Published: (2026)
tubGEMM: Energy-Efficient and Sparsity-Effective Temporal-Unary-Binary Based Matrix Multiply Unit
by: Vellaisamy, Prabhu, et al.
Published: (2024)
SparseZipper: Enhancing Matrix Extensions to Accelerate SpGEMM on CPUs
by: Ta, Tuan, et al.
Published: (2025)
tuGEMM: Area-Power-Efficient Temporal Unary GEMM Architecture for Low-Precision Edge AI
by: Nair, Harideep, et al.
Published: (2024)
Large Processor Chip Model
by: Chang, Kaiyan, et al.
Published: (2025)
Striking the Balance: GEMM Performance Optimization Across Generations of Ryzen AI NPUs
by: Taka, Endri, et al.
Published: (2025)
A Scalable FPGA Architecture With Adaptive Memory Utilization for GEMM-Based Operations
by: Petropoulos, Anastasios, et al.
Published: (2025)
Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing
by: Huo, Juncheng, et al.
Published: (2025)
A Low-Dissipation and Scalable GEMM Accelerator with Silicon Nitride Photonics
by: Karempudi, Venkata Sai Praneeth, et al.
Published: (2024)
Ara2: Exploring Single- and Multi-Core Vector Processing with an Efficient RVV 1.0 Compliant Open-Source Processor
by: Perotti, Matteo, et al.
Published: (2023)
TMA-Adaptive FP8 Grouped GEMM: Eliminating Padding Requirements in Low-Precision Training and Inference on Hopper
by: Su, Zhongling, et al.
Published: (2025)
TROOP: At-the-Roofline Performance for Vector Processors on Low Operational Intensity Workloads
by: Purayil, Navaneeth Kunhi, et al.
Published: (2025)
Scaling Analog Photonic Accelerators for Byte-Size, Integer General Matrix Multiply (GEMM) Kernels
by: Alo, Oluwaseun Adewunmi, et al.
Published: (2024)
A Scalable RISC-V Vector Processor Enabling Efficient Multi-Precision DNN Inference
by: Wang, Chuanning, et al.
Published: (2024)
Microarchitectural Co-Optimization for Sustained Throughput of RISC-V Multi-Lane Chaining Vector Processors
by: Wang, Weiying, et al.
Published: (2026)
From Circuits to SoC Processors: Arithmetic Approximation Techniques & Embedded Computing Methodologies for DSP Acceleration
by: Leon, Vasileios
Published: (2023)
O-POPE: High-Frequency Pipelined Outer Product based GEMM acceleration with minimal buffering overhead
by: Cammarata, Danilo, et al.
Published: (2026)
A Quantitative Analysis and Guidelines of Data Streaming Accelerator in Modern Intel Xeon Scalable Processors
by: Kuper, Reese, et al.
Published: (2023)
SPEED: A Scalable RISC-V Vector Processor Enabling Efficient Multi-Precision DNN Inference
by: Wang, Chuanning, et al.
Published: (2024)
Hypervisor Extension for a RISC-V Processor
by: Gauchola, Jaume, et al.
Published: (2024)
FERIVer: An FPGA-assisted Emulated Framework for RTL Verification of RISC-V Processors
by: Qin, Kun, et al.
Published: (2025)
Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM
by: Ma, Haiyue, et al.
Published: (2024)
SkipOPU: An FPGA-based Overlay Processor for Large Language Models with Dynamically Allocated Computation
by: He, Zicheng, et al.
Published: (2026)
MERE: Hardware-Software Co-Design for Masking Cache Miss Latency in Embedded Processors
by: You, Dean, et al.
Published: (2025)
SAMIPS: A Synthesised Asynchronous Processor
by: Zhang, Qianyi, et al.
Published: (2024)
Banked Memories for Soft SIMT Processors
by: Langhammer, Martin, et al.
Published: (2025)
MultiVic: A Time-Predictable RISC-V Multi-Core Processor Optimized for Neural Network Inference
by: Kirschner, Maximilian, et al.
Published: (2025)
SpecMamba: Accelerating Mamba Inference on FPGA with Speculative Decoding
by: Zhong, Linfeng, et al.
Published: (2025)
A 950 MHz SIMT Soft Processor
by: Langhammer, Martin, et al.
Published: (2025)
AGON: Automated Design Framework for Customizing Processors from ISA Documents
by: Li, Chongxiao, et al.
Published: (2024)