:: Library Catalog

Bibliographic Details
Main Authors:	Sui, Bingcai; Shen, Junzhong; Sun, Caixia; Wang, Junhui; Zheng, Zhong; Guo, Wei
Format:	Preprint
Published:	2024
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2404.19180

Similar Items

FASE: FPGA-Assisted Syscall Emulation for Rapid End-to-End Processor Performance Validation
by: Meng, Chengzhen, et al.
Published: (2025)

Transitive Array: An Efficient GEMM Accelerator with Result Reuse
by: Guo, Cong, et al.
Published: (2025)

GEMM-GS: Accelerating 3D Gaussian Splatting on Tensor Cores with GEMM-Compatible Blending
by: Li, Haomin, et al.
Published: (2026)

Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs
by: Taka, Endri, et al.
Published: (2024)

TurboFuzz: FPGA Accelerated Hardware Fuzzing for Processor Agile Verification
by: Zhong, Yang, et al.
Published: (2025)

GAMA: High-Performance GEMM Acceleration on AMD Versal ML-Optimized AI Engines
by: Mhatre, Kaustubh, et al.
Published: (2025)

FIGLUT: An Energy-Efficient Accelerator Design for FP-INT GEMM Using Look-Up Tables
by: Park, Gunho, et al.
Published: (2025)

Optimizing GEMM for Energy and Performance on Versal ACAP Architectures
by: Papalamprou, Ilias, et al.
Published: (2025)

GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors
by: Zhang, Chengming, et al.
Published: (2024)

Design in Tiles: Automating GEMM Deployment on Tile-Based Many-PE Accelerators
by: Shen, Aofeng, et al.
Published: (2025)

Loop Control Management in Tightly Coupled Processor Arrays (TCPAs)
by: Walter, Dominik, et al.
Published: (2026)

tubGEMM: Energy-Efficient and Sparsity-Effective Temporal-Unary-Binary Based Matrix Multiply Unit
by: Vellaisamy, Prabhu, et al.
Published: (2024)

SparseZipper: Enhancing Matrix Extensions to Accelerate SpGEMM on CPUs
by: Ta, Tuan, et al.
Published: (2025)

tuGEMM: Area-Power-Efficient Temporal Unary GEMM Architecture for Low-Precision Edge AI
by: Nair, Harideep, et al.
Published: (2024)

Large Processor Chip Model
by: Chang, Kaiyan, et al.
Published: (2025)

Striking the Balance: GEMM Performance Optimization Across Generations of Ryzen AI NPUs
by: Taka, Endri, et al.
Published: (2025)

A Scalable FPGA Architecture With Adaptive Memory Utilization for GEMM-Based Operations
by: Petropoulos, Anastasios, et al.
Published: (2025)

Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing
by: Huo, Juncheng, et al.
Published: (2025)

A Low-Dissipation and Scalable GEMM Accelerator with Silicon Nitride Photonics
by: Karempudi, Venkata Sai Praneeth, et al.
Published: (2024)

Ara2: Exploring Single- and Multi-Core Vector Processing with an Efficient RVV 1.0 Compliant Open-Source Processor
by: Perotti, Matteo, et al.
Published: (2023)

TMA-Adaptive FP8 Grouped GEMM: Eliminating Padding Requirements in Low-Precision Training and Inference on Hopper
by: Su, Zhongling, et al.
Published: (2025)

TROOP: At-the-Roofline Performance for Vector Processors on Low Operational Intensity Workloads
by: Purayil, Navaneeth Kunhi, et al.
Published: (2025)

Scaling Analog Photonic Accelerators for Byte-Size, Integer General Matrix Multiply (GEMM) Kernels
by: Alo, Oluwaseun Adewunmi, et al.
Published: (2024)

A Scalable RISC-V Vector Processor Enabling Efficient Multi-Precision DNN Inference
by: Wang, Chuanning, et al.
Published: (2024)

Microarchitectural Co-Optimization for Sustained Throughput of RISC-V Multi-Lane Chaining Vector Processors
by: Wang, Weiying, et al.
Published: (2026)

From Circuits to SoC Processors: Arithmetic Approximation Techniques & Embedded Computing Methodologies for DSP Acceleration
by: Leon, Vasileios
Published: (2023)

O-POPE: High-Frequency Pipelined Outer Product based GEMM acceleration with minimal buffering overhead
by: Cammarata, Danilo, et al.
Published: (2026)

A Quantitative Analysis and Guidelines of Data Streaming Accelerator in Modern Intel Xeon Scalable Processors
by: Kuper, Reese, et al.
Published: (2023)

SPEED: A Scalable RISC-V Vector Processor Enabling Efficient Multi-Precision DNN Inference
by: Wang, Chuanning, et al.
Published: (2024)

Hypervisor Extension for a RISC-V Processor
by: Gauchola, Jaume, et al.
Published: (2024)

FERIVer: An FPGA-assisted Emulated Framework for RTL Verification of RISC-V Processors
by: Qin, Kun, et al.
Published: (2025)

Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM
by: Ma, Haiyue, et al.
Published: (2024)

SkipOPU: An FPGA-based Overlay Processor for Large Language Models with Dynamically Allocated Computation
by: He, Zicheng, et al.
Published: (2026)

MERE: Hardware-Software Co-Design for Masking Cache Miss Latency in Embedded Processors
by: You, Dean, et al.
Published: (2025)

SAMIPS: A Synthesised Asynchronous Processor
by: Zhang, Qianyi, et al.
Published: (2024)

Banked Memories for Soft SIMT Processors
by: Langhammer, Martin, et al.
Published: (2025)

MultiVic: A Time-Predictable RISC-V Multi-Core Processor Optimized for Neural Network Inference
by: Kirschner, Maximilian, et al.
Published: (2025)

SpecMamba: Accelerating Mamba Inference on FPGA with Speculative Decoding
by: Zhong, Linfeng, et al.
Published: (2025)

A 950 MHz SIMT Soft Processor
by: Langhammer, Martin, et al.
Published: (2025)

AGON: Automated Design Framework for Customizing Processors from ISA Documents
by: Li, Chongxiao, et al.
Published: (2024)