| Main Authors: | Zhang, Wenbo, Liu, Yiqi, Bao, Zhenshan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.09689 |
Similar Items
Optimizing GEMM for Energy and Performance on Versal ACAP Architectures
by: Papalamprou, Ilias, et al.
Published: (2025)
WideSA: A High Array Utilization Mapping Scheme for Uniform Recurrences on the Versal ACAP Architecture
by: Dai, Tuo, et al.
Published: (2024)
DPUV4E: High-Throughput DPU Architecture Design for CNN on Versal ACAP
by: Li, Guoyu, et al.
Published: (2025)
AP-DRL: A Synergistic Algorithm-Hardware Framework for Automatic Task Partitioning of Deep Reinforcement Learning on Versal ACAP
by: Li, Enlai, et al.
Published: (2026)
Accelerating CRONet on AMD Versal AIE-ML Engines
by: Mhatre, Kaustubh, et al.
Published: (2026)
Accelerating Elliptic Curve Point Additions on Versal AI Engine for Multi-scalar Multiplication
by: Ohno, Ayumi, et al.
Published: (2025)
GAMA: High-Performance GEMM Acceleration on AMD Versal ML-Optimized AI Engines
by: Mhatre, Kaustubh, et al.
Published: (2025)
AMD Versal Implementations of FAM and SSCA Estimators
by: Li, Carol Jingyi, et al.
Published: (2025)
Exploring the Versal AI Engine for 3D Gaussian Splatting
by: Shimamura, Kotaro, et al.
Published: (2025)
Enabling Mixed criticality applications for the Versal AI-Engines
by: Sprave, Vincent, et al.
Published: (2026)
Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing
by: Huo, Juncheng, et al.
Published: (2025)
ApproxPilot: A GNN-based Accelerator Approximation Framework
by: Zhang, Qing, et al.
Published: (2024)
AGON: Automated Design Framework for Customizing Processors from ISA Documents
by: Li, Chongxiao, et al.
Published: (2024)
An Efficient Sparse Hardware Accelerator for Spike-Driven Transformer
by: Li, Zhengke, et al.
Published: (2025)
FPGA-Optimized Hardware Accelerator for Fast Fourier Transform and Singular Value Decomposition in AI
by: Ding, Hong, et al.
Published: (2025)
Efficient Implementation of an Adaptive Transformer Accelerator for Massive MIMO Outdoor Localization
by: Yaman, Ilayda, et al.
Published: (2026)
TATAA: Programmable Mixed-Precision Transformer Acceleration with a Transformable Arithmetic Architecture
by: Wu, Jiajun, et al.
Published: (2024)
Holistic Optimization Framework for FPGA Accelerators
by: Pouget, Stéphane, et al.
Published: (2025)
Optimized Spatial Architecture Mapping Flow for Transformer Accelerators
by: Xu, Haocheng, et al.
Published: (2024)
GSIM: Accelerating RTL Simulation for Large-Scale Designs
by: Chen, Lu, et al.
Published: (2025)
Mozart: A Chiplet Ecosystem-Accelerator Codesign Framework for Composable Bespoke Application Specific Integrated Circuits
by: Jin, Haoran, et al.
Published: (2025)
CAMASim: A Comprehensive Simulation Framework for Content-Addressable Memory based Accelerators
by: Li, Mengyuan, et al.
Published: (2024)
Xpikeformer: Hybrid Analog-Digital Hardware Acceleration for Spiking Transformers
by: Song, Zihang, et al.
Published: (2024)
Scope: A Scalable Merged Pipeline Framework for Multi-Chip-Module NN Accelerators
by: Huang, Zongle, et al.
Published: (2026)
FPGA-based Emulation and Device-Side Management for CXL-based Memory Tiering Systems
by: Chen, Yiqi, et al.
Published: (2025)
PIM-GPT: A Hybrid Process-in-Memory Accelerator for Autoregressive Transformers
by: Wu, Yuting, et al.
Published: (2023)
MARVEL: An End-to-End Framework for Generating Model-Class Aware Custom RISC-V Extensions for Lightweight AI
by: M, Ajay Kumar, et al.
Published: (2025)
FireFly-T: High-Throughput Sparsity Exploitation for Spiking Transformer Acceleration with Dual-Engine Overlay Architecture
by: Li, Tenglong, et al.
Published: (2025)
Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference
by: Liu, Yiqi, et al.
Published: (2026)
Custom Algorithm-based Fault Tolerance for Attention Layers in Transformers
by: Titopoulos, Vasileios, et al.
Published: (2025)
Enabling Efficient Hardware Acceleration of Hybrid Vision Transformer (ViT) Networks at the Edge
by: Dumoulin, Joren, et al.
Published: (2025)
Hardware Efficient Accelerator for Spiking Transformer With Reconfigurable Parallel Time Step Computing
by: Chen, Bo-Yu, et al.
Published: (2025)
Hardware-Software Co-Design for Accelerating Transformer Inference Leveraging Compute-in-Memory
by: Kim, Dong Eun, et al.
Published: (2025)
A Reconfigurable Framework for AI-FPGA Agent Integration and Acceleration
by: Yunusoglu, Aybars, et al.
Published: (2026)
TurboFuzz: FPGA Accelerated Hardware Fuzzing for Processor Agile Verification
by: Zhong, Yang, et al.
Published: (2025)
NX-CGRA: A Programmable Hardware Accelerator for Core Transformer Algorithms on Edge Devices
by: Prasad, Rohit
Published: (2025)
A Dataflow Compiler for Efficient LLM Inference using Custom Microscaling Formats
by: Cheng, Jianyi, et al.
Published: (2023)
PIMSIM-NN: An ISA-based Simulation Framework for Processing-in-Memory Accelerators
by: Wang, Xinyu, et al.
Published: (2024)
TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators
by: Nayak, Nandeeka, et al.
Published: (2023)
Memory-Guided Unified Hardware Accelerator for Mixed-Precision Scientific Computing
by: Wang, Chuanzhen, et al.
Published: (2026)