| Main Authors: | Dai, Tuo; Shi, Bizhao; Luo, Guojie |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.16792 |
Similar Items
Optimizing GEMM for Energy and Performance on Versal ACAP Architectures
by: Papalamprou, Ilias, et al.
Published: (2025)
CAT: Customized Transformer Accelerator Framework on Versal ACAP
by: Zhang, Wenbo, et al.
Published: (2024)
DPUV4E: High-Throughput DPU Architecture Design for CNN on Versal ACAP
by: Li, Guoyu, et al.
Published: (2025)
AP-DRL: A Synergistic Algorithm-Hardware Framework for Automatic Task Partitioning of Deep Reinforcement Learning on Versal ACAP
by: Li, Enlai, et al.
Published: (2026)
AMD Versal Implementations of FAM and SSCA Estimators
by: Li, Carol Jingyi, et al.
Published: (2025)
Accelerating CRONet on AMD Versal AIE-ML Engines
by: Mhatre, Kaustubh, et al.
Published: (2026)
GAMA: High-Performance GEMM Acceleration on AMD Versal ML-Optimized AI Engines
by: Mhatre, Kaustubh, et al.
Published: (2025)
Exploring the Versal AI Engine for 3D Gaussian Splatting
by: Shimamura, Kotaro, et al.
Published: (2025)
Accelerating Elliptic Curve Point Additions on Versal AI Engine for Multi-scalar Multiplication
by: Ohno, Ayumi, et al.
Published: (2025)
Floorplanning with I/O assignment via feasibility-seeking and superiorization methods
by: Yu, Shan, et al.
Published: (2024)
Wit-HW: Bug Localization in Hardware Design Code via Witness Test Case Generation
by: Ma, Ruiyang, et al.
Published: (2025)
Enabling Mixed criticality applications for the Versal AI-Engines
by: Sprave, Vincent, et al.
Published: (2026)
ENFOR-SA: End-to-end Cross-layer Transient Fault Injector for Efficient and Accurate DNN Reliability Assessment on Systolic Arrays
by: Tonetto, Rafael Billig, et al.
Published: (2026)
A Scalable FPGA Architecture With Adaptive Memory Utilization for GEMM-Based Operations
by: Petropoulos, Anastasios, et al.
Published: (2025)
APACHE: A Processing-Near-Memory Architecture for Multi-Scheme Fully Homomorphic Encryption
by: Ding, Lin, et al.
Published: (2024)
Mapping and Execution of Nested Loops on Processor Arrays: CGRAs vs. TCPAs
by: Walter, Dominik, et al.
Published: (2025)
Field-Programmable Gate Array Architecture for Deep Learning: Survey & Future Directions
by: Boutros, Andrew, et al.
Published: (2024)
Mapping code on Coarse Grained Reconfigurable Arrays using a SAT solver
by: Tirelli, Cristian, et al.
Published: (2025)
High Utilization Energy-Aware Real-Time Inference Deep Convolutional Neural Network Accelerator
by: Lin, Kuan-Ting, et al.
Published: (2025)
Optimized Spatial Architecture Mapping Flow for Transformer Accelerators
by: Xu, Haocheng, et al.
Published: (2024)
A Novel Cost-Effective MIMO Architecture with Ray Antenna Array for Enhanced Wireless Communication Performance
by: Dong, Zhenjun, et al.
Published: (2025)
ReDas: A Lightweight Architecture for Supporting Fine-Grained Reshaping and Multiple Dataflows on Systolic Array
by: Han, Meng, et al.
Published: (2023)
ONE-SA: Enabling Nonlinear Operations in Systolic Arrays for Efficient and Flexible Neural Network Inference
by: Sun, Ruiqi, et al.
Published: (2024)
SA-Kura: An Energy-Efficient Systolic Array Accelerator for Locally-Coupled Kuramoto Drift in Diffusion Sampling
by: Jin, Jeongmin, et al.
Published: (2026)
A Dynamic Allocation Scheme for Adaptive Shared-Memory Mapping on Kilo-core RV Clusters for Attention-Based Model Deployment
by: Wang, Bowen, et al.
Published: (2025)
SAT-MapIt: A SAT-based Modulo Scheduling Mapper for Coarse Grain Reconfigurable Architectures
by: Tirelli, Cristian, et al.
Published: (2025)
Bridging the Gap between Hardware Fuzzing and Industrial Verification
by: Ma, Ruiyang, et al.
Published: (2025)
HURRY: Highly Utilized, Reconfigurable ReRAM-based In-situ Accelerator with Multifunctionality
by: Shin, Hery, et al.
Published: (2024)
Strassen Multisystolic Array Hardware Architectures
by: Pogue, Trevor E., et al.
Published: (2025)
Design of a Reformed Array Logic Binary Multiplier for High-Speed Computations
by: Mohammad, Sakib, et al.
Published: (2024)
FlexMem: High-Parallel Near-Memory Architecture for Flexible Dataflow in Fully Homomorphic Encryption
by: Shi, Shangyi, et al.
Published: (2025)
SA-DS: A Dataset for Large Language Model-Driven AI Accelerator Design Generation
by: Vungarala, Deepak, et al.
Published: (2024)
Hermes: A Unified High-Performance NTT Architecture with Hybrid Dataflow
by: Gu, Hang, et al.
Published: (2026)
Architectural Classification of XR Workloads: Cross-Layer Archetypes and Implications
by: Shi, Xinyu, et al.
Published: (2026)
Digit-Recurrence Posit Division
by: Murillo, Raul, et al.
Published: (2025)
Benchmarking and Dissecting the Nvidia Hopper GPU Architecture
by: Luo, Weile, et al.
Published: (2024)
CVA6S+: A Superscalar RISC-V Core with High-Throughput Memory Architecture
by: Tedeschi, Riccardo, et al.
Published: (2025)
Leveraging Recurrent Patterns in Graph Accelerators
by: Rahimi, Masoud, et al.
Published: (2025)
Double Duty: FPGA Architecture to Enable Concurrent LUT and Adder Chain Usage
by: Pun, Junius, et al.
Published: (2025)
Configurable Multi-Port Memory Architecture for High-Speed Data Communication
by: Dhakad, Narendra Singh, et al.
Published: (2024)