:: Library Catalog

Bibliographic Details
Main Authors:	Van Essendelft, Dirk; Almolyki, Hayl; Shi, Wei; Jordan, Terry; Wang, Mei-Yu; Saidi, Wissam A.
Format:	Preprint
Published:	2024
Subjects:	Hardware Architecture; Materials Science
Online Access:	https://arxiv.org/abs/2404.16990

Similar Items

A System Level Compiler for Massively-Parallel, Spatial, Dataflow Architectures
by: Van Essendelft, Dirk, et al.
Published: (2025)

Network Design for Wafer-Scale Systems with Wafer-on-Wafer Hybrid Bonding
by: Iff, Patrick, et al.
Published: (2026)

GAMA: High-Performance GEMM Acceleration on AMD Versal ML-Optimized AI Engines
by: Mhatre, Kaustubh, et al.
Published: (2025)

Switch-Less Dragonfly on Wafers: A Scalable Interconnection Architecture based on Wafer-Scale Integration
by: Feng, Yinxiao, et al.
Published: (2024)

Theseus: Exploring Efficient Wafer-Scale Chip Design for Large Language Models
by: Zhu, Jingchen, et al.
Published: (2024)

DarwinWafer: A Wafer-Scale Neuromorphic Chip
by: Zhu, Xiaolei, et al.
Published: (2025)

IMAGine: An In-Memory Accelerated GEMV Engine Overlay
by: Kabir, MD Arafat, et al.
Published: (2024)

Accelerating CRONet on AMD Versal AIE-ML Engines
by: Mhatre, Kaustubh, et al.
Published: (2026)

FireFly-T: High-Throughput Sparsity Exploitation for Spiking Transformer Acceleration with Dual-Engine Overlay Architecture
by: Li, Tenglong, et al.
Published: (2025)

Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference
by: Liu, Yiqi, et al.
Published: (2026)

HAVEN: High-Bandwidth Flash Augmented Vector Engine for Large-Scale Approximate Nearest-Neighbor Search Acceleration
by: Hsu, Po-Kai, et al.
Published: (2026)

A Comparison of the Cerebras Wafer-Scale Integration Technology with Nvidia GPU-based Systems for Artificial Intelligence
by: Kundu, Yudhishthira, et al.
Published: (2025)

Mozart: Modularized and Efficient MoE Training on 3.5D Wafer-Scale Chiplet Architectures
by: Luo, Shuqing, et al.
Published: (2026)

Accelerating Elliptic Curve Point Additions on Versal AI Engine for Multi-scalar Multiplication
by: Ohno, Ayumi, et al.
Published: (2025)

TYTAN: Taylor-series based Non-Linear Activation Engine for Deep Learning Accelerators
by: Pramanik, Soham, et al.
Published: (2025)

LogicSparse: Enabling Engine-Free Unstructured Sparsity for Quantised Deep-learning Accelerators
by: Li, Changhong, et al.
Published: (2025)

Hierarchical Recording Architecture for Three-Dimensional Magnetic Recording
by: Jian, Yugen, et al.
Published: (2025)

GPU-Accelerated Simulated Oscillator Ising/Potts Machine Solving Combinatorial Optimization Problems
by: Gonul, Yilmaz Ege, et al.
Published: (2025)

Modeling and Optimizing Performance Bottlenecks for Neuromorphic Accelerators
by: Yik, Jason, et al.
Published: (2025)

RPCAcc: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator
by: Zhang, Jie, et al.
Published: (2024)

HFRWKV: A High-Performance Fully On-Chip Hardware Accelerator for RWKV
by: Shijie, Liu, et al.
Published: (2026)

Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture
by: Li, Huize, et al.
Published: (2026)

EN-T: Optimizing Tensor Computing Engines Performance via Encoder-Based Methodology
by: Wu, Qizhe, et al.
Published: (2024)

High-Performance Pipelined NTT Accelerators with Homogeneous Digit-Serial Modulo Arithmetic
by: Alexakis, George, et al.
Published: (2025)

An Analytical Cost Model for Fast Evaluation of Multiple Compute-Engine CNN Accelerators
by: Qararyah, Fareed, et al.
Published: (2025)

DataMaestro: A Versatile and Efficient Data Streaming Engine Bringing Decoupled Memory Access To Dataflow Accelerators
by: Yi, Xiaoling, et al.
Published: (2025)

TEMP: A Memory Efficient Physical-aware Tensor Partition-Mapping Framework on Wafer-scale Chips
by: Wang, Huizheng, et al.
Published: (2025)

Hardware Acceleration of Kolmogorov-Arnold Network (KAN) in Large-Scale Systems
by: Huang, Wei-Hsing, et al.
Published: (2025)

RAS: A Bit-Exact rANS Accelerator For High-Performance Neural Lossless Compression
by: Qin, Yuchao, et al.
Published: (2025)

GSIM: Accelerating RTL Simulation for Large-Scale Designs
by: Chen, Lu, et al.
Published: (2025)

Enthuse: Efficient Adaptable High-throughput Streaming Aggregation Engines
by: Papaphilippou, Philippos, et al.
Published: (2024)

FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
by: Rashidi, Saeed, et al.
Published: (2024)

High Utilization Energy-Aware Real-Time Inference Deep Convolutional Neural Network Accelerator
by: Lin, Kuan-Ting, et al.
Published: (2025)

Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs
by: Wu, Qizhe, et al.
Published: (2025)

GAP-LA: GPU-Accelerated Performance-Driven Layer Assignment
by: Zhao, Chunyuan, et al.
Published: (2025)

Stream: Design Space Exploration of Layer-Fused DNNs on Heterogeneous Dataflow Accelerators
by: Symons, Arne, et al.
Published: (2022)

Aging Aware Adaptive Voltage Scaling for Reliable and Efficient AI Accelerators
by: Xie, Tong, et al.
Published: (2026)

VIKIN: A Reconfigurable Accelerator for KANs and MLPs with Two-Stage Sparsity Support
by: Ou, Wenhui, et al.
Published: (2026)

Changing the Game: The Bounce-Bind Ising Machine
by: Zhang, Haiyang, et al.
Published: (2026)

Efficient Open Modification Spectral Library Searching in High-Dimensional Space with Multi-Level-Cell Memory
by: Fan, Keming, et al.
Published: (2024)