| Main Authors: | Van Essendelft, Dirk, Almolyki, Hayl, Shi, Wei, Jordan, Terry, Wang, Mei-Yu, Saidi, Wissam A. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.16990 |
Similar Items
A System Level Compiler for Massively-Parallel, Spatial, Dataflow Architectures
by: Van Essendelft, Dirk, et al.
Published: (2025)
Network Design for Wafer-Scale Systems with Wafer-on-Wafer Hybrid Bonding
by: Iff, Patrick, et al.
Published: (2026)
GAMA: High-Performance GEMM Acceleration on AMD Versal ML-Optimized AI Engines
by: Mhatre, Kaustubh, et al.
Published: (2025)
Switch-Less Dragonfly on Wafers: A Scalable Interconnection Architecture based on Wafer-Scale Integration
by: Feng, Yinxiao, et al.
Published: (2024)
Theseus: Exploring Efficient Wafer-Scale Chip Design for Large Language Models
by: Zhu, Jingchen, et al.
Published: (2024)
DarwinWafer: A Wafer-Scale Neuromorphic Chip
by: Zhu, Xiaolei, et al.
Published: (2025)
IMAGine: An In-Memory Accelerated GEMV Engine Overlay
by: Kabir, MD Arafat, et al.
Published: (2024)
Accelerating CRONet on AMD Versal AIE-ML Engines
by: Mhatre, Kaustubh, et al.
Published: (2026)
FireFly-T: High-Throughput Sparsity Exploitation for Spiking Transformer Acceleration with Dual-Engine Overlay Architecture
by: Li, Tenglong, et al.
Published: (2025)
Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference
by: Liu, Yiqi, et al.
Published: (2026)
HAVEN: High-Bandwidth Flash Augmented Vector Engine for Large-Scale Approximate Nearest-Neighbor Search Acceleration
by: Hsu, Po-Kai, et al.
Published: (2026)
A Comparison of the Cerebras Wafer-Scale Integration Technology with Nvidia GPU-based Systems for Artificial Intelligence
by: Kundu, Yudhishthira, et al.
Published: (2025)
Mozart: Modularized and Efficient MoE Training on 3.5D Wafer-Scale Chiplet Architectures
by: Luo, Shuqing, et al.
Published: (2026)
Accelerating Elliptic Curve Point Additions on Versal AI Engine for Multi-scalar Multiplication
by: Ohno, Ayumi, et al.
Published: (2025)
TYTAN: Taylor-series based Non-Linear Activation Engine for Deep Learning Accelerators
by: Pramanik, Soham, et al.
Published: (2025)
LogicSparse: Enabling Engine-Free Unstructured Sparsity for Quantised Deep-learning Accelerators
by: Li, Changhong, et al.
Published: (2025)
Hierarchical Recording Architecture for Three-Dimensional Magnetic Recording
by: Jian, Yugen, et al.
Published: (2025)
GPU-Accelerated Simulated Oscillator Ising/Potts Machine Solving Combinatorial Optimization Problems
by: Gonul, Yilmaz Ege, et al.
Published: (2025)
Modeling and Optimizing Performance Bottlenecks for Neuromorphic Accelerators
by: Yik, Jason, et al.
Published: (2025)
RPCAcc: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator
by: Zhang, Jie, et al.
Published: (2024)
HFRWKV: A High-Performance Fully On-Chip Hardware Accelerator for RWKV
by: Shijie, Liu, et al.
Published: (2026)
Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture
by: Li, Huize, et al.
Published: (2026)
EN-T: Optimizing Tensor Computing Engines Performance via Encoder-Based Methodology
by: Wu, Qizhe, et al.
Published: (2024)
High-Performance Pipelined NTT Accelerators with Homogeneous Digit-Serial Modulo Arithmetic
by: Alexakis, George, et al.
Published: (2025)
An Analytical Cost Model for Fast Evaluation of Multiple Compute-Engine CNN Accelerators
by: Qararyah, Fareed, et al.
Published: (2025)
DataMaestro: A Versatile and Efficient Data Streaming Engine Bringing Decoupled Memory Access To Dataflow Accelerators
by: Yi, Xiaoling, et al.
Published: (2025)
TEMP: A Memory Efficient Physical-aware Tensor Partition-Mapping Framework on Wafer-scale Chips
by: Wang, Huizheng, et al.
Published: (2025)
Hardware Acceleration of Kolmogorov-Arnold Network (KAN) in Large-Scale Systems
by: Huang, Wei-Hsing, et al.
Published: (2025)
RAS: A Bit-Exact rANS Accelerator For High-Performance Neural Lossless Compression
by: Qin, Yuchao, et al.
Published: (2025)
GSIM: Accelerating RTL Simulation for Large-Scale Designs
by: Chen, Lu, et al.
Published: (2025)
Enthuse: Efficient Adaptable High-throughput Streaming Aggregation Engines
by: Papaphilippou, Philippos, et al.
Published: (2024)
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
by: Rashidi, Saeed, et al.
Published: (2024)
High Utilization Energy-Aware Real-Time Inference Deep Convolutional Neural Network Accelerator
by: Lin, Kuan-Ting, et al.
Published: (2025)
Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs
by: Wu, Qizhe, et al.
Published: (2025)
GAP-LA: GPU-Accelerated Performance-Driven Layer Assignment
by: Zhao, Chunyuan, et al.
Published: (2025)
Stream: Design Space Exploration of Layer-Fused DNNs on Heterogeneous Dataflow Accelerators
by: Symons, Arne, et al.
Published: (2022)
Aging Aware Adaptive Voltage Scaling for Reliable and Efficient AI Accelerators
by: Xie, Tong, et al.
Published: (2026)
VIKIN: A Reconfigurable Accelerator for KANs and MLPs with Two-Stage Sparsity Support
by: Ou, Wenhui, et al.
Published: (2026)
Changing the Game: The Bounce-Bind Ising Machine
by: Zhang, Haiyang, et al.
Published: (2026)
Efficient Open Modification Spectral Library Searching in High-Dimensional Space with Multi-Level-Cell Memory
by: Fan, Keming, et al.
Published: (2024)