:: Library Catalog

Bibliographic Details
Main Authors:	Nayak, Nandeeka; Wu, Xinrui; Odemuyiwa, Toluwanimi O.; Pellauer, Michael; Emer, Joel S.; Fletcher, Christopher W.
Format:	Preprint
Published:	2024
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2406.10491

Similar Items

TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators
by: Nayak, Nandeeka, et al.
Published: (2023)

Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models
by: Odemuyiwa, Toluwanimi O., et al.
Published: (2026)

RTeAAL Sim: Using Tensor Algebra to Represent and Accelerate RTL Simulation (Extended Version)
by: Zhu, Yan, et al.
Published: (2026)

LoopTree: Exploring the Fused-layer Dataflow Accelerator Design Space
by: Gilbert, Michael, et al.
Published: (2024)

Modeling Analog-Digital-Converter Energy and Area for Compute-In-Memory Accelerator Design
by: Andrulis, Tanner, et al.
Published: (2024)

Fast and Fusiest: An Optimal Fusion-Aware Mapper for Accelerator Design
by: Andrulis, Tanner, et al.
Published: (2026)

Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity
by: Xue, Zi Yu, et al.
Published: (2023)

The Turbo-Charged Mapper: Fast and Optimal Mapping for Energy-efficient and Low-latency Accelerator Design
by: Gilbert, Michael, et al.
Published: (2026)

HaShiFlex: A High-Throughput Hardened Shifter DNN Accelerator with Fine-Tuning Flexibility
by: Herbst, Jonathan, et al.
Published: (2025)

CiMLoop: A Flexible, Accurate, and Fast Compute-In-Memory Modeling Tool
by: Andrulis, Tanner, et al.
Published: (2024)

Architecture-Level Modeling of Photonic Deep Neural Network Accelerators
by: Andrulis, Tanner, et al.
Published: (2024)

Stream: Design Space Exploration of Layer-Fused DNNs on Heterogeneous Dataflow Accelerators
by: Symons, Arne, et al.
Published: (2022)

HARP: A Taxonomy for Heterogeneous and Hierarchical Processors for Mixed-reuse Workloads
by: Garg, Raveesh, et al.
Published: (2025)

FuseFPS: Accelerating Farthest Point Sampling with Fusing KD-tree Construction for Point Clouds
by: Han, Meng, et al.
Published: (2023)

Hardware-Software Co-Design for Accelerating Transformer Inference Leveraging Compute-in-Memory
by: Kim, Dong Eun, et al.
Published: (2025)

CELLO: Co-designing Schedule and Hybrid Implicit/Explicit Buffer for Complex Tensor Reuse
by: Garg, Raveesh, et al.
Published: (2023)

Leveraging Recurrent Patterns in Graph Accelerators
by: Rahimi, Masoud, et al.
Published: (2025)

Multilayer Dataflow: Orchestrate Butterfly Sparsity to Accelerate Attention Computation
by: Wu, Haibin, et al.
Published: (2024)

FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Large Attention-Based Model Inference on Tile-Based Accelerators
by: Zhang, Chi, et al.
Published: (2026)

ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation
by: Han, Dengke, et al.
Published: (2024)

FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Efficient Multi-Head Attention on Tile-Based Many-PE Accelerators
by: Zhang, Chi, et al.
Published: (2025)

SnipSnap: A Joint Compression Format and Dataflow Co-Optimization Framework for Efficient Sparse LLM Accelerator Design
by: Wu, Junyi, et al.
Published: (2025)

The Quest for Reliable AI Accelerators: Cross-Layer Evaluation and Design Optimization
by: Li, Meng, et al.
Published: (2026)

Optimizing Layer-Fused Scheduling of Transformer Networks on Multi-accelerator Platforms
by: Colleman, Steven, et al.
Published: (2024)

Chiplet-Gym: Optimizing Chiplet-based AI Accelerator Design with Reinforcement Learning
by: Mishty, Kaniz, et al.
Published: (2024)

Hybrid Photonic-digital Accelerator for Attention Mechanism
by: Li, Huize, et al.
Published: (2025)

SystolicAttention: Fusing FlashAttention within a Single Systolic Array
by: Lin, Jiawei, et al.
Published: (2025)

Late Breaking Results: Leveraging Approximate Computing for Carbon-Aware DNN Accelerators
by: Panteleaki, Aikaterini Maria, et al.
Published: (2025)

PIMfused: Near-Bank DRAM-PIM with Fused-layer Dataflow for CNN Data Transfer Optimization
by: Yang, Simei, et al.
Published: (2025)

DiffuSE: Cross-Layer Design Space Exploration of DNN Accelerator via Diffusion-Driven Optimization
by: Ren, Yi, et al.
Published: (2025)

Holistic Optimization Framework for FPGA Accelerators
by: Pouget, Stéphane, et al.
Published: (2025)

Low-Cost FlashAttention with Fused Exponential and Multiplication Hardware Operators
by: Alexandridis, Kosmas, et al.
Published: (2025)

LLM-DSE: Searching Accelerator Parameters with LLM Agents
by: Wang, Hanyu, et al.
Published: (2025)

Leveraging Application-Specific Knowledge for Energy-Efficient Deep Learning Accelerators on Resource-Constrained FPGAs
by: Qian, Chao
Published: (2025)

Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture
by: Li, Huize, et al.
Published: (2026)

FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill
by: Jayanth, Rakshith, et al.
Published: (2026)

A Sparsity-Aware Autonomous Path Planning Accelerator with HW/SW Co-Design and Multi-Level Dataflow Optimization
by: Zhang, Yifan, et al.
Published: (2025)

Convolutions Predictable Offloading to an Accelerator: Formalization and Optimization
by: Husson, Benjamin, et al.
Published: (2026)

Modeling and Optimizing Performance Bottlenecks for Neuromorphic Accelerators
by: Yik, Jason, et al.
Published: (2025)

IMAGine: An In-Memory Accelerated GEMV Engine Overlay
by: Kabir, MD Arafat, et al.
Published: (2024)