| Main Authors: | Nayak, Nandeeka, Wu, Xinrui, Odemuyiwa, Toluwanimi O., Pellauer, Michael, Emer, Joel S., Fletcher, Christopher W. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.10491 |
Similar Items
TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators
by: Nayak, Nandeeka, et al.
Published: (2023)
Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models
by: Odemuyiwa, Toluwanimi O., et al.
Published: (2026)
RTeAAL Sim: Using Tensor Algebra to Represent and Accelerate RTL Simulation (Extended Version)
by: Zhu, Yan, et al.
Published: (2026)
LoopTree: Exploring the Fused-layer Dataflow Accelerator Design Space
by: Gilbert, Michael, et al.
Published: (2024)
Modeling Analog-Digital-Converter Energy and Area for Compute-In-Memory Accelerator Design
by: Andrulis, Tanner, et al.
Published: (2024)
Fast and Fusiest: An Optimal Fusion-Aware Mapper for Accelerator Design
by: Andrulis, Tanner, et al.
Published: (2026)
Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity
by: Xue, Zi Yu, et al.
Published: (2023)
The Turbo-Charged Mapper: Fast and Optimal Mapping for Energy-efficient and Low-latency Accelerator Design
by: Gilbert, Michael, et al.
Published: (2026)
HaShiFlex: A High-Throughput Hardened Shifter DNN Accelerator with Fine-Tuning Flexibility
by: Herbst, Jonathan, et al.
Published: (2025)
CiMLoop: A Flexible, Accurate, and Fast Compute-In-Memory Modeling Tool
by: Andrulis, Tanner, et al.
Published: (2024)
Architecture-Level Modeling of Photonic Deep Neural Network Accelerators
by: Andrulis, Tanner, et al.
Published: (2024)
Stream: Design Space Exploration of Layer-Fused DNNs on Heterogeneous Dataflow Accelerators
by: Symons, Arne, et al.
Published: (2022)
HARP: A Taxonomy for Heterogeneous and Hierarchical Processors for Mixed-reuse Workloads
by: Garg, Raveesh, et al.
Published: (2025)
FuseFPS: Accelerating Farthest Point Sampling with Fusing KD-tree Construction for Point Clouds
by: Han, Meng, et al.
Published: (2023)
Hardware-Software Co-Design for Accelerating Transformer Inference Leveraging Compute-in-Memory
by: Kim, Dong Eun, et al.
Published: (2025)
CELLO: Co-designing Schedule and Hybrid Implicit/Explicit Buffer for Complex Tensor Reuse
by: Garg, Raveesh, et al.
Published: (2023)
Leveraging Recurrent Patterns in Graph Accelerators
by: Rahimi, Masoud, et al.
Published: (2025)
Multilayer Dataflow: Orchestrate Butterfly Sparsity to Accelerate Attention Computation
by: Wu, Haibin, et al.
Published: (2024)
FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Large Attention-Based Model Inference on Tile-Based Accelerators
by: Zhang, Chi, et al.
Published: (2026)
ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation
by: Han, Dengke, et al.
Published: (2024)
FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Efficient Multi-Head Attention on Tile-Based Many-PE Accelerators
by: Zhang, Chi, et al.
Published: (2025)
SnipSnap: A Joint Compression Format and Dataflow Co-Optimization Framework for Efficient Sparse LLM Accelerator Design
by: Wu, Junyi, et al.
Published: (2025)
The Quest for Reliable AI Accelerators: Cross-Layer Evaluation and Design Optimization
by: Li, Meng, et al.
Published: (2026)
Optimizing Layer-Fused Scheduling of Transformer Networks on Multi-accelerator Platforms
by: Colleman, Steven, et al.
Published: (2024)
Chiplet-Gym: Optimizing Chiplet-based AI Accelerator Design with Reinforcement Learning
by: Mishty, Kaniz, et al.
Published: (2024)
Hybrid Photonic-digital Accelerator for Attention Mechanism
by: Li, Huize, et al.
Published: (2025)
SystolicAttention: Fusing FlashAttention within a Single Systolic Array
by: Lin, Jiawei, et al.
Published: (2025)
Late Breaking Results: Leveraging Approximate Computing for Carbon-Aware DNN Accelerators
by: Panteleaki, Aikaterini Maria, et al.
Published: (2025)
PIMfused: Near-Bank DRAM-PIM with Fused-layer Dataflow for CNN Data Transfer Optimization
by: Yang, Simei, et al.
Published: (2025)
DiffuSE: Cross-Layer Design Space Exploration of DNN Accelerator via Diffusion-Driven Optimization
by: Ren, Yi, et al.
Published: (2025)
Holistic Optimization Framework for FPGA Accelerators
by: Pouget, Stéphane, et al.
Published: (2025)
Low-Cost FlashAttention with Fused Exponential and Multiplication Hardware Operators
by: Alexandridis, Kosmas, et al.
Published: (2025)
LLM-DSE: Searching Accelerator Parameters with LLM Agents
by: Wang, Hanyu, et al.
Published: (2025)
Leveraging Application-Specific Knowledge for Energy-Efficient Deep Learning Accelerators on Resource-Constrained FPGAs
by: Qian, Chao
Published: (2025)
Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture
by: Li, Huize, et al.
Published: (2026)
FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill
by: Jayanth, Rakshith, et al.
Published: (2026)
A Sparsity-Aware Autonomous Path Planning Accelerator with HW/SW Co-Design and Multi-Level Dataflow Optimization
by: Zhang, Yifan, et al.
Published: (2025)
Convolutions Predictable Offloading to an Accelerator: Formalization and Optimization
by: Husson, Benjamin, et al.
Published: (2026)
Modeling and Optimizing Performance Bottlenecks for Neuromorphic Accelerators
by: Yik, Jason, et al.
Published: (2025)
IMAGine: An In-Memory Accelerated GEMV Engine Overlay
by: Kabir, MD Arafat, et al.
Published: (2024)