| Main Authors: | Nayak, Nandeeka, Odemuyiwa, Toluwanimi O., Ugare, Shubham, Fletcher, Christopher W., Pellauer, Michael, Emer, Joel S. |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2304.07931 |
Similar Items
FuseMax: Leveraging Extended Einsums to Optimize Attention Accelerator Design
by: Nayak, Nandeeka, et al.
Published: (2024)
Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models
by: Odemuyiwa, Toluwanimi O., et al.
Published: (2026)
RTeAAL Sim: Using Tensor Algebra to Represent and Accelerate RTL Simulation (Extended Version)
by: Zhu, Yan, et al.
Published: (2026)
Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity
by: Xue, Zi Yu, et al.
Published: (2023)
HaShiFlex: A High-Throughput Hardened Shifter DNN Accelerator with Fine-Tuning Flexibility
by: Herbst, Jonathan, et al.
Published: (2025)
LoopTree: Exploring the Fused-layer Dataflow Accelerator Design Space
by: Gilbert, Michael, et al.
Published: (2024)
Modeling Analog-Digital-Converter Energy and Area for Compute-In-Memory Accelerator Design
by: Andrulis, Tanner, et al.
Published: (2024)
Fast and Fusiest: An Optimal Fusion-Aware Mapper for Accelerator Design
by: Andrulis, Tanner, et al.
Published: (2026)
CiMLoop: A Flexible, Accurate, and Fast Compute-In-Memory Modeling Tool
by: Andrulis, Tanner, et al.
Published: (2024)
The Turbo-Charged Mapper: Fast and Optimal Mapping for Energy-efficient and Low-latency Accelerator Design
by: Gilbert, Michael, et al.
Published: (2026)
CELLO: Co-designing Schedule and Hybrid Implicit/Explicit Buffer for Complex Tensor Reuse
by: Garg, Raveesh, et al.
Published: (2023)
Architecture-Level Modeling of Photonic Deep Neural Network Accelerators
by: Andrulis, Tanner, et al.
Published: (2024)
Systolic Sparse Tensor Slices: FPGA Building Blocks for Sparse and Dense AI Acceleration
by: Taka, Endri, et al.
Published: (2025)
HARP: A Taxonomy for Heterogeneous and Hierarchical Processors for Mixed-reuse Workloads
by: Garg, Raveesh, et al.
Published: (2025)
Accelerating Sparse Graph Neural Networks with Tensor Core Optimization
by: Wu, Ka Wai
Published: (2024)
StreamTensor: Make Tensors Stream in Dataflow Accelerators for LLMs
by: Ye, Hanchen, et al.
Published: (2025)
FLAASH: Flexible Accelerator Architecture for Sparse High-Order Tensor Contraction
by: Kulp, Gabriel, et al.
Published: (2024)
Error Checking for Sparse Systolic Tensor Arrays
by: Peltekis, Christodoulos, et al.
Published: (2024)
Sparse MTTKRP Acceleration for Tensor Decomposition on GPU
by: Wijeratne, Sasindu, et al.
Published: (2024)
Open-source Stand-Alone Versatile Tensor Accelerator
by: Faure-Gignoux, Anthony, et al.
Published: (2025)
ATLAAS: Automatic Tensor-Level Abstraction of Accelerator Semantics
by: Gao, Ruijie, et al.
Published: (2026)
AME-PIM: Can Memory be Your Next Tensor Accelerator?
by: Venieri, Emanuele, et al.
Published: (2026)
HCiM: ADC-Less Hybrid Analog-Digital Compute in Memory Accelerator for Deep Learning Workloads
by: Negi, Shubham, et al.
Published: (2024)
Holistic Optimization Framework for FPGA Accelerators
by: Pouget, Stéphane, et al.
Published: (2025)
FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training
by: Lu, Jinming, et al.
Published: (2025)
FADiff: Fusion-Aware Differentiable Optimization for DNN Scheduling on Tensor Accelerators
by: Jia, Shuao, et al.
Published: (2025)
An Efficient Sparse Hardware Accelerator for Spike-Driven Transformer
by: Li, Zhengke, et al.
Published: (2025)
Systolic Array Acceleration of Diagonal-Optimized Sparse-Sparse Matrix Multiplication for Efficient Quantum Simulation
by: Su, Yuchao, et al.
Published: (2025)
SnipSnap: A Joint Compression Format and Dataflow Co-Optimization Framework for Efficient Sparse LLM Accelerator Design
by: Wu, Junyi, et al.
Published: (2025)
Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators
by: Yoon, Hyunsung, et al.
Published: (2026)
Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators
by: Jeong, Geonhwa, et al.
Published: (2024)
GTA: a new General Tensor Accelerator with Better Area Efficiency and Data Reuse
by: Ai, Chenyang, et al.
Published: (2024)
A Tensor-Train Decomposition based Compression of LLMs on Group Vector Systolic Accelerator
by: Huang, Sixiao, et al.
Published: (2025)
COFFEE: A Carbon-Modeling and Optimization Framework for HZO-based FeFET eNVMs
by: Wu, Hongbang, et al.
Published: (2026)
SuperUROP: An FPGA-Based Spatial Accelerator for Sparse Matrix Operations
by: Parthasarathy, Rishab
Published: (2025)
HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference
by: Negi, Shubham, et al.
Published: (2025)
FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill
by: Jayanth, Rakshith, et al.
Published: (2026)
A Low-Power Sparse Deep Learning Accelerator with Optimized Data Reuse
by: Hsu, Kai-Chieh, et al.
Published: (2025)
GUST: Graph Edge-Coloring Utilization for Accelerating Sparse Matrix Vector Multiplication
by: Gerami, Armin, et al.
Published: (2024)
Periodic Online Testing for Sparse Systolic Tensor Arrays
by: Peltekis, Christodoulos, et al.
Published: (2025)