:: Library Catalog

Bibliographic Details
Main Authors:	Nayak, Nandeeka; Odemuyiwa, Toluwanimi O.; Ugare, Shubham; Fletcher, Christopher W.; Pellauer, Michael; Emer, Joel S.
Format:	Preprint
Published:	2023
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2304.07931

Similar Items

FuseMax: Leveraging Extended Einsums to Optimize Attention Accelerator Design
by: Nayak, Nandeeka, et al.
Published: (2024)

Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models
by: Odemuyiwa, Toluwanimi O., et al.
Published: (2026)

RTeAAL Sim: Using Tensor Algebra to Represent and Accelerate RTL Simulation (Extended Version)
by: Zhu, Yan, et al.
Published: (2026)

Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity
by: Xue, Zi Yu, et al.
Published: (2023)

HaShiFlex: A High-Throughput Hardened Shifter DNN Accelerator with Fine-Tuning Flexibility
by: Herbst, Jonathan, et al.
Published: (2025)

LoopTree: Exploring the Fused-layer Dataflow Accelerator Design Space
by: Gilbert, Michael, et al.
Published: (2024)

Modeling Analog-Digital-Converter Energy and Area for Compute-In-Memory Accelerator Design
by: Andrulis, Tanner, et al.
Published: (2024)

Fast and Fusiest: An Optimal Fusion-Aware Mapper for Accelerator Design
by: Andrulis, Tanner, et al.
Published: (2026)

CiMLoop: A Flexible, Accurate, and Fast Compute-In-Memory Modeling Tool
by: Andrulis, Tanner, et al.
Published: (2024)

The Turbo-Charged Mapper: Fast and Optimal Mapping for Energy-efficient and Low-latency Accelerator Design
by: Gilbert, Michael, et al.
Published: (2026)

CELLO: Co-designing Schedule and Hybrid Implicit/Explicit Buffer for Complex Tensor Reuse
by: Garg, Raveesh, et al.
Published: (2023)

Architecture-Level Modeling of Photonic Deep Neural Network Accelerators
by: Andrulis, Tanner, et al.
Published: (2024)

Systolic Sparse Tensor Slices: FPGA Building Blocks for Sparse and Dense AI Acceleration
by: Taka, Endri, et al.
Published: (2025)

HARP: A Taxonomy for Heterogeneous and Hierarchical Processors for Mixed-reuse Workloads
by: Garg, Raveesh, et al.
Published: (2025)

Accelerating Sparse Graph Neural Networks with Tensor Core Optimization
by: Wu, Ka Wai
Published: (2024)

StreamTensor: Make Tensors Stream in Dataflow Accelerators for LLMs
by: Ye, Hanchen, et al.
Published: (2025)

FLAASH: Flexible Accelerator Architecture for Sparse High-Order Tensor Contraction
by: Kulp, Gabriel, et al.
Published: (2024)

Error Checking for Sparse Systolic Tensor Arrays
by: Peltekis, Christodoulos, et al.
Published: (2024)

Sparse MTTKRP Acceleration for Tensor Decomposition on GPU
by: Wijeratne, Sasindu, et al.
Published: (2024)

Open-source Stand-Alone Versatile Tensor Accelerator
by: Faure-Gignoux, Anthony, et al.
Published: (2025)

ATLAAS: Automatic Tensor-Level Abstraction of Accelerator Semantics
by: Gao, Ruijie, et al.
Published: (2026)

AME-PIM: Can Memory be Your Next Tensor Accelerator?
by: Venieri, Emanuele, et al.
Published: (2026)

HCiM: ADC-Less Hybrid Analog-Digital Compute in Memory Accelerator for Deep Learning Workloads
by: Negi, Shubham, et al.
Published: (2024)

Holistic Optimization Framework for FPGA Accelerators
by: Pouget, Stéphane, et al.
Published: (2025)

FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training
by: Lu, Jinming, et al.
Published: (2025)

FADiff: Fusion-Aware Differentiable Optimization for DNN Scheduling on Tensor Accelerators
by: Jia, Shuao, et al.
Published: (2025)

An Efficient Sparse Hardware Accelerator for Spike-Driven Transformer
by: Li, Zhengke, et al.
Published: (2025)

Systolic Array Acceleration of Diagonal-Optimized Sparse-Sparse Matrix Multiplication for Efficient Quantum Simulation
by: Su, Yuchao, et al.
Published: (2025)

SnipSnap: A Joint Compression Format and Dataflow Co-Optimization Framework for Efficient Sparse LLM Accelerator Design
by: Wu, Junyi, et al.
Published: (2025)

Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators
by: Yoon, Hyunsung, et al.
Published: (2026)

Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators
by: Jeong, Geonhwa, et al.
Published: (2024)

GTA: a new General Tensor Accelerator with Better Area Efficiency and Data Reuse
by: Ai, Chenyang, et al.
Published: (2024)

A Tensor-Train Decomposition based Compression of LLMs on Group Vector Systolic Accelerator
by: Huang, Sixiao, et al.
Published: (2025)

COFFEE: A Carbon-Modeling and Optimization Framework for HZO-based FeFET eNVMs
by: Wu, Hongbang, et al.
Published: (2026)

SuperUROP: An FPGA-Based Spatial Accelerator for Sparse Matrix Operations
by: Parthasarathy, Rishab
Published: (2025)

HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference
by: Negi, Shubham, et al.
Published: (2025)

FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill
by: Jayanth, Rakshith, et al.
Published: (2026)

A Low-Power Sparse Deep Learning Accelerator with Optimized Data Reuse
by: Hsu, Kai-Chieh, et al.
Published: (2025)

GUST: Graph Edge-Coloring Utilization for Accelerating Sparse Matrix Vector Multiplication
by: Gerami, Armin, et al.
Published: (2024)

Periodic Online Testing for Sparse Systolic Tensor Arrays
by: Peltekis, Christodoulos, et al.
Published: (2025)