:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Kuilian; Zhang, Li; Eltawil, Ahmed M.; Salama, Khaled Nabil
Format:	Preprint
Published:	2026
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2601.02613

Similar Items

Towards Efficient IMC Accelerator Design Through Joint Hardware-Workload Co-optimization
by: Krestinskaya, Olga, et al.
Published: (2024)

Stream-HLS: Towards Automatic Dataflow Acceleration
by: Basalama, Suhail, et al.
Published: (2025)

HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator
by: Yu, Zhewen, et al.
Published: (2024)

Multilayer Dataflow: Orchestrate Butterfly Sparsity to Accelerate Attention Computation
by: Wu, Haibin, et al.
Published: (2024)

CIMNAS: A Joint Framework for Compute-In-Memory-Aware Neural Architecture Search
by: Krestinskaya, Olga, et al.
Published: (2025)

StreamTensor: Make Tensors Stream in Dataflow Accelerators for LLMs
by: Ye, Hanchen, et al.
Published: (2025)

Joint Hardware-Workload Co-Optimization for In-Memory Computing Accelerators
by: Krestinskaya, Olga, et al.
Published: (2026)

A Sparsity-Aware Autonomous Path Planning Accelerator with HW/SW Co-Design and Multi-Level Dataflow Optimization
by: Zhang, Yifan, et al.
Published: (2025)

FlexNeRFer: A Multi-Dataflow, Adaptive Sparsity-Aware Accelerator for On-Device NeRF Rendering
by: Noh, Seock-Hwan, et al.
Published: (2025)

Stream: Design Space Exploration of Layer-Fused DNNs on Heterogeneous Dataflow Accelerators
by: Symons, Arne, et al.
Published: (2022)

StreamDCIM: A Tile-based Streaming Digital CIM Accelerator with Mixed-stationary Cross-forwarding Dataflow for Multimodal Transformer
by: Qin, Shantian, et al.
Published: (2025)

Implementing and Optimizing the Scaled Dot-Product Attention on Streaming Dataflow
by: Sohn, Gina, et al.
Published: (2024)

DORA: Dataflow-Instruction Orchestration Architecture for DNN Acceleration
by: Chen, Xingzhen, et al.
Published: (2026)

Exploring the Sparsity-Quantization Interplay on a Novel Hybrid SNN Event-Driven Architecture
by: Aliyev, Ilkin, et al.
Published: (2024)

DataMaestro: A Versatile and Efficient Data Streaming Engine Bringing Decoupled Memory Access To Dataflow Accelerators
by: Yi, Xiaoling, et al.
Published: (2025)

AccelCIM: Systematic Dataflow Exploration for SRAM Compute-in-Memory Accelerator
by: Xue, Chenhao, et al.
Published: (2026)

A High-Throughput FPGA Accelerator for Lightweight CNNs With Balanced Dataflow
by: Zhao, Zhiyuan, et al.
Published: (2024)

MIREDO: MIP-Driven Resource-Efficient Dataflow Optimization for Computing-in-Memory Accelerator
by: He, Xiaolin, et al.
Published: (2025)

Surrogates, Spikes, and Sparsity: Performance Analysis and Characterization of SNN Hyperparameters on Hardware
by: Aliyev, Ilkin, et al.
Published: (2026)

STI-SNN: A 0.14 GOPS/W/PE Single-Timestep Inference FPGA-based SNN Accelerator with Algorithm and Hardware Co-Design
by: Wang, Kainan, et al.
Published: (2025)

LLMulator: Generalizable Cost Modeling for Dataflow Accelerators with Input-Adaptive Control Flow
by: Chang, Kaiyan, et al.
Published: (2025)

BF-IMNA: A Bit Fluid In-Memory Neural Architecture for Neural Network Acceleration
by: Rakka, Mariam, et al.
Published: (2024)

Low Power Vision Transformer Accelerator with Hardware-Aware Pruning and Optimized Dataflow
by: Hsiung, Ching-Lin, et al.
Published: (2025)

LoopTree: Exploring the Fused-layer Dataflow Accelerator Design Space
by: Gilbert, Michael, et al.
Published: (2024)

SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators
by: Li, Jonathan, et al.
Published: (2025)

Prosperity: Accelerating Spiking Neural Networks via Product Sparsity
by: Wei, Chiyue, et al.
Published: (2025)

EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models
by: Bazzi, Jinane, et al.
Published: (2026)

DAS-MP: Enabling High-Quality Macro Placement with Enhanced Dataflow Awareness
by: Zhao, Xiaotian, et al.
Published: (2025)

CODO: An Automated Compiler for Comprehensive Dataflow Optimization
by: Zhang, Weichuang, et al.
Published: (2026)

SIRA: Scaled-Integer Range Analysis for Optimizing FPGA Dataflow Neural Network Accelerators
by: Umuroglu, Yaman, et al.
Published: (2025)

VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator
by: Wang, Zhican, et al.
Published: (2025)

SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling
by: Wang, Huizheng, et al.
Published: (2024)

VESTA: A Versatile SNN-Based Transformer Accelerator with Unified PEs for Multiple Computational Layers
by: Chen, Ching-Yao, et al.
Published: (2025)

FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching
by: Tong, Jianming, et al.
Published: (2024)

FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Large Attention-Based Model Inference on Tile-Based Accelerators
by: Zhang, Chi, et al.
Published: (2026)

FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Efficient Multi-Head Attention on Tile-Based Many-PE Accelerators
by: Zhang, Chi, et al.
Published: (2025)

Accelerating Recommender Model ETL with a Streaming FPGA-GPU Dataflow
by: Zhu, Yu, et al.
Published: (2025)

SATA: Sparsity-Aware Scheduling for Selective Token Attention
by: Fan, Zhenkun, et al.
Published: (2026)

Revealing CNN Architectures via Side-Channel Analysis in Dataflow-based Inference Accelerators
by: Weerasena, Hansika, et al.
Published: (2023)

Salca: A Sparsity-Aware Hardware Accelerator for Efficient Long-Context Attention Decoding
by: Fan, Wang, et al.
Published: (2026)