:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Wulve; Zou, Hailong; Zhou, Rui; Zhang, Jionghao; Li, Qiang; Li, Gang; Zhan, Yi; Qiao, Shushan
Format:	Preprint
Published:	2026
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2603.07962

Similar Items

Optimized Spatial Architecture Mapping Flow for Transformer Accelerators
by: Xu, Haocheng, et al.
Published: (2024)

Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads
by: Li, Boyu, et al.
Published: (2025)

FireFly-S: Exploiting Dual-Side Sparsity for Spiking Neural Networks Acceleration with Reconfigurable Spatial Architecture
by: Li, Tenglong, et al.
Published: (2024)

SimulatorCoder: DNN Accelerator Simulator Code Generation and Optimization via Large Language Models
by: Xia, Yuhuan, et al.
Published: (2026)

DEFA: Efficient Deformable Attention Acceleration via Pruning-Assisted Grid-Sampling and Multi-Scale Parallel Processing
by: Xu, Yansong, et al.
Published: (2024)

METRO: A Software-Hardware Co-Design of Interconnections for Spatial DNN Accelerators
by: Wang, Zhao, et al.
Published: (2021)

Hummingbird: A Smaller and Faster Large Language Model Accelerator on Embedded FPGA
by: Li, Jindong, et al.
Published: (2025)

Prosperity: Accelerating Spiking Neural Networks via Product Sparsity
by: Wei, Chiyue, et al.
Published: (2025)

DiffuSE: Cross-Layer Design Space Exploration of DNN Accelerator via Diffusion-Driven Optimization
by: Ren, Yi, et al.
Published: (2025)

Membrane: Accelerating Database Analytics with Bank-Level DRAM-PIM Filtering
by: Shekar, Akhil, et al.
Published: (2025)

A High-Throughput FPGA Accelerator for Lightweight CNNs With Balanced Dataflow
by: Zhao, Zhiyuan, et al.
Published: (2024)

SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling
by: Wang, Huizheng, et al.
Published: (2024)

Fast-OverlaPIM: A Fast Overlap-driven Mapping Framework for Processing In-Memory Neural Network Acceleration
by: Wang, Xuan, et al.
Published: (2024)

An Analytical Cost Model for Fast Evaluation of Multiple Compute-Engine CNN Accelerators
by: Qararyah, Fareed, et al.
Published: (2025)

Swift: A Multi-FPGA Framework for Scaling Up Accelerated Graph Analytics
by: Jaiyeoba, Oluwole, et al.
Published: (2024)

A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM Accelerators
by: Li, Cong, et al.
Published: (2026)

HiHGNN: Accelerating HGNNs through Parallelism and Data Reusability Exploitation
by: Xue, Runzhen, et al.
Published: (2023)

FireFly-P: FPGA-Accelerated Spiking Neural Network Plasticity for Robust Adaptive Control
by: Li, Tenglong, et al.
Published: (2026)

A Bit Level Weight Reordering Strategy Based on Column Similarity to Explore Weight Sparsity in RRAM-based NN Accelerator
by: Yang, Weiping, et al.
Published: (2025)

FireFly-T: High-Throughput Sparsity Exploitation for Spiking Transformer Acceleration with Dual-Engine Overlay Architecture
by: Li, Tenglong, et al.
Published: (2025)

Monad: Towards Cost-effective Specialization for Chiplet-based Spatial Accelerators
by: Hao, Xiaochen, et al.
Published: (2023)

SuperUROP: An FPGA-Based Spatial Accelerator for Sparse Matrix Operations
by: Parthasarathy, Rishab
Published: (2025)

VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator
by: Wang, Zhican, et al.
Published: (2025)

StreamDCIM: A Tile-based Streaming Digital CIM Accelerator with Mixed-stationary Cross-forwarding Dataflow for Multimodal Transformer
by: Qin, Shantian, et al.
Published: (2025)

MemIntelli: A Generic End-to-End Simulation Framework for Memristive Intelligent Computing
by: Zhou, Houji, et al.
Published: (2025)

Robust Qubit Mapping Algorithm via Double-Source Optimal Routing on Large Quantum Circuits
by: Cheng, Chin-Yi, et al.
Published: (2022)

HPIM: Heterogeneous Processing-In-Memory-based Accelerator for Large Language Models Inference
by: Duan, Cenlin, et al.
Published: (2025)

The Turbo-Charged Mapper: Fast and Optimal Mapping for Energy-efficient and Low-latency Accelerator Design
by: Gilbert, Michael, et al.
Published: (2026)

AccelSync: Verifying Synchronization Coverage in Accelerator Pipeline Programs
by: An, Hangcheng, et al.
Published: (2026)

AccelCIM: Systematic Dataflow Exploration for SRAM Compute-in-Memory Accelerator
by: Xue, Chenhao, et al.
Published: (2026)

Travel Time Based Task Mapping for NoC-Based DNN Accelerator
by: Chen, Yizhi, et al.
Published: (2024)

Efficient Kernel Mapping and Comprehensive System Evaluation of LLM Acceleration on a CGLA
by: Ando, Takuto, et al.
Published: (2025)

Real-Time, Energy-Efficient, Sampling-Based Optimal Control via FPGA Acceleration
by: Desai, Tanmay, et al.
Published: (2026)

Designing Spatial Architectures for Sparse Attention: STAR Accelerator via Cross-Stage Tiling
by: Wang, Huizheng, et al.
Published: (2025)

Multilayer Dataflow: Orchestrate Butterfly Sparsity to Accelerate Attention Computation
by: Wu, Haibin, et al.
Published: (2024)

DIRC-RAG: Accelerating Edge RAG with Robust High-Density and High-Loading-Bandwidth Digital In-ReRAM Computation
by: Shao, Kunming, et al.
Published: (2025)

ReadyPower: A Reliable, Interpretable, and Handy Architectural Power Model Based on Analytical Framework
by: Zhang, Qijun, et al.
Published: (2025)

Hardware-Software Co-design for 3D-DRAM-based LLM Serving Accelerator
by: Li, Cong, et al.
Published: (2026)

Scope: A Scalable Merged Pipeline Framework for Multi-Chip-Module NN Accelerators
by: Huang, Zongle, et al.
Published: (2026)

SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration
by: Zhuang, Jinming, et al.
Published: (2024)