:: Library Catalog

Bibliographic Details
Main Authors:	Ai, Chenyang; Zhao, Lechuan; Huang, Zhijie; Li, Cangyuan; Wang, Xinan; Wang, Ying
Format:	Preprint
Published:	2024
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2405.02196

Similar Items

Rethinking Compute Substrates for 3D-Stacked Near-Memory LLM Decoding: Microarchitecture-Scheduling Co-Design
by: Ai, Chenyang, et al.
Published: (2026)

Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL
by: Cai, Siyang, et al.
Published: (2026)

Transitive Array: An Efficient GEMM Accelerator with Result Reuse
by: Guo, Cong, et al.
Published: (2025)

A Low-Power Sparse Deep Learning Accelerator with Optimized Data Reuse
by: Hsu, Kai-Chieh, et al.
Published: (2025)

A Tensor-Train Decomposition based Compression of LLMs on Group Vector Systolic Accelerator
by: Huang, Sixiao, et al.
Published: (2025)

An Efficient Data Reuse with Tile-Based Adaptive Stationary for Transformer Accelerators
by: Li, Tseng-Jen, et al.
Published: (2025)

HyDRA: Deadline and Reuse-Aware Cacheability for Hardware Accelerators
by: Agarwal, Ayushi, et al.
Published: (2026)

StreamTensor: Make Tensors Stream in Dataflow Accelerators for LLMs
by: Ye, Hanchen, et al.
Published: (2025)

Be CIM or Be Memory: A Dual-mode-aware DNN Compiler for CIM Accelerators
by: Zhao, Shixin, et al.
Published: (2025)

FADiff: Fusion-Aware Differentiable Optimization for DNN Scheduling on Tensor Accelerators
by: Jia, Shuao, et al.
Published: (2025)

Algorithmic Strategies for Sustainable Reuse of Neural Network Accelerators with Permanent Faults
by: Alama, Youssef A. Ait, et al.
Published: (2024)

ChipSeek: Optimizing Verilog Generation via EDA-Integrated Reinforcement Learning
by: Chen, Zhirong, et al.
Published: (2025)

FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training
by: Lu, Jinming, et al.
Published: (2025)

Open-source Stand-Alone Versatile Tensor Accelerator
by: Faure-Gignoux, Anthony, et al.
Published: (2025)

ATLAAS: Automatic Tensor-Level Abstraction of Accelerator Semantics
by: Gao, Ruijie, et al.
Published: (2026)

Systolic Sparse Tensor Slices: FPGA Building Blocks for Sparse and Dense AI Acceleration
by: Taka, Endri, et al.
Published: (2025)

A 16 nm 1.60TOPS/W High Utilization DNN Accelerator with 3D Spatial Data Reuse and Efficient Shared Memory Access
by: Yi, Xiaoling, et al.
Published: (2026)

Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation
by: Chang, Kaiyan, et al.
Published: (2024)

GEMM-GS: Accelerating 3D Gaussian Splatting on Tensor Cores with GEMM-Compatible Blending
by: Li, Haomin, et al.
Published: (2026)

Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity
by: Xue, Zi Yu, et al.
Published: (2023)

AME-PIM: Can Memory be Your Next Tensor Accelerator?
by: Venieri, Emanuele, et al.
Published: (2026)

DiSC: Resolution-Scalable Acceleration of Diffusion Models by Exploiting Sparsity and Cached Token Reuse with Hash-based Distribution
by: Yoon, Jieon, et al.
Published: (2026)

TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators
by: Nayak, Nandeeka, et al.
Published: (2023)

LLMulator: Generalizable Cost Modeling for Dataflow Accelerators with Input-Adaptive Control Flow
by: Chang, Kaiyan, et al.
Published: (2025)

Enhancing CGRA Efficiency Through Aligned Compute and Communication Provisioning
by: Li, Zhaoying, et al.
Published: (2024)

RPCAcc: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator
by: Zhang, Jie, et al.
Published: (2024)

LaMoS: Enabling Efficient Large Number Modular Multiplication through SRAM-based CiM Acceleration
by: Li, Haomin, et al.
Published: (2025)

Tensor Manipulation Unit (TMU): Reconfigurable, Near-Memory Tensor Manipulation for High-Throughput AI SoC
by: Zhou, Weiyu, et al.
Published: (2025)

Reuse Detector: Improving the Management of STT-RAM SLLCs
by: Rodríguez-Rodríguez, Roberto, et al.
Published: (2024)

ASDR: Exploiting Adaptive Sampling and Data Reuse for CIM-based Instant Neural Rendering
by: Liu, Fangxin, et al.
Published: (2025)

RTeAAL Sim: Using Tensor Algebra to Represent and Accelerate RTL Simulation (Extended Version)
by: Zhu, Yan, et al.
Published: (2026)

ACT: Automatically Generating Compiler Backends from Tensor Accelerator ISA Descriptions
by: Jain, Devansh, et al.
Published: (2025)

DIRC-RAG: Accelerating Edge RAG with Robust High-Density and High-Loading-Bandwidth Digital In-ReRAM Computation
by: Shao, Kunming, et al.
Published: (2025)

Modeling Analog-Digital-Converter Energy and Area for Compute-In-Memory Accelerator Design
by: Andrulis, Tanner, et al.
Published: (2024)

NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing
by: Wang, Yitu, et al.
Published: (2023)

HiHGNN: Accelerating HGNNs through Parallelism and Data Reusability Exploitation
by: Xue, Runzhen, et al.
Published: (2023)

VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator
by: Wang, Zhican, et al.
Published: (2025)

SRAM Based Digital Custom Compute Engine for Improved Area Efficiency of AI Hardware
by: Dhakad, Narendra Singh, et al.
Published: (2026)

Efficient yet Accurate End-to-End SC Accelerator Design
by: Li, Meng, et al.
Published: (2024)

Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture
by: Li, Huize, et al.
Published: (2026)