| Main Authors: | Ai, Chenyang, Zhao, Lechuan, Huang, Zhijie, Li, Cangyuan, Wang, Xinan, Wang, Ying |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2405.02196 |
Similar Items
Rethinking Compute Substrates for 3D-Stacked Near-Memory LLM Decoding: Microarchitecture-Scheduling Co-Design
by: Ai, Chenyang, et al.
Published: (2026)
Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL
by: Cai, Siyang, et al.
Published: (2026)
Transitive Array: An Efficient GEMM Accelerator with Result Reuse
by: Guo, Cong, et al.
Published: (2025)
A Low-Power Sparse Deep Learning Accelerator with Optimized Data Reuse
by: Hsu, Kai-Chieh, et al.
Published: (2025)
A Tensor-Train Decomposition based Compression of LLMs on Group Vector Systolic Accelerator
by: Huang, Sixiao, et al.
Published: (2025)
An Efficient Data Reuse with Tile-Based Adaptive Stationary for Transformer Accelerators
by: Li, Tseng-Jen, et al.
Published: (2025)
HyDRA: Deadline and Reuse-Aware Cacheability for Hardware Accelerators
by: Agarwal, Ayushi, et al.
Published: (2026)
StreamTensor: Make Tensors Stream in Dataflow Accelerators for LLMs
by: Ye, Hanchen, et al.
Published: (2025)
Be CIM or Be Memory: A Dual-mode-aware DNN Compiler for CIM Accelerators
by: Zhao, Shixin, et al.
Published: (2025)
FADiff: Fusion-Aware Differentiable Optimization for DNN Scheduling on Tensor Accelerators
by: Jia, Shuao, et al.
Published: (2025)
Algorithmic Strategies for Sustainable Reuse of Neural Network Accelerators with Permanent Faults
by: Alama, Youssef A. Ait, et al.
Published: (2024)
ChipSeek: Optimizing Verilog Generation via EDA-Integrated Reinforcement Learning
by: Chen, Zhirong, et al.
Published: (2025)
FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training
by: Lu, Jinming, et al.
Published: (2025)
Open-source Stand-Alone Versatile Tensor Accelerator
by: Faure-Gignoux, Anthony, et al.
Published: (2025)
ATLAAS: Automatic Tensor-Level Abstraction of Accelerator Semantics
by: Gao, Ruijie, et al.
Published: (2026)
Systolic Sparse Tensor Slices: FPGA Building Blocks for Sparse and Dense AI Acceleration
by: Taka, Endri, et al.
Published: (2025)
A 16 nm 1.60TOPS/W High Utilization DNN Accelerator with 3D Spatial Data Reuse and Efficient Shared Memory Access
by: Yi, Xiaoling, et al.
Published: (2026)
Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation
by: Chang, Kaiyan, et al.
Published: (2024)
GEMM-GS: Accelerating 3D Gaussian Splatting on Tensor Cores with GEMM-Compatible Blending
by: Li, Haomin, et al.
Published: (2026)
Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity
by: Xue, Zi Yu, et al.
Published: (2023)
AME-PIM: Can Memory be Your Next Tensor Accelerator?
by: Venieri, Emanuele, et al.
Published: (2026)
DiSC: Resolution-Scalable Acceleration of Diffusion Models by Exploiting Sparsity and Cached Token Reuse with Hash-based Distribution
by: Yoon, Jieon, et al.
Published: (2026)
TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators
by: Nayak, Nandeeka, et al.
Published: (2023)
LLMulator: Generalizable Cost Modeling for Dataflow Accelerators with Input-Adaptive Control Flow
by: Chang, Kaiyan, et al.
Published: (2025)
Enhancing CGRA Efficiency Through Aligned Compute and Communication Provisioning
by: Li, Zhaoying, et al.
Published: (2024)
RPCAcc: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator
by: Zhang, Jie, et al.
Published: (2024)
LaMoS: Enabling Efficient Large Number Modular Multiplication through SRAM-based CiM Acceleration
by: Li, Haomin, et al.
Published: (2025)
Tensor Manipulation Unit (TMU): Reconfigurable, Near-Memory Tensor Manipulation for High-Throughput AI SoC
by: Zhou, Weiyu, et al.
Published: (2025)
Reuse Detector: Improving the Management of STT-RAM SLLCs
by: Rodríguez-Rodríguez, Roberto, et al.
Published: (2024)
ASDR: Exploiting Adaptive Sampling and Data Reuse for CIM-based Instant Neural Rendering
by: Liu, Fangxin, et al.
Published: (2025)
RTeAAL Sim: Using Tensor Algebra to Represent and Accelerate RTL Simulation (Extended Version)
by: Zhu, Yan, et al.
Published: (2026)
ACT: Automatically Generating Compiler Backends from Tensor Accelerator ISA Descriptions
by: Jain, Devansh, et al.
Published: (2025)
DIRC-RAG: Accelerating Edge RAG with Robust High-Density and High-Loading-Bandwidth Digital In-ReRAM Computation
by: Shao, Kunming, et al.
Published: (2025)
Modeling Analog-Digital-Converter Energy and Area for Compute-In-Memory Accelerator Design
by: Andrulis, Tanner, et al.
Published: (2024)
NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing
by: Wang, Yitu, et al.
Published: (2023)
HiHGNN: Accelerating HGNNs through Parallelism and Data Reusability Exploitation
by: Xue, Runzhen, et al.
Published: (2023)
VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator
by: Wang, Zhican, et al.
Published: (2025)
SRAM Based Digital Custom Compute Engine for Improved Area Efficiency of AI Hardware
by: Dhakad, Narendra Singh, et al.
Published: (2026)
Efficient yet Accurate End-to-End SC Accelerator Design
by: Li, Meng, et al.
Published: (2024)
Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture
by: Li, Huize, et al.
Published: (2026)