Library Catalog

Bibliographic Details
Main Authors:	Kim, Hansung, Yan, Ruohan Richard, You, Joshua, Yang, Tieliang Vamber, Shao, Yakun Sophia
Format:	Preprint
Published:	2024
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2408.12073

Similar Items

Jack Unit: An Area- and Energy-Efficient Multiply-Accumulate (MAC) Unit Supporting Diverse Data Formats
by: Noh, Seock-Hwan, et al.
Published: (2025)

Spatz: Clustering Compact RISC-V-Based Vector Units to Maximize Computing Efficiency
by: Perotti, Matteo, et al.
Published: (2023)

DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators
by: Hong, Charles, et al.
Published: (2025)

Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory
by: Hong, Jeongmin, et al.
Published: (2024)

tubGEMM: Energy-Efficient and Sparsity-Effective Temporal-Unary-Binary Based Matrix Multiply Unit
by: Vellaisamy, Prabhu, et al.
Published: (2024)

A Systematic Characterization of LLM Inference on GPUs
by: Wang, Haonan, et al.
Published: (2025)

Towards Zero-Stall Matrix Multiplication on Energy-Efficient RISC-V Clusters for Machine Learning Acceleration
by: Colagrande, Luca, et al.
Published: (2025)

Control Flow Management in Modern GPUs
by: Shoushtary, Mojtaba Abaie, et al.
Published: (2024)

Autocomp: A Powerful and Portable Code Optimizer for Tensor Accelerators
by: Hong, Charles, et al.
Published: (2025)

Privacy-Preserving Performance Profiling of In-The-Wild GPUs
by: McDougall, Ian, et al.
Published: (2025)

An Energy-Efficient Approximate Posit Multiply-Divide Unit
by: Thotli, Rishi, et al.
Published: (2026)

Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators
by: Yoon, Hyunsung, et al.
Published: (2026)

How to Increase Energy Efficiency with a Single Linux Command
by: Jelvani, Alborz, et al.
Published: (2025)

A Scalable Resource Management Layer for FPGA SoCs in 6G Radio Units
by: Bartzoudis, Nikolaos, et al.
Published: (2025)

Fleet: Hierarchical Task-based Abstraction for Megakernels on Multi-Die GPUs
by: Chowdhary, Sangeeta, et al.
Published: (2026)

GPIR: Enabling Practical Private Information Retrieval with GPUs
by: Ji, Hyesung, et al.
Published: (2026)

D-Legion: A Scalable Many-Core Architecture for Accelerating Matrix Multiplication in Quantized LLMs
by: Abdelmaksoud, Ahmed J., et al.
Published: (2026)

Optimizing Scalable Multi-Cluster Architectures for Next-Generation Wireless Sensing and Communication
by: Riedel, Samuel, et al.
Published: (2025)

Optimizing Energy Efficiency in Subthreshold RISC-V Cores
by: Djupdal, Asbjørn, et al.
Published: (2025)

CarbonSet: A Dataset to Analyze Trends and Benchmark the Sustainability of CPUs and GPUs
by: Hu, Jiajun, et al.
Published: (2025)

LLM-Aided Compilation for Tensor Accelerators
by: Hong, Charles, et al.
Published: (2024)

hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation
by: Hong, Charles, et al.
Published: (2025)

Integrating Prefetcher Selection with Dynamic Request Allocation Improves Prefetching Efficiency
by: Li, Mengming, et al.
Published: (2025)

Study on the Particle Sorting Performance for Reactor Monte Carlo Neutron Transport on Apple Unified Memory GPUs
by: Liu, Changyuan
Published: (2024)

SSD Offloading for LLM Mixture-of-Experts Weights Considered Harmful in Energy Efficiency
by: Kyung, Kwanhee, et al.
Published: (2025)

16 Years of SPEC Power: An Analysis of x86 Energy Efficiency Trends
by: Tröpgen, Hannes, et al.
Published: (2024)

Increasing the Energy-Efficiency of Wearables Using Low-Precision Posit Arithmetic with PHEE
by: Mallasén, David, et al.
Published: (2025)

PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems
by: Lee, Dongjae, et al.
Published: (2024)

Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA
by: Zhu, Xuqi, et al.
Published: (2024)

Exploration of Unary Arithmetic-Based Matrix Multiply Units for Low Precision DL Accelerators
by: Vellaisamy, Prabhu, et al.
Published: (2026)

MX: Enhancing RISC-V's Vector ISA for Ultra-Low Overhead, Energy-Efficient Matrix Multiplication
by: Perotti, Matteo, et al.
Published: (2024)

LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference
by: Moon, Seungjae, et al.
Published: (2024)

Switch-Less Dragonfly on Wafers: A Scalable Interconnection Architecture based on Wafer-Scale Integration
by: Feng, Yinxiao, et al.
Published: (2024)

DARE: An Irregularity-Tolerant Matrix Processing Unit with a Densifying ISA and Filtered Runahead Execution
by: Yang, Xin, et al.
Published: (2025)

Evaluation of Run-Time Energy Efficiency using Controlled Approximation in a RISC-V Core
by: Delavari, Arvin, et al.
Published: (2024)

ISAAC: Intelligent, Scalable, Agile, and Accelerated CPU Verification via LLM-aided FPGA Parallelism
by: Sun, Jialin, et al.
Published: (2025)

DiP: A Scalable, Energy-Efficient Systolic Array for Matrix Multiplication Acceleration
by: Abdelmaksoud, Ahmed J., et al.
Published: (2024)

Instruction Scheduling in the Saturn Vector Unit
by: Zhao, Jerry, et al.
Published: (2024)

Hidden Risks of Unmonitored GPUs in Intelligent Transportation Systems
by: Puspa, Sefatun-Noor, et al.
Published: (2026)

Modeling PFAS in Semiconductor Manufacturing to Quantify Trade-offs in Energy Efficiency and Environmental Impact of Computing Systems
by: Elgamal, Mariam, et al.
Published: (2025)