| Main Authors: | Kim, Hansung; Yan, Ruohan Richard; You, Joshua; Yang, Tieliang Vamber; Shao, Yakun Sophia |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.12073 |
Similar Items
Jack Unit: An Area- and Energy-Efficient Multiply-Accumulate (MAC) Unit Supporting Diverse Data Formats
by: Noh, Seock-Hwan, et al.
Published: (2025)
Spatz: Clustering Compact RISC-V-Based Vector Units to Maximize Computing Efficiency
by: Perotti, Matteo, et al.
Published: (2023)
DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators
by: Hong, Charles, et al.
Published: (2025)
Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory
by: Hong, Jeongmin, et al.
Published: (2024)
tubGEMM: Energy-Efficient and Sparsity-Effective Temporal-Unary-Binary Based Matrix Multiply Unit
by: Vellaisamy, Prabhu, et al.
Published: (2024)
A Systematic Characterization of LLM Inference on GPUs
by: Wang, Haonan, et al.
Published: (2025)
Towards Zero-Stall Matrix Multiplication on Energy-Efficient RISC-V Clusters for Machine Learning Acceleration
by: Colagrande, Luca, et al.
Published: (2025)
Control Flow Management in Modern GPUs
by: Shoushtary, Mojtaba Abaie, et al.
Published: (2024)
Autocomp: A Powerful and Portable Code Optimizer for Tensor Accelerators
by: Hong, Charles, et al.
Published: (2025)
Privacy-Preserving Performance Profiling of In-The-Wild GPUs
by: McDougall, Ian, et al.
Published: (2025)
An Energy-Efficient Approximate Posit Multiply-Divide Unit
by: Thotli, Rishi, et al.
Published: (2026)
Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators
by: Yoon, Hyunsung, et al.
Published: (2026)
How to Increase Energy Efficiency with a Single Linux Command
by: Jelvani, Alborz, et al.
Published: (2025)
A Scalable Resource Management Layer for FPGA SoCs in 6G Radio Units
by: Bartzoudis, Nikolaos, et al.
Published: (2025)
Fleet: Hierarchical Task-based Abstraction for Megakernels on Multi-Die GPUs
by: Chowdhary, Sangeeta, et al.
Published: (2026)
GPIR: Enabling Practical Private Information Retrieval with GPUs
by: Ji, Hyesung, et al.
Published: (2026)
D-Legion: A Scalable Many-Core Architecture for Accelerating Matrix Multiplication in Quantized LLMs
by: Abdelmaksoud, Ahmed J., et al.
Published: (2026)
Optimizing Scalable Multi-Cluster Architectures for Next-Generation Wireless Sensing and Communication
by: Riedel, Samuel, et al.
Published: (2025)
Optimizing Energy Efficiency in Subthreshold RISC-V Cores
by: Djupdal, Asbjørn, et al.
Published: (2025)
CarbonSet: A Dataset to Analyze Trends and Benchmark the Sustainability of CPUs and GPUs
by: Hu, Jiajun, et al.
Published: (2025)
LLM-Aided Compilation for Tensor Accelerators
by: Hong, Charles, et al.
Published: (2024)
hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation
by: Hong, Charles, et al.
Published: (2025)
Integrating Prefetcher Selection with Dynamic Request Allocation Improves Prefetching Efficiency
by: Li, Mengming, et al.
Published: (2025)
Study on the Particle Sorting Performance for Reactor Monte Carlo Neutron Transport on Apple Unified Memory GPUs
by: Liu, Changyuan
Published: (2024)
SSD Offloading for LLM Mixture-of-Experts Weights Considered Harmful in Energy Efficiency
by: Kyung, Kwanhee, et al.
Published: (2025)
16 Years of SPEC Power: An Analysis of x86 Energy Efficiency Trends
by: Tröpgen, Hannes, et al.
Published: (2024)
Increasing the Energy-Efficiency of Wearables Using Low-Precision Posit Arithmetic with PHEE
by: Mallasén, David, et al.
Published: (2025)
PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems
by: Lee, Dongjae, et al.
Published: (2024)
Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA
by: Zhu, Xuqi, et al.
Published: (2024)
Exploration of Unary Arithmetic-Based Matrix Multiply Units for Low Precision DL Accelerators
by: Vellaisamy, Prabhu, et al.
Published: (2026)
MX: Enhancing RISC-V's Vector ISA for Ultra-Low Overhead, Energy-Efficient Matrix Multiplication
by: Perotti, Matteo, et al.
Published: (2024)
LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference
by: Moon, Seungjae, et al.
Published: (2024)
Switch-Less Dragonfly on Wafers: A Scalable Interconnection Architecture based on Wafer-Scale Integration
by: Feng, Yinxiao, et al.
Published: (2024)
DARE: An Irregularity-Tolerant Matrix Processing Unit with a Densifying ISA and Filtered Runahead Execution
by: Yang, Xin, et al.
Published: (2025)
Evaluation of Run-Time Energy Efficiency using Controlled Approximation in a RISC-V Core
by: Delavari, Arvin, et al.
Published: (2024)
ISAAC: Intelligent, Scalable, Agile, and Accelerated CPU Verification via LLM-aided FPGA Parallelism
by: Sun, Jialin, et al.
Published: (2025)
DiP: A Scalable, Energy-Efficient Systolic Array for Matrix Multiplication Acceleration
by: Abdelmaksoud, Ahmed J., et al.
Published: (2024)
Instruction Scheduling in the Saturn Vector Unit
by: Zhao, Jerry, et al.
Published: (2024)
Hidden Risks of Unmonitored GPUs in Intelligent Transportation Systems
by: Puspa, Sefatun-Noor, et al.
Published: (2026)
Modeling PFAS in Semiconductor Manufacturing to Quantify Trade-offs in Energy Efficiency and Environmental Impact of Computing Systems
by: Elgamal, Mariam, et al.
Published: (2025)