:: Library Catalog

Bibliographic Details
Main Authors:	Wu, Ruilong; Wang, Yisu; Kutscher, Dirk
Format:	Preprint
Published:	2024
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2408.15568

Similar Items

MixFP4: Enhancing NVFP4 with Adaptive FP4/INT4 Block Representations
by: Zou, Jiaxiang, et al.
Published: (2026)

A System Development Kit for Big Data Applications on FPGA-based Clusters: The EVEREST Approach
by: Pilato, Christian, et al.
Published: (2024)

When Small Variations Become Big Failures: Reliability Challenges in Compute-in-Memory Neural Accelerators
by: Qin, Yifan, et al.
Published: (2026)

An Affordable Experimental Technique for SRAM Write Margin Characterization for Nanometer CMOS Technologies
by: Alorda, Bartomeu, et al.
Published: (2024)

Leveraging Recurrent Patterns in Graph Accelerators
by: Rahimi, Masoud, et al.
Published: (2025)

Towards An Approach to Identify Divergences in Hardware Designs for HPC Workloads
by: Popovici, Doru Thom, et al.
Published: (2025)

Toward Open-Source Chiplets for HPC and AI: Occamy and Beyond
by: Scheffler, Paul, et al.
Published: (2025)

Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing
by: Mallasén, David, et al.
Published: (2023)

Educating for Hardware Specialization in the Chiplet Era: A Path for the HPC Community
by: Yoshii, Kazutomo, et al.
Published: (2024)

Characterization of Real Communication Patterns and Congestion Dynamics in HPC Interconnection Networks
by: de La Rosa, Miguel Sánchez, et al.
Published: (2026)

Reconfigurable Computing Challenge: Real-Time Graph Neural Networks for Online Event Selection in Big Science
by: Neu, Marc, et al.
Published: (2026)

Calibrating DRAMPower Model for HPC: A Runtime Perspective from Real-Time Measurements
by: Shi, Xinyu, et al.
Published: (2024)

MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness
by: Wang, Huizheng, et al.
Published: (2025)

Late Breaking Results: Leveraging Approximate Computing for Carbon-Aware DNN Accelerators
by: Panteleaki, Aikaterini Maria, et al.
Published: (2025)

Hardware-Software Co-Design for Accelerating Transformer Inference Leveraging Compute-in-Memory
by: Kim, Dong Eun, et al.
Published: (2025)

Apple vs. Oranges: Evaluating the Apple Silicon M-Series SoCs for HPC Performance and Efficiency
by: Hübner, Paul, et al.
Published: (2025)

FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics
by: Wang, Zeke, et al.
Published: (2025)

Leveraging Compute-in-Memory for Efficient Generative Model Inference in TPUs
by: Zhu, Zhantong, et al.
Published: (2025)

Efficient and Accurate Graph Classification with Hyperdimensional Computing on FPGA
by: Arockiaraj, Jebacyril, et al.
Published: (2025)

Enabling Efficient Hybrid Systolic Computation in Shared L1-Memory Manycore Clusters
by: Mazzola, Sergio, et al.
Published: (2024)

Spatz: Clustering Compact RISC-V-Based Vector Units to Maximize Computing Efficiency
by: Perotti, Matteo, et al.
Published: (2023)

PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration
by: Chong, Yue Jiet, et al.
Published: (2025)

Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM
by: Liu, Lian, et al.
Published: (2025)

SpikeStream: Accelerating Spiking Neural Network Inference on RISC-V Clusters with Sparse Computation Extensions
by: Manoni, Simone, et al.
Published: (2025)

ACS: Concurrent Kernel Execution on Irregular, Input-Dependent Computational Graphs
by: Durvasula, Sankeerth, et al.
Published: (2024)

FuseMax: Leveraging Extended Einsums to Optimize Attention Accelerator Design
by: Nayak, Nandeeka, et al.
Published: (2024)

ADS-IMC: Accelerating Data Sorting with In-Memory Computation
by: Dhakad, Narendra Singh, et al.
Published: (2026)

The Tiny Median Filter: A Small Size, Flexible Arbitrary Percentile Finder Scheme Suitable for FPGA Implementation
by: Wu, Jinyuan
Published: (2024)

ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation
by: Ottaviano, Alessandro, et al.
Published: (2023)

Automated Physical Design Watermarking Leveraging Graph Neural Networks
by: Zhang, Ruisi, et al.
Published: (2024)

EN-T: Optimizing Tensor Computing Engines Performance via Encoder-Based Methodology
by: Wu, Qizhe, et al.
Published: (2024)

NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing
by: Wang, Yitu, et al.
Published: (2023)

Shared-PIM: Enabling Concurrent Computation and Data Flow for Faster Processing-in-DRAM
by: Mamdouh, Ahmed, et al.
Published: (2024)

Multilayer Dataflow: Orchestrate Butterfly Sparsity to Accelerate Attention Computation
by: Wu, Haibin, et al.
Published: (2024)

SiHGNN: Leveraging Properties of Semantic Graphs for Efficient HGNN Acceleration
by: Xue, Runzhen, et al.
Published: (2024)

CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA
by: Dong, Jiale, et al.
Published: (2025)

RecFlash: Fast Recommendation System on In-Storage Computing with Frequency-Based Data Mapping
by: Baik, Jangho, et al.
Published: (2026)

A Computing-in-Memory-based One-Class Hyperdimensional Computing Model for Outlier Detection
by: Wang, Ruixuan, et al.
Published: (2023)

The Data Conversion Bottleneck in Analog Computing Accelerators
by: Meech, James T., et al.
Published: (2023)

Cohet: A CXL-Driven Coherent Heterogeneous Computing Framework with Hardware-Calibrated Full-System Simulation
by: Wang, Yanjing, et al.
Published: (2025)