:: Library Catalog

Bibliographic Details
Main Author:	Dudek, Piotr
Format:	Preprint
Published:	2024
Subjects:	Hardware Architecture C.1.4
Online Access:	https://arxiv.org/abs/2402.12130

Similar Items

Ten-Four: An Open-Source Fused Dot Product Unit for Mixed-Precision GPGPU Tensor Cores
by: Rout, Nikhil, et al.
Published: (2025)

FlexiBit: Fully Flexible Precision Bit-parallel Accelerator Architecture for Arbitrary Mixed Precision AI
by: Tahmasebi, Faraz, et al.
Published: (2024)

Lincoln AI Computing Survey (LAICS) and Trends
by: Reuther, Albert, et al.
Published: (2025)

Dataflow & Tiling Strategies in Edge-AI FPGA Accelerators: A Comprehensive Literature Review
by: Li, Richie
Published: (2025)

TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference
by: Li, Zhuoran, et al.
Published: (2026)

Toward a Universal GPU Instruction Set Architecture: A Cross-Vendor Analysis of Hardware-Invariant Computational Primitives in Parallel Processors
by: Abraham, Ojima, et al.
Published: (2026)

D-com: Accelerating Iterative Processing to Enable Low-rank Decomposition of Activations
by: Tahmasebi, Faraz, et al.
Published: (2025)

A Per-Access Upper Bound for Shared-Resource Interference in Direct-Mapped Multicore Architectures
by: Pedroni, Felipe T.
Published: (2026)

Design and Implementation of an FPGA-Based Hardware Accelerator for Transformer
by: Li, Richie, et al.
Published: (2025)

Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers
by: Sanovar, Rya, et al.
Published: (2024)

A Comparative Analysis of ARM and x86-64 Laptop-Class Processors: Architecture, Assembly-Level Performance, and Energy Efficiency
by: Özyılmaz, Mustafa Mert
Published: (2026)

Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning
by: Yuan, Aojie, et al.
Published: (2026)

Accelerating Precise End-to-End Simulation: Latency-Sensitive Many-core System Modeling
by: Li, Yinrong, et al.
Published: (2026)

Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory
by: Jo, Myeong Jun
Published: (2026)

GDEV-AI: A Generalized Evaluation of Deep Learning Inference Scaling and Architectural Saturation
by: Palaniappan, Kathiravan
Published: (2026)

ArchAgent: Agentic AI-driven Computer Architecture Discovery
by: Gupta, Raghav, et al.
Published: (2026)

CLIPGen: A Chiplet Link IP Modeling and Generation Framework for 2.5D Architecture Exploration
by: Zhu, Zhengping, et al.
Published: (2026)

T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives
by: Pati, Suchita, et al.
Published: (2024)

Wattchmen: Watching the Wattchers -- High Fidelity, Flexible GPU Energy Modeling
by: Tran, Brandon, et al.
Published: (2026)

FREESS: A Web-Based Educational Simulator for a RISC-V-Inspired Superscalar Processor with Tomasulo-Style Dynamic Scheduling
by: Giorgi, Roberto, et al.
Published: (2026)

Guess-Verify-Refine: Data-Aware Top-K for Sparse-Attention Decoding on Blackwell via Temporal Correlation
by: Cheng, Long, et al.
Published: (2026)

Fast and Fusiest: An Optimal Fusion-Aware Mapper for Accelerator Design
by: Andrulis, Tanner, et al.
Published: (2026)

The Turbo-Charged Mapper: Fast and Optimal Mapping for Energy-efficient and Low-latency Accelerator Design
by: Gilbert, Michael, et al.
Published: (2026)

RV-IM100: Quantifying ISA Extension, Datapath Width, and Pipeline Depth Trade-offs in RISC-V Microarchitectures
by: Kang, Hyunwoo
Published: (2026)

AMC: Access to Miss Correlation Prefetcher for Evolving Graph Analytics
by: Singh, Abhishek, et al.
Published: (2024)

TriADA: Massively Parallel Trilinear Matrix-by-Tensor Multiply-Add Algorithm and Device Architecture for the Acceleration of 3D Discrete Transformations
by: Sedukhin, Stanislav, et al.
Published: (2025)

Coordinated Reinforcement Learning Prefetching Architecture for Multicore Systems
by: Siddiqui, Mohammed Humaid, et al.
Published: (2025)

DEEP-GAP: Deep-learning Evaluation of Execution Parallelism in GPU Architectural Performance
by: Palaniappan, Kathiravan
Published: (2026)

CEO-DC: Driving Decarbonization in HPC Data Centers with Actionable Insights
by: Álvarez, Rubén Rodríguez, et al.
Published: (2025)

Improving Injection-Throttling Mechanisms for Congestion Control for Data-center and Supercomputer Interconnects
by: Olmedilla, Cristina, et al.
Published: (2025)

Ember: A Compiler for Efficient Embedding Operations on Decoupled Access-Execute Architectures
by: Siracusa, Marco, et al.
Published: (2025)

Glass-Box Analysis for Computer Systems: Transparency Index, Shapley Attribution, and Markov Models of Branch Prediction
by: Alpay, Faruk, et al.
Published: (2025)

LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference
by: Mo, Zhiwen, et al.
Published: (2024)

Exploring the Design Space for Message-Driven Systems for Dynamic Graph Processing using CCA
by: Chandio, Bibrak Qamar, et al.
Published: (2024)

Mitigating the Memory Bottleneck with Machine Learning-Driven and Data-Aware Microarchitectural Techniques
by: Bera, Rahul
Published: (2026)

A Compilation Framework for Quantum Circuits with Mid-Circuit Measurement Error Awareness
by: Zhong, Ming, et al.
Published: (2025)

A Flexible Instruction Set Architecture for Efficient GEMMs
by: Santana, Alexandre de Limas, et al.
Published: (2025)

Kratos: An FPGA Benchmark for Unrolled DNNs with Fine-Grained Sparsity and Mixed Precision
by: Dai, Xilai, et al.
Published: (2024)

VolTune: A Fine-Grained Runtime Voltage Control Architecture for FPGA Systems
by: Ahmed, Akram Ben, et al.
Published: (2026)

An Integrated UVM-TLM Co-Simulation Framework for RISC-V Functional Verification and Performance Evaluation
by: Qiu, Ruizhi, et al.
Published: (2025)