| Main Author: | Dudek, Piotr |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.12130 |
Similar Items
Ten-Four: An Open-Source Fused Dot Product Unit for Mixed-Precision GPGPU Tensor Cores
by: Rout, Nikhil, et al.
Published: (2025)
FlexiBit: Fully Flexible Precision Bit-parallel Accelerator Architecture for Arbitrary Mixed Precision AI
by: Tahmasebi, Faraz, et al.
Published: (2024)
Lincoln AI Computing Survey (LAICS) and Trends
by: Reuther, Albert, et al.
Published: (2025)
Dataflow & Tiling Strategies in Edge-AI FPGA Accelerators: A Comprehensive Literature Review
by: Li, Richie
Published: (2025)
TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference
by: Li, Zhuoran, et al.
Published: (2026)
Toward a Universal GPU Instruction Set Architecture: A Cross-Vendor Analysis of Hardware-Invariant Computational Primitives in Parallel Processors
by: Abraham, Ojima, et al.
Published: (2026)
D-com: Accelerating Iterative Processing to Enable Low-rank Decomposition of Activations
by: Tahmasebi, Faraz, et al.
Published: (2025)
A Per-Access Upper Bound for Shared-Resource Interference in Direct-Mapped Multicore Architectures
by: Pedroni, Felipe T.
Published: (2026)
Design and Implementation of an FPGA-Based Hardware Accelerator for Transformer
by: Li, Richie, et al.
Published: (2025)
Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers
by: Sanovar, Rya, et al.
Published: (2024)
A Comparative Analysis of ARM and x86-64 Laptop-Class Processors: Architecture, Assembly-Level Performance, and Energy Efficiency
by: Özyılmaz, Mustafa Mert
Published: (2026)
Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning
by: Yuan, Aojie, et al.
Published: (2026)
Accelerating Precise End-to-End Simulation: Latency-Sensitive Many-core System Modeling
by: Li, Yinrong, et al.
Published: (2026)
Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory
by: Jo, Myeong Jun
Published: (2026)
GDEV-AI: A Generalized Evaluation of Deep Learning Inference Scaling and Architectural Saturation
by: Palaniappan, Kathiravan
Published: (2026)
ArchAgent: Agentic AI-driven Computer Architecture Discovery
by: Gupta, Raghav, et al.
Published: (2026)
CLIPGen: A Chiplet Link IP Modeling and Generation Framework for 2.5D Architecture Exploration
by: Zhu, Zhengping, et al.
Published: (2026)
T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives
by: Pati, Suchita, et al.
Published: (2024)
Wattchmen: Watching the Wattchers -- High Fidelity, Flexible GPU Energy Modeling
by: Tran, Brandon, et al.
Published: (2026)
FREESS: A Web-Based Educational Simulator for a RISC-V-Inspired Superscalar Processor with Tomasulo-Style Dynamic Scheduling
by: Giorgi, Roberto, et al.
Published: (2026)
Guess-Verify-Refine: Data-Aware Top-K for Sparse-Attention Decoding on Blackwell via Temporal Correlation
by: Cheng, Long, et al.
Published: (2026)
Fast and Fusiest: An Optimal Fusion-Aware Mapper for Accelerator Design
by: Andrulis, Tanner, et al.
Published: (2026)
The Turbo-Charged Mapper: Fast and Optimal Mapping for Energy-efficient and Low-latency Accelerator Design
by: Gilbert, Michael, et al.
Published: (2026)
RV-IM100: Quantifying ISA Extension, Datapath Width, and Pipeline Depth Trade-offs in RISC-V Microarchitectures
by: Kang, Hyunwoo
Published: (2026)
AMC: Access to Miss Correlation Prefetcher for Evolving Graph Analytics
by: Singh, Abhishek, et al.
Published: (2024)
TriADA: Massively Parallel Trilinear Matrix-by-Tensor Multiply-Add Algorithm and Device Architecture for the Acceleration of 3D Discrete Transformations
by: Sedukhin, Stanislav, et al.
Published: (2025)
Coordinated Reinforcement Learning Prefetching Architecture for Multicore Systems
by: Siddiqui, Mohammed Humaid, et al.
Published: (2025)
DEEP-GAP: Deep-learning Evaluation of Execution Parallelism in GPU Architectural Performance
by: Palaniappan, Kathiravan
Published: (2026)
CEO-DC: Driving Decarbonization in HPC Data Centers with Actionable Insights
by: Álvarez, Rubén Rodríguez, et al.
Published: (2025)
Improving Injection-Throttling Mechanisms for Congestion Control for Data-center and Supercomputer Interconnects
by: Olmedilla, Cristina, et al.
Published: (2025)
Ember: A Compiler for Efficient Embedding Operations on Decoupled Access-Execute Architectures
by: Siracusa, Marco, et al.
Published: (2025)
Glass-Box Analysis for Computer Systems: Transparency Index, Shapley Attribution, and Markov Models of Branch Prediction
by: Alpay, Faruk, et al.
Published: (2025)
LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference
by: Mo, Zhiwen, et al.
Published: (2024)
Exploring the Design Space for Message-Driven Systems for Dynamic Graph Processing using CCA
by: Chandio, Bibrak Qamar, et al.
Published: (2024)
Mitigating the Memory Bottleneck with Machine Learning-Driven and Data-Aware Microarchitectural Techniques
by: Bera, Rahul
Published: (2026)
A Compilation Framework for Quantum Circuits with Mid-Circuit Measurement Error Awareness
by: Zhong, Ming, et al.
Published: (2025)
A Flexible Instruction Set Architecture for Efficient GEMMs
by: Santana, Alexandre de Limas, et al.
Published: (2025)
Kratos: An FPGA Benchmark for Unrolled DNNs with Fine-Grained Sparsity and Mixed Precision
by: Dai, Xilai, et al.
Published: (2024)
VolTune: A Fine-Grained Runtime Voltage Control Architecture for FPGA Systems
by: Ahmed, Akram Ben, et al.
Published: (2026)
An Integrated UVM-TLM Co-Simulation Framework for RISC-V Functional Verification and Performance Evaluation
by: Qiu, Ruizhi, et al.
Published: (2025)