| Main Authors: | Klein, Bernhard, Selker, Falk, Borras, Hendrik, Steger, Sophie, Pernkopf, Franz, Fröning, Holger |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.23440 |
Similar Items
MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration
by: Kubo, Tatsuya, et al.
Published: (2025)
Kitsune: Enabling Dataflow Execution on GPUs
by: Davies, Michael, et al.
Published: (2025)
Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A
by: Jarmusch, Aaron, et al.
Published: (2026)
Achieving Dependability of AI Execution with Radiation Hardened Processors
by: Taquichiri, Carlos Rafael Tordoya, et al.
Published: (2025)
Data-aware Dynamic Execution of Irregular Workloads on Heterogeneous Systems
by: Bai, Zhenyu, et al.
Published: (2025)
Lit Silicon: A Case Where Thermal Imbalance Couples Concurrent Execution in Multiple GPUs
by: Kurzynski, Marco, et al.
Published: (2025)
Evaluating Rapid Makespan Predictions for Heterogeneous Systems with Programmable Logic
by: Wilhelm, Martin, et al.
Published: (2025)
Sparse MTTKRP Acceleration for Tensor Decomposition on GPU
by: Wijeratne, Sasindu, et al.
Published: (2024)
Leveraging SIMD for Accelerating Large-number Arithmetic
by: Das, Subhrajit, et al.
Published: (2026)
Next-generation Probabilistic Computing Hardware with 3D MOSAICs, Illusion Scale-up, and Co-design
by: Srimani, Tathagata, et al.
Published: (2024)
Accelerating Triangle Counting with Real Processing-in-Memory Systems
by: Asquini, Lorenzo, et al.
Published: (2025)
Balanced Data Placement for GEMV Acceleration with Processing-In-Memory
by: Ibrahim, Mohamed Assem, et al.
Published: (2024)
Accelerating Data Chunking in Deduplication Systems using Vector Instructions
by: Udayashankar, Sreeharsha, et al.
Published: (2025)
Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
by: Zhang, Qijun, et al.
Published: (2026)
A Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models
by: Sharma, Harsh, et al.
Published: (2023)
Application Experiences on a GPU-Accelerated Arm-based HPC Testbed
by: Elwasif, Wael, et al.
Published: (2022)
Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training
by: Adnan, Muhammad, et al.
Published: (2024)
TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs
by: Prakriya, Neha, et al.
Published: (2023)
Design in Tiles: Automating GEMM Deployment on Tile-Based Many-PE Accelerators
by: Shen, Aofeng, et al.
Published: (2025)
FLEX: Leveraging FPGA-CPU Synergy for Mixed-Cell-Height Legalization Acceleration
by: Liu, Xingyu, et al.
Published: (2025)
Compiler Support for Speculation in Decoupled Access/Execute Architectures
by: Szafarczyk, Robert, et al.
Published: (2025)
An Evaluation and Comparison of GPU Hardware and Solver Libraries for Accelerating the OPM Flow Reservoir Simulator
by: Qiu, Tong Dong, et al.
Published: (2023)
CMDS: Cross-layer Dataflow Optimization for DNN Accelerators Exploiting Multi-bank Memories
by: Shi, Man, et al.
Published: (2024)
DiP: A Scalable, Energy-Efficient Systolic Array for Matrix Multiplication Acceleration
by: Abdelmaksoud, Ahmed J., et al.
Published: (2024)
EDEA: Efficient Dual-Engine Accelerator for Depthwise Separable Convolution with Direct Data Transfer
by: Chen, Yi, et al.
Published: (2025)
DP-HLS: A High-Level Synthesis Framework for Accelerating Dynamic Programming Algorithms in Bioinformatics
by: Cao, Yingqi, et al.
Published: (2024)
CCSS: Hardware-Accelerated RTL Simulation with Fast Combinational Logic Computing and Sequential Logic Synchronization
by: Feng, Weigang, et al.
Published: (2025)
A Survey of Real-time Scheduling on Accelerator-based Heterogeneous Architecture for Time Critical Applications
by: Zou, An, et al.
Published: (2025)
A Lightweight High-Throughput Collective-Capable NoC for Large-Scale ML Accelerators
by: Colagrande, Luca, et al.
Published: (2026)
SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding
by: Xu, Weihong, et al.
Published: (2025)
DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators
by: Mo, Zhiwen, et al.
Published: (2026)
MANOJAVAM: A Scalable, Unified FPGA Accelerator for Matrix Multiplication and Singular Value Decomposition in Principal Component Analysis
by: Ramasubramanian, Srivaths, et al.
Published: (2026)
XDMA: A Distributed, Extensible DMA Architecture for Layout-Flexible Data Movements in Heterogeneous Multi-Accelerator SoCs
by: Kong, Fanchen, et al.
Published: (2025)
DeFiNES: Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators through Analytical Modeling
by: Mei, Linyan, et al.
Published: (2022)
MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems
by: Zhou, Zhuoshan, et al.
Published: (2026)
Mitigating Shared Storage Congestion Using Control Theory
by: Collignon, Thomas, et al.
Published: (2025)
MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems
by: Hsia, Samuel, et al.
Published: (2023)
NMP-PaK: Near-Memory Processing Acceleration of Scalable De Novo Genome Assembly
by: Kim, Heewoo, et al.
Published: (2025)
NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing
by: Zou, Cheng, et al.
Published: (2026)
Evaluation of POSIT Arithmetic with Accelerators
by: Nakasato, Naohito, et al.
Published: (2024)