:: Library Catalog

Bibliographic Details
Main Authors:	Font, Martí Llopart; Hernando, Javier; España-Bonet, Cristina
Format:	Preprint
Published:	2026
Subjects:	Hardware Architecture; Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2604.00028

Similar Items

Accelerating Triangle Counting with Real Processing-in-Memory Systems
by: Asquini, Lorenzo, et al.
Published: (2025)

SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators
by: Li, Jonathan, et al.
Published: (2025)

SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis
by: Ghiasi, Nika Mansouri, et al.
Published: (2025)

Mitigating Shared Storage Congestion Using Control Theory
by: Collignon, Thomas, et al.
Published: (2025)

DUET: Disaggregated Hybrid Mamba-Transformer LLMs with Prefill and Decode-Specific Packages
by: Kanani, Alish, et al.
Published: (2026)

On-Package Memory with Universal Chiplet Interconnect Express (UCIe): A Low Power, High Bandwidth, Low Latency and Low Cost Approach
by: Sharma, Debendra Das, et al.
Published: (2025)

HieraSparse: Hierarchical Semi-Structured Sparse KV Attention
by: Wang, Haoxuan, et al.
Published: (2026)

Part-time Power Measurements: nvidia-smi's Lack of Attention
by: Yang, Zeyu, et al.
Published: (2023)

Knowledge-Guided Attention-Inspired Learning for Task Offloading in Vehicle Edge Computing
by: Ma, Ke, et al.
Published: (2025)

Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
by: Lin, Bin, et al.
Published: (2024)

FlashMoE: Fast Distributed MoE in a Single Kernel
by: Aimuyo, Osayamen Jonathan, et al.
Published: (2025)

Optimizing Task Scheduling in Fog Computing with Deadline Awareness
by: Sirjani, Mohammad Sadegh, et al.
Published: (2025)

Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training
by: Adnan, Muhammad, et al.
Published: (2024)

Simopt-Power: Leveraging Simulation Metadata for Low-Power Design Synthesis
by: Wadhwa, Eashan, et al.
Published: (2025)

Enabling Time-Aware Priority Traffic Management over Distributed FPGA Nodes
by: Scionti, Alberto, et al.
Published: (2025)

Topology-Aware Virtualization over Inter-Core Connected Neural Processing Units
by: Feng, Dahu, et al.
Published: (2025)

Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems
by: Zhang, Chen, et al.
Published: (2026)

MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration
by: Kubo, Tatsuya, et al.
Published: (2025)

Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving
by: Tian, Yuyang, et al.
Published: (2025)

CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing
by: Kabra, Mayank, et al.
Published: (2025)

Vectorized FlashAttention with Low-cost Exponential Computation in RISC-V Vector Processors
by: Titopoulos, Vasileios, et al.
Published: (2025)

TeraPool: A Physical Design Aware, 1024 RISC-V Cores Shared-L1-Memory Scaled-up Cluster Design with High Bandwidth Main Memory Link
by: Zhang, Yichao, et al.
Published: (2026)

PIMDAL: Mitigating the Memory Bottleneck in Data Analytics using a Real Processing-in-Memory System
by: Frouzakis, Manos, et al.
Published: (2025)

SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference
by: Zhang, Hengrui, et al.
Published: (2025)

MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing
by: Ghiasi, Nika Mansouri, et al.
Published: (2024)

TT-Edge: A Hardware-Software Co-Design for Energy-Efficient Tensor-Train Decomposition on Edge AI
by: Kwak, Hyunseok, et al.
Published: (2025)

The DEEP-ER project: I/O and resiliency extensions for the Cluster-Booster architecture
by: Kreuzer, Anke, et al.
Published: (2019)

An Evaluation and Comparison of GPU Hardware and Solver Libraries for Accelerating the OPM Flow Reservoir Simulator
by: Qiu, Tong Dong, et al.
Published: (2023)

SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding
by: Xu, Weihong, et al.
Published: (2025)

COMET: A Framework for Modeling Compound Operation Dataflows with Explicit Collectives
by: Negi, Shubham, et al.
Published: (2025)

PAM: Processing Across Memory Hierarchy for Efficient KV-centric LLM Serving System
by: Liu, Lian, et al.
Published: (2026)

RAPID-Graph: Recursive All-Pairs Shortest Paths Using Processing-in-Memory for Dynamic Programming on Graphs
by: Chen, Yanru, et al.
Published: (2025)

Chopper: A Multi-Level GPU Characterization Tool & Derived Insights Into LLM Training Inefficiency
by: Kurzynski, Marco, et al.
Published: (2025)

Efficient deadlock avoidance for 2D mesh NoCs that use OQ or VOQ routers
by: Papaphilippou, Philippos, et al.
Published: (2023)

DCRA: A Distributed Chiplet-based Reconfigurable Architecture for Irregular Applications
by: Orenes-Vera, Marcelo, et al.
Published: (2023)

LFOC: A Lightweight Fairness-Oriented Cache Clustering Policy for Commodity Multicores
by: García-García, Adrián, et al.
Published: (2024)

FlexVector: A SpMM Vector Processor with Flexible VRF for GCNs on Varying-Sparsity Graphs
by: Li, Bohan, et al.
Published: (2026)

iHAC: A Hybrid Cluster Architecture for Enhanced Performance and Resilience
by: Muntaka, Siddique Abubakr, et al.
Published: (2026)

NetSmith: An Optimization Framework for Machine-Discovered Network Topologies
by: Green, Conor, et al.
Published: (2024)

SpArch: Efficient Architecture for Sparse Matrix Multiplication
by: Zhang, Zhekai, et al.
Published: (2020)