| Main Authors: | Font, Martí Llopart, Hernando, Javier, España-Bonet, Cristina |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.00028 |
Similar Items
Accelerating Triangle Counting with Real Processing-in-Memory Systems
by: Asquini, Lorenzo, et al.
Published: (2025)
SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators
by: Li, Jonathan, et al.
Published: (2025)
SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis
by: Ghiasi, Nika Mansouri, et al.
Published: (2025)
Mitigating Shared Storage Congestion Using Control Theory
by: Collignon, Thomas, et al.
Published: (2025)
DUET: Disaggregated Hybrid Mamba-Transformer LLMs with Prefill and Decode-Specific Packages
by: Kanani, Alish, et al.
Published: (2026)
On-Package Memory with Universal Chiplet Interconnect Express (UCIe): A Low Power, High Bandwidth, Low Latency and Low Cost Approach
by: Sharma, Debendra Das, et al.
Published: (2025)
HieraSparse: Hierarchical Semi-Structured Sparse KV Attention
by: Wang, Haoxuan, et al.
Published: (2026)
Part-time Power Measurements: nvidia-smi's Lack of Attention
by: Yang, Zeyu, et al.
Published: (2023)
Knowledge-Guided Attention-Inspired Learning for Task Offloading in Vehicle Edge Computing
by: Ma, Ke, et al.
Published: (2025)
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
by: Lin, Bin, et al.
Published: (2024)
FlashMoE: Fast Distributed MoE in a Single Kernel
by: Aimuyo, Osayamen Jonathan, et al.
Published: (2025)
Optimizing Task Scheduling in Fog Computing with Deadline Awareness
by: Sirjani, Mohammad Sadegh, et al.
Published: (2025)
Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training
by: Adnan, Muhammad, et al.
Published: (2024)
Simopt-Power: Leveraging Simulation Metadata for Low-Power Design Synthesis
by: Wadhwa, Eashan, et al.
Published: (2025)
Enabling Time-Aware Priority Traffic Management over Distributed FPGA Nodes
by: Scionti, Alberto, et al.
Published: (2025)
Topology-Aware Virtualization over Inter-Core Connected Neural Processing Units
by: Feng, Dahu, et al.
Published: (2025)
Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems
by: Zhang, Chen, et al.
Published: (2026)
MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration
by: Kubo, Tatsuya, et al.
Published: (2025)
Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving
by: Tian, Yuyang, et al.
Published: (2025)
CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing
by: Kabra, Mayank, et al.
Published: (2025)
Vectorized FlashAttention with Low-cost Exponential Computation in RISC-V Vector Processors
by: Titopoulos, Vasileios, et al.
Published: (2025)
TeraPool: A Physical Design Aware, 1024 RISC-V Cores Shared-L1-Memory Scaled-up Cluster Design with High Bandwidth Main Memory Link
by: Zhang, Yichao, et al.
Published: (2026)
PIMDAL: Mitigating the Memory Bottleneck in Data Analytics using a Real Processing-in-Memory System
by: Frouzakis, Manos, et al.
Published: (2025)
SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference
by: Zhang, Hengrui, et al.
Published: (2025)
MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing
by: Ghiasi, Nika Mansouri, et al.
Published: (2024)
TT-Edge: A Hardware-Software Co-Design for Energy-Efficient Tensor-Train Decomposition on Edge AI
by: Kwak, Hyunseok, et al.
Published: (2025)
The DEEP-ER project: I/O and resiliency extensions for the Cluster-Booster architecture
by: Kreuzer, Anke, et al.
Published: (2019)
An Evaluation and Comparison of GPU Hardware and Solver Libraries for Accelerating the OPM Flow Reservoir Simulator
by: Qiu, Tong Dong, et al.
Published: (2023)
SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding
by: Xu, Weihong, et al.
Published: (2025)
COMET: A Framework for Modeling Compound Operation Dataflows with Explicit Collectives
by: Negi, Shubham, et al.
Published: (2025)
PAM: Processing Across Memory Hierarchy for Efficient KV-centric LLM Serving System
by: Liu, Lian, et al.
Published: (2026)
RAPID-Graph: Recursive All-Pairs Shortest Paths Using Processing-in-Memory for Dynamic Programming on Graphs
by: Chen, Yanru, et al.
Published: (2025)
Chopper: A Multi-Level GPU Characterization Tool & Derived Insights Into LLM Training Inefficiency
by: Kurzynski, Marco, et al.
Published: (2025)
Efficient deadlock avoidance for 2D mesh NoCs that use OQ or VOQ routers
by: Papaphilippou, Philippos, et al.
Published: (2023)
DCRA: A Distributed Chiplet-based Reconfigurable Architecture for Irregular Applications
by: Orenes-Vera, Marcelo, et al.
Published: (2023)
LFOC: A Lightweight Fairness-Oriented Cache Clustering Policy for Commodity Multicores
by: García-García, Adrián, et al.
Published: (2024)
FlexVector: A SpMM Vector Processor with Flexible VRF for GCNs on Varying-Sparsity Graphs
by: Li, Bohan, et al.
Published: (2026)
iHAC: A Hybrid Cluster Architecture for Enhanced Performance and Resilience
by: Muntaka, Siddique Abubakr, et al.
Published: (2026)
NetSmith: An Optimization Framework for Machine-Discovered Network Topologies
by: Green, Conor, et al.
Published: (2024)
SpArch: Efficient Architecture for Sparse Matrix Multiplication
by: Zhang, Zhekai, et al.
Published: (2020)