:: Library Catalog

Bibliographic Details
Main Authors:	Morgado, José, Sousa, Leonel, Ilic, Aleksandar
Format:	Preprint
Published:	2026
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2605.29740

Similar Items

PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration
by: Pacheco, Daniel, et al.
Published: (2026)

Sparsity-Aware Roofline Models for Sparse Matrix-Matrix Multiplication
by: Qian, Matthew, et al.
Published: (2026)

TrioSeq: A Novel Approach to Accelerate Triplet Sequence Alignment on GPUs
by: Graça, Miguel, et al.
Published: (2026)

Analytic Roofline Modeling and Energy Analysis of LULESH Proxy Application on Multi-Core Clusters
by: Afzal, Ayesha, et al.
Published: (2024)

Ridgeline: A 2D Roofline Model for Distributed Systems
by: Checconi, Fabio, et al.
Published: (2022)

Pagoda: An Energy and Time Roofline Study for DNN Workloads on Edge Accelerators
by: K., Prashanthi S., et al.
Published: (2025)

QEIL v2: Heterogeneous Computing for Edge Intelligence via Roofline-Derived Pareto-Optimal Energy Modeling and Multi-Objective Orchestration
by: Kumar, Satyam, et al.
Published: (2026)

PerCache: Predictive Hierarchical Cache for RAG Applications on Mobile Devices
by: Liu, Kaiwei, et al.
Published: (2025)

Run-time application migration using checkpoint/restore in userspace
by: Tošić, Aleksandar
Published: (2023)

FLeeC: a Fast Lock-Free Application Cache
by: Costa, André J., et al.
Published: (2024)

CacheFL: Privacy-Preserving and Efficient Federated Cache Model Fine-Tuning for Vision-Language Models
by: Yi, Mengjun, et al.
Published: (2025)

LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching
by: Singh, Simranjit, et al.
Published: (2024)

Experimental Analysis of Server-Side Caching for Web Performance
by: Umar, Mohammad, et al.
Published: (2026)

Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving
by: Tian, Yuyang, et al.
Published: (2025)

Parallel Spawning Strategies for Dynamic-Aware MPI Applications
by: Martín-Álvarez, Iker, et al.
Published: (2025)

Comparative Analysis of Distributed Caching Algorithms: Performance Metrics and Implementation Considerations
by: Mayer, Helen, et al.
Published: (2025)

Adaptive K-PackCache: Cost-Centric Data Caching in Cloud
by: Sarkar, Suvarthi, et al.
Published: (2025)

Coherence-Aware Task Graph Modeling for Realistic Application
by: Xiong, Guochu, et al.
Published: (2025)

10Cache: Heterogeneous Resource-Aware Tensor Caching and Migration for LLM Training
by: Afroz, Sabiha, et al.
Published: (2025)

CacheFlow: Efficient LLM Serving with 3D-Parallel KV Cache Restoration
by: Nian, Sean, et al.
Published: (2026)

CaPGNN: Optimizing Parallel Graph Neural Network Training with Joint Caching and Resource-Aware Graph Partitioning
by: Song, Xianfeng, et al.
Published: (2025)

FedCache: A Knowledge Cache-driven Federated Learning Architecture for Personalized Edge Intelligence
by: Wu, Zhiyuan, et al.
Published: (2023)

Not All Tokens Are Worth Caching: Learning Semantic-Aware Eviction for LLM Prefix Caches
by: Fang, Shaoke, et al.
Published: (2026)

Kavier: Exploring Performance, Sustainability, and Efficiency of LLM Ecosystems under Inference through Cache-Aware Discrete-Event Simulation
by: Nicolae, Radu, et al.
Published: (2026)

Cortex: Achieving Low-Latency, Cost-Efficient Remote Data Access For LLM via Semantic-Aware Knowledge Caching
by: Ruan, Chaoyi, et al.
Published: (2025)

Strata: Hierarchical Context Caching for Long Context Language Model Serving
by: Xie, Zhiqiang, et al.
Published: (2025)

Caching Aided Multi-Tenant Serverless Computing
by: Qiao, Chu, et al.
Published: (2024)

THEAS: Efficient Power Management in Multi-Core CPUs via Cache-Aware Resource Scheduling
by: Muhammad, Said, et al.
Published: (2025)

Efficient LLM Inference with Activation Checkpointing and Hybrid Caching
by: Lee, Sanghyeon, et al.
Published: (2025)

Galvatron: Automatic Distributed Training for Large Transformer Models
by: Gumaan, Esmail
Published: (2025)

A Review of Ontology-Driven Big Data Analytics in Healthcare: Challenges, Tools, and Applications
by: Chandra, Ritesh, et al.
Published: (2025)

Mell: Memory-Efficient Large Language Model Serving via Multi-GPU KV Cache Management
by: Qianli, Liu, et al.
Published: (2025)

Benchmarking Compound AI Applications for Hardware-Software Co-Design
by: Samuthrsindh, Paramuth, et al.
Published: (2026)

Benchmarking Machine Learning Applications on Heterogeneous Architecture using Reframe
by: Rae, Christopher, et al.
Published: (2024)

Increasing Efficiency and Result Reliability of Continuous Benchmarking for FaaS Applications
by: Rese, Tim C., et al.
Published: (2024)

InstCache: A Predictive Cache for LLM Serving
by: Zou, Longwei, et al.
Published: (2024)

KV Cache Compression for Inference Efficiency in LLMs: A Review
by: Liu, Yanyu, et al.
Published: (2025)

A Comparative Evaluation of Automated Analysis Tools for Solidity Smart Contracts
by: Wei, Zhiyuan, et al.
Published: (2023)

The Impact of Process Competition on Energy Consumption: Analysis and Modeling
by: Campos, Eduardo Gomes, et al.
Published: (2026)

LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications
by: Zhu, Botao, et al.
Published: (2025)