| Main Authors: | Morgado, José, Sousa, Leonel, Ilic, Aleksandar |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.29740 |
Similar Items
PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration
by: Pacheco, Daniel, et al.
Published: (2026)
Sparsity-Aware Roofline Models for Sparse Matrix-Matrix Multiplication
by: Qian, Matthew, et al.
Published: (2026)
TrioSeq: A Novel Approach to Accelerate Triplet Sequence Alignment on GPUs
by: Graça, Miguel, et al.
Published: (2026)
Analytic Roofline Modeling and Energy Analysis of LULESH Proxy Application on Multi-Core Clusters
by: Afzal, Ayesha, et al.
Published: (2024)
Ridgeline: A 2D Roofline Model for Distributed Systems
by: Checconi, Fabio, et al.
Published: (2022)
Pagoda: An Energy and Time Roofline Study for DNN Workloads on Edge Accelerators
by: K., Prashanthi S., et al.
Published: (2025)
QEIL v2: Heterogeneous Computing for Edge Intelligence via Roofline-Derived Pareto-Optimal Energy Modeling and Multi-Objective Orchestration
by: Kumar, Satyam, et al.
Published: (2026)
PerCache: Predictive Hierarchical Cache for RAG Applications on Mobile Devices
by: Liu, Kaiwei, et al.
Published: (2025)
Run-time application migration using checkpoint/restore in userspace
by: Tošić, Aleksandar
Published: (2023)
FLeeC: a Fast Lock-Free Application Cache
by: Costa, André J., et al.
Published: (2024)
CacheFL: Privacy-Preserving and Efficient Federated Cache Model Fine-Tuning for Vision-Language Models
by: Yi, Mengjun, et al.
Published: (2025)
LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching
by: Singh, Simranjit, et al.
Published: (2024)
Experimental Analysis of Server-Side Caching for Web Performance
by: Umar, Mohammad, et al.
Published: (2026)
Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving
by: Tian, Yuyang, et al.
Published: (2025)
Parallel Spawning Strategies for Dynamic-Aware MPI Applications
by: Martín-Álvarez, Iker, et al.
Published: (2025)
Comparative Analysis of Distributed Caching Algorithms: Performance Metrics and Implementation Considerations
by: Mayer, Helen, et al.
Published: (2025)
Adaptive K-PackCache: Cost-Centric Data Caching in Cloud
by: Sarkar, Suvarthi, et al.
Published: (2025)
Coherence-Aware Task Graph Modeling for Realistic Application
by: Xiong, Guochu, et al.
Published: (2025)
10Cache: Heterogeneous Resource-Aware Tensor Caching and Migration for LLM Training
by: Afroz, Sabiha, et al.
Published: (2025)
CacheFlow: Efficient LLM Serving with 3D-Parallel KV Cache Restoration
by: Nian, Sean, et al.
Published: (2026)
CaPGNN: Optimizing Parallel Graph Neural Network Training with Joint Caching and Resource-Aware Graph Partitioning
by: Song, Xianfeng, et al.
Published: (2025)
FedCache: A Knowledge Cache-driven Federated Learning Architecture for Personalized Edge Intelligence
by: Wu, Zhiyuan, et al.
Published: (2023)
Not All Tokens Are Worth Caching: Learning Semantic-Aware Eviction for LLM Prefix Caches
by: Fang, Shaoke, et al.
Published: (2026)
Kavier: Exploring Performance, Sustainability, and Efficiency of LLM Ecosystems under Inference through Cache-Aware Discrete-Event Simulation
by: Nicolae, Radu, et al.
Published: (2026)
Cortex: Achieving Low-Latency, Cost-Efficient Remote Data Access For LLM via Semantic-Aware Knowledge Caching
by: Ruan, Chaoyi, et al.
Published: (2025)
Strata: Hierarchical Context Caching for Long Context Language Model Serving
by: Xie, Zhiqiang, et al.
Published: (2025)
Caching Aided Multi-Tenant Serverless Computing
by: Qiao, Chu, et al.
Published: (2024)
THEAS: Efficient Power Management in Multi-Core CPUs via Cache-Aware Resource Scheduling
by: Muhammad, Said, et al.
Published: (2025)
Efficient LLM Inference with Activation Checkpointing and Hybrid Caching
by: Lee, Sanghyeon, et al.
Published: (2025)
Galvatron: Automatic Distributed Training for Large Transformer Models
by: Gumaan, Esmail
Published: (2025)
A Review of Ontology-Driven Big Data Analytics in Healthcare: Challenges, Tools, and Applications
by: Chandra, Ritesh, et al.
Published: (2025)
Mell: Memory-Efficient Large Language Model Serving via Multi-GPU KV Cache Management
by: Qianli, Liu, et al.
Published: (2025)
Benchmarking Compound AI Applications for Hardware-Software Co-Design
by: Samuthrsindh, Paramuth, et al.
Published: (2026)
Benchmarking Machine Learning Applications on Heterogeneous Architecture using Reframe
by: Rae, Christopher, et al.
Published: (2024)
Increasing Efficiency and Result Reliability of Continuous Benchmarking for FaaS Applications
by: Rese, Tim C., et al.
Published: (2024)
InstCache: A Predictive Cache for LLM Serving
by: Zou, Longwei, et al.
Published: (2024)
KV Cache Compression for Inference Efficiency in LLMs: A Review
by: Liu, Yanyu, et al.
Published: (2025)
A Comparative Evaluation of Automated Analysis Tools for Solidity Smart Contracts
by: Wei, Zhiyuan, et al.
Published: (2023)
The Impact of Process Competition on Energy Consumption: Analysis and Modeling
by: Campos, Eduardo Gomes, et al.
Published: (2026)
LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications
by: Zhu, Botao, et al.
Published: (2025)