:: Library Catalog

Bibliographic Details
Main Author:	Nair, Arjun S.
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language; Distributed, Parallel, and Cluster Computing; 68T05; I.2.6; I.2.7
Online Access:	https://arxiv.org/abs/2601.02609

Similar Items

Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters
by: Li, Zonghang, et al.
Published: (2025)

Parameter-Efficient and Personalized Federated Training of Generative Models at the Edge
by: Khan, Kabir, et al.
Published: (2025)

Benchmarking Catastrophic Forgetting Mitigation Methods in Federated Time Series Forecasting
by: Hallak, Khaled, et al.
Published: (2025)

A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models
by: Zhang, Lingzhe, et al.
Published: (2025)

CooperLLM: Cloud-Edge-End Cooperative Federated Fine-tuning for LLMs via ZOO-based Gradient Correction
by: Sun, He, et al.
Published: (2026)

DAGER: Exact Gradient Inversion for Large Language Models
by: Petrov, Ivo, et al.
Published: (2024)

FedMon: Federated eBPF Monitoring for Distributed Anomaly Detection in Multi-Cluster Cloud Environments
by: Zehra, Sehar, et al.
Published: (2025)

Scalable Engine and the Performance of Different LLM Models in a SLURM based HPC architecture
by: Luiz, Anderson de Lima, et al.
Published: (2025)

SPARK: Igniting Communication-Efficient Decentralized Learning via Stage-wise Projected NTK and Accelerated Regularization
by: Xia, Li
Published: (2025)

GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration
by: Sarker, Yeahia, et al.
Published: (2026)

ADF-LoRA: Alternating Low-Rank Aggregation for Decentralized Federated Fine-Tuning
by: Wang, Xiaoyu, et al.
Published: (2025)

Hive: A Multi-Agent Infrastructure for Algorithm- and Task-Level Scaling
by: Luo, Zizhang, et al.
Published: (2026)

Flash-Fusion: Enabling Expressive, Low-Latency Queries on IoT Sensor Streams with LLMs
by: Patherya, Kausar, et al.
Published: (2025)

Training LLMs on HPC Systems: Best Practices from the OpenGPT-X Project
by: Penke, Carolin, et al.
Published: (2025)

Comparative Analysis of Large Language Model Inference Serving Systems: A Performance Study of vLLM and HuggingFace TGI
by: Kolluru, Saicharan
Published: (2025)

AAFLOW: Scalable Patterns for Agentic AI Workflows
by: Sarker, Arup Kumar, et al.
Published: (2026)

Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model
by: Chen, Mu-Chi, et al.
Published: (2025)

When Does Global Attention Help? A Unified Empirical Study on Atomistic Graph Learning
by: Chowdhury, Arindam, et al.
Published: (2025)

POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference
by: Kamath, Aditya K, et al.
Published: (2024)

AIvailable: A Software-Defined Architecture for LLM-as-a-Service on Heterogeneous and Legacy GPUs
by: Antunes, Pedro, et al.
Published: (2025)

Tram-FL: Routing-based Model Training for Decentralized Federated Learning
by: Maejima, Kota, et al.
Published: (2023)

Kant: An Efficient Unified Scheduling System for Large-Scale AI Clusters
by: Zeng, Lingling, et al.
Published: (2025)

Architecture-Aware LLM Inference Optimization on AMD Instinct GPUs: A Comprehensive Benchmark and Deployment Study
by: Georgiou, Athos
Published: (2026)

XAutoLM: Efficient Fine-Tuning of Language Models via Meta-Learning and AutoML
by: Estevanell-Valladares, Ernesto L., et al.
Published: (2025)

Targeted Lexical Injection: Unlocking Latent Cross-Lingual Alignment in Lugha-Llama via Early-Layer LoRA Fine-Tuning
by: Ngugi, Stanley
Published: (2025)

AMP4EC: Adaptive Model Partitioning Framework for Efficient Deep Learning Inference in Edge Computing Environments
by: Zhang, Guilin, et al.
Published: (2025)

Worldwide Federated Training of Language Models
by: Iacob, Alex, et al.
Published: (2024)

Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and NVIDIA Data Center GPUs
by: Sada, Mohammad Firas, et al.
Published: (2025)

Augmenting the FedProx Algorithm by Minimizing Convergence
by: Sarkar, Anomitra, et al.
Published: (2024)

Spark-LLM-Eval: A Distributed Framework for Statistically Rigorous Large Language Model Evaluation
by: Mitra, Subhadip
Published: (2026)

MeanCache: User-Centric Semantic Caching for LLM Web Services
by: Gill, Waris, et al.
Published: (2024)

DSDE: Dynamic Speculative Decoding with KLD Stability for Real-World Serving
by: Yang, Mingyu, et al.
Published: (2025)

ATTNChecker: Highly-Optimized Fault Tolerant Attention for Large Language Model Training
by: Liang, Yuhang, et al.
Published: (2024)

Experimentally Evaluating the Resource Efficiency of Big Data Autoscaling
by: Will, Jonathan, et al.
Published: (2025)

MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
by: Shakerdargah, Mohammadali, et al.
Published: (2024)

Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory
by: Jo, Myeong Jun
Published: (2026)

Fine-tuning of Large Language Models for Constituency Parsing Using a Sequence to Sequence Approach
by: Delgado, Francisco Jose Cortes, et al.
Published: (2025)

Benchmarking Federated Learning for Throughput Prediction in 5G Live Streaming Applications
by: Dutta, Yuvraj, et al.
Published: (2025)

Partitioned Neural Network Training via Synthetic Intermediate Labels
by: Karadağ, Cevat Volkan, et al.
Published: (2024)

Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks
by: Topcu, Burak, et al.
Published: (2026)