| Main Author: | Nair, Arjun S. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.02609 |
Similar Items
Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters
by: Li, Zonghang, et al.
Published: (2025)
Parameter-Efficient and Personalized Federated Training of Generative Models at the Edge
by: Khan, Kabir, et al.
Published: (2025)
Benchmarking Catastrophic Forgetting Mitigation Methods in Federated Time Series Forecasting
by: Hallak, Khaled, et al.
Published: (2025)
A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models
by: Zhang, Lingzhe, et al.
Published: (2025)
CooperLLM: Cloud-Edge-End Cooperative Federated Fine-tuning for LLMs via ZOO-based Gradient Correction
by: Sun, He, et al.
Published: (2026)
DAGER: Exact Gradient Inversion for Large Language Models
by: Petrov, Ivo, et al.
Published: (2024)
FedMon: Federated eBPF Monitoring for Distributed Anomaly Detection in Multi-Cluster Cloud Environments
by: Zehra, Sehar, et al.
Published: (2025)
Scalable Engine and the Performance of Different LLM Models in a SLURM based HPC architecture
by: Luiz, Anderson de Lima, et al.
Published: (2025)
SPARK: Igniting Communication-Efficient Decentralized Learning via Stage-wise Projected NTK and Accelerated Regularization
by: Xia, Li
Published: (2025)
GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration
by: Sarker, Yeahia, et al.
Published: (2026)
ADF-LoRA: Alternating Low-Rank Aggregation for Decentralized Federated Fine-Tuning
by: Wang, Xiaoyu, et al.
Published: (2025)
Hive: A Multi-Agent Infrastructure for Algorithm- and Task-Level Scaling
by: Luo, Zizhang, et al.
Published: (2026)
Flash-Fusion: Enabling Expressive, Low-Latency Queries on IoT Sensor Streams with LLMs
by: Patherya, Kausar, et al.
Published: (2025)
Training LLMs on HPC Systems: Best Practices from the OpenGPT-X Project
by: Penke, Carolin, et al.
Published: (2025)
Comparative Analysis of Large Language Model Inference Serving Systems: A Performance Study of vLLM and HuggingFace TGI
by: Kolluru, Saicharan
Published: (2025)
AAFLOW: Scalable Patterns for Agentic AI Workflows
by: Sarker, Arup Kumar, et al.
Published: (2026)
Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model
by: Chen, Mu-Chi, et al.
Published: (2025)
When Does Global Attention Help? A Unified Empirical Study on Atomistic Graph Learning
by: Chowdhury, Arindam, et al.
Published: (2025)
POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference
by: Kamath, Aditya K, et al.
Published: (2024)
AIvailable: A Software-Defined Architecture for LLM-as-a-Service on Heterogeneous and Legacy GPUs
by: Antunes, Pedro, et al.
Published: (2025)
Tram-FL: Routing-based Model Training for Decentralized Federated Learning
by: Maejima, Kota, et al.
Published: (2023)
Kant: An Efficient Unified Scheduling System for Large-Scale AI Clusters
by: Zeng, Lingling, et al.
Published: (2025)
Architecture-Aware LLM Inference Optimization on AMD Instinct GPUs: A Comprehensive Benchmark and Deployment Study
by: Georgiou, Athos
Published: (2026)
XAutoLM: Efficient Fine-Tuning of Language Models via Meta-Learning and AutoML
by: Estevanell-Valladares, Ernesto L., et al.
Published: (2025)
Targeted Lexical Injection: Unlocking Latent Cross-Lingual Alignment in Lugha-Llama via Early-Layer LoRA Fine-Tuning
by: Ngugi, Stanley
Published: (2025)
AMP4EC: Adaptive Model Partitioning Framework for Efficient Deep Learning Inference in Edge Computing Environments
by: Zhang, Guilin, et al.
Published: (2025)
Worldwide Federated Training of Language Models
by: Iacob, Alex, et al.
Published: (2024)
Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and NVIDIA Data Center GPUs
by: Sada, Mohammad Firas, et al.
Published: (2025)
Augmenting the FedProx Algorithm by Minimizing Convergence
by: Sarkar, Anomitra, et al.
Published: (2024)
Spark-LLM-Eval: A Distributed Framework for Statistically Rigorous Large Language Model Evaluation
by: Mitra, Subhadip
Published: (2026)
MeanCache: User-Centric Semantic Caching for LLM Web Services
by: Gill, Waris, et al.
Published: (2024)
DSDE: Dynamic Speculative Decoding with KLD Stability for Real-World Serving
by: Yang, Mingyu, et al.
Published: (2025)
ATTNChecker: Highly-Optimized Fault Tolerant Attention for Large Language Model Training
by: Liang, Yuhang, et al.
Published: (2024)
Experimentally Evaluating the Resource Efficiency of Big Data Autoscaling
by: Will, Jonathan, et al.
Published: (2025)
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
by: Shakerdargah, Mohammadali, et al.
Published: (2024)
Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory
by: Jo, Myeong Jun
Published: (2026)
Fine-tuning of Large Language Models for Constituency Parsing Using a Sequence to Sequence Approach
by: Delgado, Francisco Jose Cortes, et al.
Published: (2025)
Benchmarking Federated Learning for Throughput Prediction in 5G Live Streaming Applications
by: Dutta, Yuvraj, et al.
Published: (2025)
Partitioned Neural Network Training via Synthetic Intermediate Labels
by: Karadağ, Cevat Volkan, et al.
Published: (2024)
Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks
by: Topcu, Burak, et al.
Published: (2026)