Library Catalog

Bibliographic Details
Main Author:	Fu, Yao
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language; Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2405.08944

Similar Items

Efficient Deployment of Large Language Models on Resource-constrained Devices
by: Yao, Zhiwei, et al.
Published: (2025)

Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
by: Yao, Jinghan, et al.
Published: (2024)

LongCat-Flash Technical Report
by: Meituan LongCat Team, et al.
Published: (2025)

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
by: Xue, Fuzhao, et al.
Published: (2024)

Deploying Atmospheric and Oceanic AI Models on Chinese Hardware and Framework: Migration Strategies, Performance Optimization and Analysis
by: Sun, Yuze, et al.
Published: (2025)

MACE: A Hybrid LLM Serving System with Colocated SLO-aware Continuous Retraining Alignment
by: Li, Yufei, et al.
Published: (2025)

LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
by: Yang, Shang, et al.
Published: (2025)

Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems
by: Agrawal, Amey, et al.
Published: (2024)

Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning
by: Singhal, Raghav, et al.
Published: (2025)

Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces
by: Wei, Anjiang, et al.
Published: (2024)

ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation
by: Mei, Zhiyu, et al.
Published: (2024)

FinGPT-HPC: Efficient Pretraining and Finetuning Large Language Models for Financial Applications with High-Performance Computing
by: Liu, Xiao-Yang, et al.
Published: (2024)

Optimizing the Deployment of Tiny Transformers on Low-Power MCUs
by: Jung, Victor J. B., et al.
Published: (2024)

From Centralized to Decentralized Federated Learning: Theoretical Insights, Privacy Preservation, and Robustness Challenges
by: Li, Qiongxiu, et al.
Published: (2025)

Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference
by: Deshmukh, Dhruv, et al.
Published: (2025)

Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques
by: Behera, Adarsh Prasad, et al.
Published: (2025)

Context-Aware Inference via Performance Forecasting in Decentralized Learning Networks
by: Pfeffer, Joel, et al.
Published: (2025)

Pre-Deployment Complexity Estimation for Federated Perception Systems
by: Solaiman, KMA, et al.
Published: (2026)

RL in the Wild: Characterizing RLVR Training in LLM Deployment
by: Zhou, Jiecheng, et al.
Published: (2025)

TawPipe: Topology-Aware Weight Pipeline Parallelism for Accelerating Long-Context Large Models Training
by: Wu, Houming, et al.
Published: (2025)

MineDraft: A Framework for Batch Parallel Speculative Decoding
by: Tang, Zhenwei, et al.
Published: (2026)

Taming the Titans: A Survey of Efficient LLM Inference Serving
by: Zhen, Ranran, et al.
Published: (2025)

PithTrain: A Compact and Agent-Native MoE Training System
by: Lai, Ruihang, et al.
Published: (2026)

A Sparsity Predicting Approach for Large Language Models via Activation Pattern Clustering
by: Dhar, Nobel, et al.
Published: (2025)

A Structure-Agnostic Co-Tuning Framework for LLMs and SLMs in Cloud-Edge Systems
by: Liu, Yuze, et al.
Published: (2025)

Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
by: Zhang, Mohan, et al.
Published: (2025)

LongCat-Flash-Omni Technical Report
by: Meituan LongCat Team, et al.
Published: (2025)

Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead
by: Brüel-Gabrielsson, Rickard, et al.
Published: (2024)

BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching
by: Zheng, Zhen, et al.
Published: (2024)

SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications
by: Oliaro, Gabriele, et al.
Published: (2024)

DLoRA: Distributed Parameter-Efficient Fine-Tuning Solution for Large Language Model
by: Gao, Chao, et al.
Published: (2024)

Liger Kernel: Efficient Triton Kernels for LLM Training
by: Hsu, Pin-Lun, et al.
Published: (2024)

Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference
by: Li, Baolin, et al.
Published: (2024)

Fisher Information-based Efficient Curriculum Federated Learning with Large Language Models
by: Liu, Ji, et al.
Published: (2024)

TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training
by: Liang, Wanchao, et al.
Published: (2024)

BlackMamba: Mixture of Experts for State-Space Models
by: Anthony, Quentin, et al.
Published: (2024)

End-Cloud Collaboration Framework for Advanced AI Customer Service in E-commerce
by: Teng, Liangyu, et al.
Published: (2024)

Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference
by: Timor, Nadav, et al.
Published: (2024)

Learned Best-Effort LLM Serving
by: Jha, Siddharth, et al.
Published: (2024)

Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale
by: Ku, Jerome, et al.
Published: (2025)