| Main Authors: | Wu, Zihan, Huang, Zhaoke, Yan, Hong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2410.18113 |
Similar Items
Challenges of Heterogeneity in Big Data: A Comparative Study of Classification in Large-Scale Structured and Unstructured Domains
by: Eduardo, González Trigueros Jesús, et al.
Published: (2025)
Heuristic Search Space Partitioning for Low-Latency Multi-Tenant Cloud Queries
by: Pathak, Prashant Kumar, et al.
Published: (2026)
Spark-LLM-Eval: A Distributed Framework for Statistically Rigorous Large Language Model Evaluation
by: Mitra, Subhadip
Published: (2026)
Scalability Optimization in Cloud-Based AI Inference Services: Strategies for Real-Time Load Balancing and Automated Scaling
by: Jin, Yihong, et al.
Published: (2025)
Cost-Aware Logging: Measuring the Financial Impact of Excessive Log Retention in Small-Scale Cloud Deployments
by: Putra, Jody Almaida
Published: (2026)
A Semantic Partitioning Method for Large-Scale Training of Knowledge Graph Embeddings
by: Bai, Yuhe
Published: (2025)
Revisiting Reliability in Large-Scale Machine Learning Research Clusters
by: Kokolis, Apostolos, et al.
Published: (2024)
Learning Interpretable Scheduling Algorithms for Data Processing Clusters
by: Hu, Zhibo, et al.
Published: (2024)
Token Arena: A Continuous Benchmark Unifying Energy and Cognition in AI Inference
by: Gao, Yuxuan, et al.
Published: (2026)
AsyncHZP: Hierarchical ZeRO Parallelism with Asynchronous Scheduling for Scalable LLM Training
by: Bai, Huawei, et al.
Published: (2025)
SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
by: Du, Zhixu, et al.
Published: (2023)
Deploy, Calibrate, Monitor, Heal -- No Human Required: An Autonomous AI SRE Agent for Elasticsearch
by: Mukkolakkal, Muhamed Ramees Cheriya
Published: (2026)
Scalable High-Dimensional Multivariate Linear Regression for Feature-Distributed Data
by: Huang, Shuo-Chieh, et al.
Published: (2023)
Arena: Efficiently Training Large Models via Dynamic Scheduling and Adaptive Parallelism Co-Design
by: Xue, Chunyu, et al.
Published: (2024)
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
by: Won, William, et al.
Published: (2023)
Learning the Optimal Path and DNN Partition for Collaborative Edge Inference
by: Huang, Yin, et al.
Published: (2024)
Decoupled Vertical Federated Learning for Practical Training on Vertically Partitioned Data
by: Amalanshu, Avi, et al.
Published: (2024)
Efficient Construction of Large Search Spaces for Auto-Tuning
by: Willemsen, Floris-Jan, et al.
Published: (2025)
EmbedPart: Embedding-Driven Graph Partitioning for Scalable Graph Neural Network Training
by: Merkel, Nikolai, et al.
Published: (2026)
EncCluster: Scalable Functional Encryption in Federated Learning through Weight Clustering and Probabilistic Filters
by: Tsouvalas, Vasileios, et al.
Published: (2024)
Rethinking Personalized Federated Learning with Clustering-based Dynamic Graph Propagation
by: Wang, Jiaqi, et al.
Published: (2024)
FedClust: Optimizing Federated Learning on Non-IID Data through Weight-Driven Client Clustering
by: Islam, Md Sirajul, et al.
Published: (2024)
Cross-Silo Federated Learning for Multi-Tier Networks with Vertical and Horizontal Data Partitioning
by: Das, Anirban, et al.
Published: (2021)
Communication-Efficient Hybrid Federated Learning for E-health with Horizontal and Vertical Data Partitioning
by: Yu, Chong, et al.
Published: (2024)
SURGE: SuperBatch Unified Resource-efficient GPU Encoding for Heterogeneous Partitioned Data
by: Kapadia, Shashank, et al.
Published: (2026)
Large-Scale Graph Building in Dynamic Environments: Low Latency and High Quality
by: de Almeida, Filipe Miguel Gonçalves, et al.
Published: (2025)
Guard: Scalable Straggler Detection and Node Health Management for Large-Scale Training
by: Liu, Guanliang, et al.
Published: (2026)
FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems
by: Woisetschläger, Herbert, et al.
Published: (2023)
LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization
by: Zhao, Juntao, et al.
Published: (2024)
Flexible Clustered Federated Learning for Client-Level Data Distribution Shift
by: Duan, Moming, et al.
Published: (2021)
N2N: A Parallel Framework for Large-Scale MILP under Distributed Memory
by: Wang, Longfei, et al.
Published: (2025)
Operational Memory Architecture for Kubernetes: Preserving Causal Context Across the Evidence Horizon
by: Khan, Shamsher
Published: (2026)
A Semi-Supervised Federated Learning Framework with Hierarchical Clustering Aggregation for Heterogeneous Satellite Networks
by: Liu, Zhuocheng, et al.
Published: (2025)
CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration
by: Jin, Hongpeng, et al.
Published: (2024)
Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library
by: Wang, Weixun, et al.
Published: (2025)
AMDP: Asynchronous Multi-Directional Pipeline Parallelism for Large-Scale Models Training
by: Chen, Ling, et al.
Published: (2026)
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
by: Jiang, Ziheng, et al.
Published: (2024)
Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach
by: Saroliya, Urvij, et al.
Published: (2024)
DualSparse-MoE: Coordinating Tensor/Neuron-Level Sparsity with Expert Partition and Reconstruction
by: Cai, Weilin, et al.
Published: (2025)
Hydraulis: Balancing Large Transformer Model Training via Co-designing Parallel Strategies and Data Assignment
by: Li, Haoyang, et al.
Published: (2024)