| Main Authors: | Ghadia, Ravi; Abraham, Maksim; Vorobyov, Sergei; Ryabinin, Max |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.21196 |
Similar Items
AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference
by: Zhao, Xuanlei, et al.
Published: (2024)
DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism
by: Jiang, Chenyu, et al.
Published: (2025)
ChunkFlow: Communication-Aware Chunked Prefetching for Layerwise Offloading in Distributed Diffusion Transformer Inference
by: Meng, Han, et al.
Published: (2026)
LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism
by: Wu, Bingyang, et al.
Published: (2024)
Improving Automatic Parallel Training via Balanced Memory Workload Optimization
by: Wang, Yujie, et al.
Published: (2023)
Memory and Bandwidth are All You Need for Fully Sharded Data Parallel
by: Wang, Jiangtao, et al.
Published: (2025)
Pipeline Parallelism with Controllable Memory
by: Qi, Penghui, et al.
Published: (2024)
HelixPipe: Efficient Distributed Training of Long Sequence Transformers with Attention Parallel Pipeline Parallelism
by: Zhang, Geng, et al.
Published: (2025)
Arena: Efficiently Training Large Models via Dynamic Scheduling and Adaptive Parallelism Co-Design
by: Xue, Chunyu, et al.
Published: (2024)
DiffKV: Differentiated Memory Management for Large Language Models with Parallel KV Compaction
by: Zhang, Yanqi, et al.
Published: (2024)
Accelerating Large Language Model Training with 4D Parallelism and Memory Consumption Estimator
by: Fujii, Kazuki, et al.
Published: (2024)
PaSE: Parallelization Strategies for Efficient DNN Training
by: Elango, Venmugil
Published: (2024)
DHP: Efficient Scaling of MLLM Training with Dynamic Hybrid Parallelism
by: Niu, Yifan, et al.
Published: (2026)
Efficient Parallelization Layouts for Large-Scale Distributed Model Training
by: Hagemann, Johannes, et al.
Published: (2023)
Efficient Parallel Reinforcement Learning Framework using the Reactor Model
by: Kwok, Jacky, et al.
Published: (2023)
Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference
by: Prabhakar, Rohan Baskar, et al.
Published: (2024)
Efficient Long Context Fine-tuning with Chunk Flow
by: Yuan, Xiulong, et al.
Published: (2025)
Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification
by: Huang, Guang, et al.
Published: (2026)
MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core
by: Liu, Dennis, et al.
Published: (2025)
ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism
by: Liu, Zedong, et al.
Published: (2025)
SAIR: Cost-Efficient Multi-Stage ML Pipeline Autoscaling via In-Context Reinforcement Learning
by: Su, Jianchang, et al.
Published: (2026)
Scalable and Cost-Efficient ML Inference: Parallel Batch Processing with Serverless Functions
by: Barrak, Amine, et al.
Published: (2025)
CoCoDiff: Optimizing Collective Communications for Distributed Diffusion Transformer Inference Under Ulysses Sequence Parallelism
by: Ma, Bin, et al.
Published: (2026)
Re-evaluating the Memory-balanced Pipeline Parallelism: BPipe
by: Huang, Mincong, et al.
Published: (2024)
PipeLive: Efficient Live In-place Pipeline Parallelism Reconfiguration for Dynamic LLM Serving
by: Bai, Xu, et al.
Published: (2026)
Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI
by: Rajbhandari, Samyam, et al.
Published: (2025)
Vanishing Variance Problem in Fully Decentralized Neural-Network Systems
by: Tian, Yongding, et al.
Published: (2024)
FedRDMA: Communication-Efficient Cross-Silo Federated LLM via Chunked RDMA Transmission
by: Zhang, Zeling, et al.
Published: (2024)
Context Parallelism for Scalable Million-Token Inference
by: Yang, Amy, et al.
Published: (2024)
DHO$_2$: Accelerating Distributed Hybrid Order Optimization via Model Parallelism and ADMM
by: Gu, Shunxian, et al.
Published: (2025)
FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism
by: Wang, Yujie, et al.
Published: (2024)
GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism
by: Polisetty, Sandeep, et al.
Published: (2023)
On Optimizing the Communication of Model Parallelism
by: Zhuang, Yonghao, et al.
Published: (2022)
Armada: Memory-Efficient Distributed Training of Large-Scale Graph Neural Networks
by: Waleffe, Roger, et al.
Published: (2025)
AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
by: Gupta, Ahan, et al.
Published: (2026)
Edge-Parallel Graph Encoder Embedding
by: Lubonja, Ariel, et al.
Published: (2024)
TASP: Topology-aware Sequence Parallelism
by: Wang, Yida, et al.
Published: (2025)
Hydraulis: Balancing Large Transformer Model Training via Co-designing Parallel Strategies and Data Assignment
by: Li, Haoyang, et al.
Published: (2024)
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization
by: Wan, Xinyi, et al.
Published: (2025)
Breaking the Memory Wall for Heterogeneous Federated Learning via Progressive Training
by: Wu, Yebo, et al.
Published: (2024)