| Main Authors: | Cai, Ye, Yang, Zonglin, Ni, Liwei, Liu, Junfeng, Xie, Biwei, Li, Xingquan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.13617 |
Similar Items
Enhancing ASIC Technology Mapping via Parallel Supergate Computing
by: Cai, Ye, et al.
Published: (2024)
Efficient Parallel Execution of Blockchain Transactions Leveraging Conflict Specifications
by: Anjana, Parwat Singh, et al.
Published: (2025)
HP-MDR: High-performance and Portable Data Refactoring and Progressive Retrieval with Advanced GPUs
by: Li, Yanliang, et al.
Published: (2025)
FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters
by: Lin, Yanying, et al.
Published: (2025)
Optimizing Long-context LLM Serving via Fine-grained Sequence Parallelism
by: Li, Cong, et al.
Published: (2025)
Cold-Start Anti-Patterns and Refactorings in Serverless Systems: An Empirical Study
by: Tariq, Syed Salauddin Mohammad, et al.
Published: (2025)
Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization
by: Li, Haoyang, et al.
Published: (2024)
Accelerating Microswimmer Simulations via a Heterogeneous Pipelined Parallel-in-Time Framework
by: Huang, Ruixiang, et al.
Published: (2026)
Linear Complexity $\mathcal{H}^2$ Direct Solver for Fine-Grained Parallel Architectures
by: Boukaram, Wajih, et al.
Published: (2025)
DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism
by: Jiang, Chenyu, et al.
Published: (2025)
Oases: Efficient Large-Scale Model Training on Commodity Servers via Overlapped and Automated Tensor Model Parallelism
by: Li, Shengwei, et al.
Published: (2023)
NanoCP: Request-Level Dynamic Context Parallelism for Data-Expert Parallel Decoding
by: Chen, Jiefei, et al.
Published: (2026)
Advances in Semantic Patching for HPC-oriented Refactorings with Coccinelle
by: Martone, Michele, et al.
Published: (2025)
Balancing Pipeline Parallelism with Vocabulary Parallelism
by: Yeung, Man Tsung, et al.
Published: (2024)
Unleashing Scalable Context Parallelism for Foundation Models Pre-Training via FCP
by: Zhao, Yilong, et al.
Published: (2026)
Optimizing View Change for Byzantine Fault Tolerance in Parallel Consensus
by: Xie, Yifei, et al.
Published: (2026)
S-HPLB: Efficient LLM Attention Serving via Sparsity-Aware Head Parallelism Load Balance
by: Liu, Di, et al.
Published: (2026)
Communication-Computation Pipeline Parallel Split Learning over Wireless Edge Networks
by: Liu, Chenyu, et al.
Published: (2025)
HYDRA: Breaking the Global Ordering Barrier in Multi-BFT Consensus
by: Lyu, Hanzheng, et al.
Published: (2025)
Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices
by: Shen, Tao, et al.
Published: (2025)
Maximizing Blockchain Performance: Mitigating Conflicting Transactions through Parallelism and Dependency Management
by: Bappy, Faisal Haque, et al.
Published: (2024)
ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism
by: Ma, Tenghui, et al.
Published: (2026)
Accelerating Heterogeneous Tensor Parallelism via Flexible Workload Control
by: Wang, Zhigang, et al.
Published: (2024)
Synergistic Tensor and Pipeline Parallelism
by: Qi, Mengshi, et al.
Published: (2025)
ZeroPP: Unleashing Exceptional Parallelism Efficiency through Tensor-Parallelism-Free Methodology
by: Tang, Ding, et al.
Published: (2024)
Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism
by: Wei, Jinhui, et al.
Published: (2025)
Committee Configuration Optimization for Parallel Byzantine Consensus in a Trusted Execution Environment
by: Xie, Yifei, et al.
Published: (2026)
Enhancing Memory Efficiency in Large Language Model Training Through Chronos-aware Pipeline Parallelism
by: Lin, Xinyuan, et al.
Published: (2025)
HAP: Hybrid Adaptive Parallelism for Efficient Mixture-of-Experts Inference
by: Lin, Haoran, et al.
Published: (2025)
Parallel Collaborative ADMM Privacy Computing and Adaptive GPU Acceleration for Distributed Edge Networks
by: Xia, Mengchun, et al.
Published: (2026)
SPPO:Efficient Long-sequence LLM Training via Adaptive Sequence Pipeline Parallel Offloading
by: Chen, Qiaoling, et al.
Published: (2025)
MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference
by: Tang, Xinru, et al.
Published: (2025)
Pending Conflicts Make Progress Impossible
by: Kuznetsov, Petr, et al.
Published: (2026)
Communication-Efficient Model Aggregation with Layer Divergence Feedback in Federated Learning
by: Wang, Liwei, et al.
Published: (2024)
A Flexible Programmable Pipeline Parallelism Framework for Efficient DNN Training
by: Jiang, Lijuan, et al.
Published: (2025)
Surviving Partial Rank Failures in Wide Expert-Parallel MoE Inference
by: Sun, Xun, et al.
Published: (2026)
Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism
by: Qing, Yuhao, et al.
Published: (2025)
High-Performance N-Queens Solver on GPU: Iterative DFS with Zero Bank Conflicts
by: Yao, Guangchao, et al.
Published: (2025)
cuFastTuckerPlus: A Stochastic Parallel Sparse FastTucker Decomposition Using GPU Tensor Cores
by: Li, Zixuan, et al.
Published: (2024)
Hyperion: Hierarchical Scheduling for Parallel LLM Acceleration in Multi-tier Networks
by: Ma, Mulei, et al.
Published: (2025)