| Main Authors: | Yang, Mengtian; Zhang, Zhekun; Wu, Mingheng; Yan, Jianwen; Sun, Hanshi; Chang, Li-wen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.17164 |
Similar Items
UniEP: Unified Expert-Parallel MoE MegaKernel for LLM Training
by: Zheng, Size, et al.
Published: (2026)
Evaluating SYCL as a Unified Programming Model for Heterogeneous Systems
by: Marowka, Ami
Published: (2026)
Scaling Deep Learning Training with MPMD Pipeline Parallelism
by: Xhebraj, Anxhelo, et al.
Published: (2024)
Morphling: Fast, Fused, and Flexible GNN Training at Scale
by: Anubhab, et al.
Published: (2025)
LeMix: Unified Scheduling for LLM Training and Inference on Multi-GPU Systems
by: Li, Yufei, et al.
Published: (2025)
Simplicity Scales
by: Sampson, Andrew, et al.
Published: (2026)
ScanWeaver: Compiler-Driven Parallelization of Affine Recurrences via Associative Scan Lowering
by: Wu, Qiying, et al.
Published: (2026)
veScale: Consistent and Efficient Tensor Programming with Eager-Mode SPMD
by: Li, Youjie, et al.
Published: (2025)
MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training
by: Zhao, Lu, et al.
Published: (2025)
Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel
by: Jin, Hongyi, et al.
Published: (2026)
Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation
by: Feng, Weiqi, et al.
Published: (2024)
Multi-Relational Algebra for Multi-Granular Data Analytics
by: Wu, Xi, et al.
Published: (2023)
Fine-Grained Energy Prediction For Parallellized LLM Inference With PIE-P
by: Dutt, Anurag, et al.
Published: (2025)
Publish on Ping: A Better Way to Publish Reservations in Memory Reclamation for Concurrent Data Structures
by: Singh, Ajay, et al.
Published: (2025)
Timetide: A programming model for logically synchronous distributed systems
by: Kenwright, Logan, et al.
Published: (2025)
Hydra: Virtualized Multi-Language Runtime for High-Density Serverless Platforms
by: Ivanenko, Serhii, et al.
Published: (2022)
MCFuser: High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators
by: Zhang, Zheng, et al.
Published: (2025)
Verifying Properties of Index Arrays in a Purely-Functional Data-Parallel Language
by: Hinnerskov, Nikolaj Hey, et al.
Published: (2025)
An MLIR pipeline for offloading Fortran to FPGAs via OpenMP
by: Rodriguez-Canal, Gabriel, et al.
Published: (2025)
Sal: Multi-modal Verification of Replicated Data Types
by: Ramesh, Pranav, et al.
Published: (2026)
Assessing Opportunities of SYCL for Biological Sequence Alignment on GPU-based Systems
by: Costanzo, Manuel, et al.
Published: (2022)
Flo: a Semantic Foundation for Progressive Stream Processing
by: Laddad, Shadaj, et al.
Published: (2024)
Choreographies as Macros
by: Bohosian, Alexander, et al.
Published: (2025)
OMP4Py: a pure Python implementation of OpenMP
by: Piñeiro, César, et al.
Published: (2024)
Suki: Choreographed Distributed Dataflow in Rust
by: Laddad, Shadaj, et al.
Published: (2024)
Towards a Function-as-a-Service Choreographic Programming Language: Examples and Applications
by: De Palma, Giuseppe, et al.
Published: (2024)
On the Duality of Task and Actor Programming Models
by: Yadav, Rohan, et al.
Published: (2025)
GuStL - An Experimental Guarded States Language
by: Schirmer, Oskar
Published: (2016)
Detrimental task execution patterns in mainstream OpenMP runtimes
by: Tuft, Adam S., et al.
Published: (2024)
We Know I Know You Know; Choreographic Programming With Multicast and Multiply Located Values
by: Bates, Mako, et al.
Published: (2024)
Extending Contract Verification for Parallel Programming Models to Fortran
by: Oraji, Yussur Mustafa, et al.
Published: (2026)
Mat2Boundary: Treating User-Defined Boundary Condition as SpMV for Distributed PDE Solvers on Block-Structured Grids
by: Cai, Yanzheng, et al.
Published: (2026)
Streamlining Cloud-Native Application Development and Deployment with Robust Encapsulation
by: Lertpongrujikorn, Pawissanutt, et al.
Published: (2024)
PRDTs: Composable Knowledge-Based Consensus Protocols with Replicated Data Types
by: Haas, Julian, et al.
Published: (2025)
Distributed Locking as a Data Type
by: Haas, Julian, et al.
Published: (2024)
Fully integrating the Flang Fortran compiler with standard MLIR
by: Brown, Nick
Published: (2024)
Introducing Support for Move Operations in Melda CRDT
by: Brocco, Amos
Published: (2025)
Actor Capabilities for Message Ordering (Extended Version)
by: Gordon, Colin S.
Published: (2025)
Failure Transparency in Stateful Dataflow Systems (Technical Report)
by: Veresov, Aleksey, et al.
Published: (2024)
Mapple: A Domain-Specific Language for Mapping Distributed Programs
by: Wei, Anjiang, et al.
Published: (2025)