:: Library Catalog

Bibliographic Details
Main Authors:	He, Minghua; Zhang, Lingzhe; Liu, Yuan; Zhou, Xiao; Liu, Aiwei
Format:	Preprint
Published:	2026
Subjects:	Performance
Online Access:	https://arxiv.org/abs/2605.30851

Similar Items

A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models
by: Zhang, Lingzhe, et al.
Published: (2025)

Accelerating Diffusion LLMs via Adaptive Parallel Decoding
by: Israel, Daniel, et al.
Published: (2025)

HD-MoE: Hybrid and Dynamic Parallelism for Mixture-of-Expert LLMs with 3D Near-Memory Processing
by: Huang, Haochen, et al.
Published: (2025)

Spatiotemporal Analysis of Parallelized Computing at the Extreme Edge
by: Nabil, Yasser, et al.
Published: (2025)

EDAN: Towards Understanding Memory Parallelism and Latency Sensitivity in HPC
by: Shen, Siyuan, et al.
Published: (2025)

On Orchestrating Parallel Broadcasts for Distributed Ledgers
by: Sheng, Peiyao, et al.
Published: (2024)

GigaAPI for GPU Parallelization
by: Suvarna, M., et al.
Published: (2025)

Robust Recursive Query Parallelism in Graph Database Management Systems
by: Chakraborty, Anurag, et al.
Published: (2025)

Fault-Tolerant Hybrid-Parallel Training at Scale with Reliable and Efficient In-memory Checkpointing
by: Wang, Yuxin, et al.
Published: (2023)

HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
by: Zhao, Xuanlei, et al.
Published: (2024)

PEVLM: Parallel Encoding for Vision-Language Models
by: Kang, Letian, et al.
Published: (2025)

Automated Programmatic Performance Analysis of Parallel Programs
by: Cankur, Onur, et al.
Published: (2024)

Parallelizing a modern GPU simulator
by: Huerta, Rodrigo, et al.
Published: (2025)

Optimal Parallel Scheduling under Concave Speedup Functions
by: Li, Chengzhang, et al.
Published: (2025)

Recorder: Comprehensive Parallel I/O Tracing and Analysis
by: Wang, Chen, et al.
Published: (2025)

GPU-Accelerated Parallel Selected Inversion for Structured Matrices Using sTiles
by: Fattah, Esmail Abdul, et al.
Published: (2025)

Automated Calibration of Parallel and Distributed Computing Simulators: A Case Study
by: McDonald, Jesse, et al.
Published: (2024)

ParaLog: Consistent Host-side Logging for Parallel Checkpoints
by: Chien, Steven W. D., et al.
Published: (2024)

Cache Blocking of Distributed-Memory Parallel Matrix Power Kernels
by: Lacey, Dane C., et al.
Published: (2024)

Large-Scale Data Parallelization of Product Quantization and Inverted Indexing Using Dask
by: Abraham, Ashley N., et al.
Published: (2026)

Parallel Implementations Assessment of a Spatial-Spectral Classifier for Hyperspectral Clinical Applications
by: Lazcano, Raquel, et al.
Published: (2024)

Comparing Parallel Functional Array Languages: Programming and Performance
by: van Balen, David, et al.
Published: (2025)

ACALSim: A Scalable Parallel Simulation Framework for High-Performance System Design Space Exploration
by: Lin, Wei-Fen, et al.
Published: (2026)

Kino-PAX: Highly Parallel Kinodynamic Sampling-based Planner
by: Perrault, Nicolas, et al.
Published: (2024)

Parallel $k$d-tree with Batch Updates
by: Men, Ziyang, et al.
Published: (2024)

Fine-Grained Energy Prediction For Parallellized LLM Inference With PIE-P
by: Dutt, Anurag, et al.
Published: (2025)

Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher
by: Esfahani, Mohsen Koohi, et al.
Published: (2024)

Matryoshka: Optimization of Dynamic Diverse Quantum Chemistry Systems via Elastic Parallelism Transformation
by: Wang, Tuowei, et al.
Published: (2024)

Can Large Language Models Predict Parallel Code Performance?
by: Bolet, Gregory, et al.
Published: (2025)

An Efficient Hybrid Sparse Attention with CPU-GPU Parallelism for Long-Context Inference
by: Yao, Feiyu, et al.
Published: (2026)

ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution
by: Yang, Liu, et al.
Published: (2026)

Using UML State Diagrams for Modelling the Performance of Parallel Programs
by: Ortega Arjona, Jorge
Published: (2008)

CPMA: An Efficient Batch-Parallel Compressed Set Without Pointers
by: Wheatman, Brian, et al.
Published: (2023)

PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference
by: Fang, Jiarui, et al.
Published: (2024)

Binary Bleed: Fast Distributed and Parallel Method for Automatic Model Selection
by: Barron, Ryan, et al.
Published: (2024)

DIAL: Decentralized I/O AutoTuning via Learned Client-side Local Metrics for Parallel File System
by: Rashid, Md Hasanur, et al.
Published: (2026)

CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems
by: Rashid, Md Hasanur, et al.
Published: (2026)

Efficient Chromosome Parallelization for Precision Medicine Genomic Workflows
by: Montserrat, Daniel Mas, et al.
Published: (2025)

Massimult: A Novel Parallel CPU Architecture Based on Combinator Reduction
by: Nicklisch-Franken, Jurgen, et al.
Published: (2024)

Parallel I/O Characterization and Optimization on Large-Scale HPC Systems: A 360-Degree Survey
by: Ather, Hammad, et al.
Published: (2024)