:: Library Catalog

Bibliographic Details
Main Authors:	Sun, Minqiu; Huang, Xin; Guo, Luanzheng; Tallent, Nathan R.; Sato, Kento; Dai, Dong
Format:	Preprint
Published:	2026
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2602.22158

Similar Items

Scrutinizing Variables for Checkpoint Using Automatic Differentiation
Author: Huang, Xin, et al.
Published: (2026)

On The Reproducibility Limitations of RAG Systems
Author: Wang, Baiqiang, et al.
Published: (2025)

PowerTrip: Exploiting Federated Heterogeneous Datacenter Power for Distributed ML Training
Author: Mehboob, Talha, et al.
Published: (2025)

QoSFlow: Ensuring Service Quality of Distributed Workflows Using Interpretable Sensitivity Models
Author: Rashid, Md Hasanur, et al.
Published: (2026)

Distributed Order Recording Techniques for Efficient Record-and-Replay of Multi-threaded Programs
Author: Fu, Xiang, et al.
Published: (2026)

CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems
Author: Rashid, Md Hasanur, et al.
Published: (2026)

ParaLog: Consistent Host-side Logging for Parallel Checkpoints
Author: Chien, Steven W. D., et al.
Published: (2024)

MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs
Author: Sarkar, Aishwarya, et al.
Published: (2024)

Memory-Efficient Federated Fine-Tuning of Large Language Models via Layer Pruning
Author: Wu, Yebo, et al.
Published: (2025)

Overcoming Memory Constraints in Quantum Circuit Simulation with a High-Fidelity Compression Framework
Author: Zhang, Boyuan, et al.
Published: (2024)

NOMAD: Generating Embeddings for Massive Distributed Graphs
Author: Sarkar, Aishwarya, et al.
Published: (2026)

Understanding Power Consumption Metric on Heterogeneous Memory Systems
Author: Proaño, Andrès Rubio, et al.
Published: (2024)

Improving SpGEMM Performance Through Matrix Reordering and Cluster-wise Computation
Author: Islam, Abdullah Al Raqibul, et al.
Published: (2025)

TierCheck: Tiered Checkpointing for Fault Tolerance in Large Language Model Training
Author: Han, Shujie, et al.
Published: (2026)

Efficient LLM Inference with Activation Checkpointing and Hybrid Caching
Author: Lee, Sanghyeon, et al.
Published: (2025)

FedQuad: Adaptive Layer-wise LoRA Deployment and Activation Quantization for Federated Fine-Tuning
Author: Li, Rukuo, et al.
Published: (2025)

DataStates-LLM: Lazy Asynchronous Checkpointing for Large Language Models
Author: Maurya, Avinash, et al.
Published: (2024)

InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding
Author: Chen, Qiaoling, et al.
Published: (2024)

Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelis
Author: Lian, Xinyu, et al.
Published: (2024)

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Author: Duan, Jiangfei, et al.
Published: (2024)

SLO-Aware Scheduling for Large Language Model Inferences
Author: Huang, Jinqi, et al.
Published: (2025)

DreamDDP: Accelerating Data Parallel Distributed LLM Training with Layer-wise Scheduled Partial Synchronization
Author: Tang, Zhenheng, et al.
Published: (2025)

Fault-Tolerant Hybrid-Parallel Training at Scale with Reliable and Efficient In-memory Checkpointing
Author: Wang, Yuxin, et al.
Published: (2023)

Asynchronous Checkpoint for Eventually Consistent Databases
Author: Ravishankar, Raaghav, et al.
Published: (2025)

Checkmate: Zero-Overhead Model Checkpointing via Network Gradient Replication
Author: Bhardwaj, Ankit, et al.
Published: (2025)

Mell: Memory-Efficient Large Language Model Serving via Multi-GPU KV Cache Management
Author: Qianli, Liu, et al.
Published: (2025)

Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
Author: Sun, Ao, et al.
Published: (2024)

CRIUgpu: Transparent Checkpointing of GPU-Accelerated Workloads
Author: Stoyanov, Radostin, et al.
Published: (2025)

Optimal Checkpoint Interval with Availability as an Objective Function
Author: Saxena, Nirmal Raj, et al.
Published: (2024)

Checkpoint and Restart: An Energy Consumption Characterization in Clusters
Author: Moran, Marina, et al.
Published: (2024)

SeaLLM: Service-Aware and Latency-Optimized Resource Sharing for Large Language Model Inference
Author: Zhao, Yihao, et al.
Published: (2025)

Cascadia: An Efficient Cascade Serving System for Large Language Models
Author: Jiang, Youhe, et al.
Published: (2025)

Sparse Checkpointing for Fast and Reliable MoE Training
Author: Gandhi, Swapnil, et al.
Published: (2024)

DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models
Author: Zhang, Zili, et al.
Published: (2024)

FedAPTA: Federated Multi-task Learning for Heterogeneous Devices with Adaptive Layer-wise Pruning and Task-aware Aggregation
Author: Yu, Zhen, et al.
Published: (2025)

Pier: Efficient Large Language Model pretraining with Relaxed Global Communication
Author: Fan, Shuyuan, et al.
Published: (2025)

CRIU -- Checkpoint Restore in Userspace for computational simulations and scientific applications
Author: Andrijauskas, Fabio, et al.
Published: (2024)

Understanding LLM Checkpoint/Restore I/O Strategies and Patterns
Author: Gossman, Mikaila J., et al.
Published: (2025)

FLYING SERVING: On-the-Fly Parallelism Switching for Large Language Model Serving
Author: Gao, Shouwei, et al.
Published: (2026)

MoLink: Distributed and Efficient Serving Framework for Large Models
Author: Jin, Lewei, et al.
Published: (2025)