Library Catalog

Bibliographic Details
Main Authors:	Bekman, Stas, Rajbhandari, Samyam, Wyatt, Michael, Rasley, Jeff, Ruwase, Tunji, Yao, Zhewei, Qiao, Aurick, He, Yuxiong
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2506.13996

Similar Items

Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads
by: Hidayetoglu, Mert, et al.
Published: (2025)

SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation
by: Qiao, Aurick, et al.
Published: (2024)

Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI
by: Rajbhandari, Samyam, et al.
Published: (2025)

MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving
by: Su, Zhaoyuan, et al.
Published: (2026)

STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
by: Lee, Jaeseong, et al.
Published: (2024)

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
by: Holmes, Connor, et al.
Published: (2024)

OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs
by: Lee, Jaeseong, et al.
Published: (2025)

Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelis
by: Lian, Xinyu, et al.
Published: (2024)

AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
by: Gupta, Ahan, et al.
Published: (2026)

Federated Timeline Synthesis: Scalable and Private Methodology For Model Training and Deployment
by: Renc, Pawel, et al.
Published: (2025)

Fast and Accurate Causal Parallel Decoding using Jacobi Forcing
by: Hu, Lanxiang, et al.
Published: (2025)

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
by: Xia, Haojun, et al.
Published: (2024)

FastPersist: Accelerating Model Checkpointing in Deep Learning
by: Wang, Guanhua, et al.
Published: (2024)

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
by: Li, Conglong, et al.
Published: (2022)

Training a Large Language Model for Medical Coding Using Privacy-Preserving Synthetic Clinical Data
by: Cook, John, et al.
Published: (2026)

ExCoT: Optimizing Reasoning for Text-to-SQL with Execution Feedback
by: Zhai, Bohan, et al.
Published: (2025)

MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility
by: He, Yexiao, et al.
Published: (2025)

LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism
by: Gu, Diandian, et al.
Published: (2024)

Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training
by: Luo, Cheng, et al.
Published: (2024)

Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
by: Yao, Jinghan, et al.
Published: (2024)

BurstEngine: an Efficient Distributed Framework for Training Transformers on Extremely Long Sequences of over 1M Tokens
by: Sun, Ao, et al.
Published: (2025)

Learning to Hint for Reinforcement Learning
by: Xia, Yu, et al.
Published: (2026)

R$^3$-SQL: Ranking Reward and Resampling for Text-to-SQL
by: Han, Hojae, et al.
Published: (2026)

Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation
by: Chen, Fahao, et al.
Published: (2024)

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
by: Zhang, Zhenyu, et al.
Published: (2024)

Learning to Self-Evolve
by: Chen, Xiaoyin, et al.
Published: (2026)

StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs
by: Luo, Qijun, et al.
Published: (2025)

SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips
by: Lian, Xinyu, et al.
Published: (2025)

Inference Scaling for Bridging Retrieval and Augmented Generation
by: Lee, Youngwon, et al.
Published: (2024)

CORD: Balancing COnsistency and Rank Distillation for Robust Retrieval-Augmented Generation
by: Lee, Youngwon, et al.
Published: (2024)

Context Parallelism for Scalable Million-Token Inference
by: Yang, Amy, et al.
Published: (2024)

Pretext Training Algorithms for Event Sequence Data
by: Wang, Yimu, et al.
Published: (2024)

Training-free LLM-generated Text Detection by Mining Token Probability Sequences
by: Xu, Yihuai, et al.
Published: (2024)

Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL
by: Yao, Zhewei, et al.
Published: (2025)

SPPO: Efficient Long-sequence LLM Training via Adaptive Sequence Pipeline Parallel Offloading
by: Chen, Qiaoling, et al.
Published: (2025)

HelixPipe: Efficient Distributed Training of Long Sequence Transformers with Attention Parallel Pipeline Parallelism
by: Zhang, Geng, et al.
Published: (2025)

Multi-word Tokenization for Sequence Compression
by: Gee, Leonidas, et al.
Published: (2024)

ComposeRAG: A Modular and Composable RAG for Corpus-Grounded Multi-Hop Question Answering
by: Wu, Ruofan, et al.
Published: (2025)

DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
by: Shu, Fan, et al.
Published: (2026)

Out of the Memory Barrier: A Highly Memory Efficient Training System for LLMs with Million-Token Contexts
by: Li, Wenhao, et al.
Published: (2026)