:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Youpeng; LV, Jinpeng; Wu, Di; Wang, Jun; Gooley, Christopher
Format:	Preprint
Published:	2025
Subjects:	Performance; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.19645

Similar Items

ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
by: Zhao, Youpeng, et al.
Published: (2024)

ALISE: Accelerating Large Language Model Serving with Speculative Scheduling
by: Zhao, Youpeng, et al.
Published: (2024)

GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving
by: Jayakody, Shakya, et al.
Published: (2026)

The Race to Efficiency: A New Perspective on AI Scaling Laws
by: Lu, Chien-Ping
Published: (2025)

Time is Not Compute: Scaling Laws for Wall-Clock Constrained Training on Consumer GPUs
by: Liu, Yi
Published: (2026)

AI-driven Java Performance Testing: Balancing Result Quality with Testing Time
by: Traini, Luca, et al.
Published: (2024)

Generalizing Scaling Laws for Dense and Sparse Large Language Models
by: Hossain, Md Arafat, et al.
Published: (2025)

Throughput Optimization as a Strategic Lever in Large-Scale AI Systems: Evidence from Dataloader and Memory Profiling Innovations
by: Jha, Mayank
Published: (2026)

Adaptive Orchestration for Large-Scale Inference on Heterogeneous Accelerator Systems Balancing Cost, Performance, and Resilience
by: Biran, Yahav, et al.
Published: (2025)

DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads
by: Zhao, Qidong, et al.
Published: (2024)

Offloading and Quality Control for AI Generated Content Services in 6G Mobile Edge Computing Networks
by: Wang, Yitong, et al.
Published: (2023)

Comment on paper: Position: Rethinking Post-Hoc Search-Based Neural Approaches for Solving Large-Scale Traveling Salesman Problems
by: Min, Yimeng
Published: (2024)

Can We Make Code Green? Understanding Trade-Offs in LLMs vs. Human Code Optimizations
by: Rani, Pooja, et al.
Published: (2025)

A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search
by: Ellis-Mohr, Austin R., et al.
Published: (2025)

LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics
by: Liu, Jiashuo, et al.
Published: (2025)

Root Cause Localization for Microservice Systems in Cloud-edge Collaborative Environments
by: Zhu, Yuhan, et al.
Published: (2024)

Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI
by: Atinafu, Yonas, et al.
Published: (2026)

Quantifying the Generalization Gap in Seizure Detection: A Large-Scale Empirical Benchmark via the SzCORE Challenge
by: Dan, Jonathan, et al.
Published: (2025)

TurboSpec: Closed-loop Speculation Control System for Optimizing LLM Serving Goodput
by: Liu, Xiaoxuan, et al.
Published: (2024)

Rapid Augmentations for Time Series (RATS): A High-Performance Library for Time Series Augmentation
by: Skaf, Wadie, et al.
Published: (2026)

A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs
by: Singh, Siddharth, et al.
Published: (2023)

DDSA: Dual-Domain Strategic Attack for Spatial-Temporal Efficiency in Adversarial Robustness Testing
by: Hu, Jinwei, et al.
Published: (2026)

KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta
by: Liao, Gang, et al.
Published: (2025)

Research on Low-Latency Inference and Training Efficiency Optimization for Graph Neural Network and Large Language Model-Based Recommendation Systems
by: Zhao, Yushang, et al.
Published: (2025)

Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles
by: Gallici, Matteo, et al.
Published: (2025)

Scaling Intelligence: Designing Data Centers for Next-Gen Language Models
by: Tithi, Jesmin Jahan, et al.
Published: (2025)

STAlloc: Enhancing Memory Efficiency in Large-Scale Model Training with Spatio-Temporal Planning
by: Huang, Zixiao, et al.
Published: (2025)

Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System
by: Fang, Yunhua, et al.
Published: (2025)

FlashOmni: A Unified Sparse Attention Engine for Diffusion Transformers
by: Qiao, Liang, et al.
Published: (2025)

XTC, A Research Platform for Optimizing AI Workload Operators
by: Hugo, Pompougnac, et al.
Published: (2025)

On the Compression of Language Models for Code: An Empirical Study on CodeBERT
by: d'Aloisio, Giordano, et al.
Published: (2024)

LogiPlan: A Structured Benchmark for Logical Planning and Relational Reasoning in LLMs
by: Cai, Yanan, et al.
Published: (2025)

Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification
by: Kermani, Arshia, et al.
Published: (2025)

Energy Concerns with HPC Systems and Applications
by: Nana, Roblex, et al.
Published: (2023)

This Is Taking Too Long -- Investigating Time as a Proxy for Energy Consumption of LLMs
by: Krupp, Lars, et al.
Published: (2026)

Are We There Yet? A Measurement Study of Efficiency for LLM Applications on Mobile Devices
by: Yan, Xiao, et al.
Published: (2025)

Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design
by: Huang, Zixiao, et al.
Published: (2025)

Revolutionizing System Reliability: The Role of AI in Predictive Maintenance Strategies
by: Bidollahkhani, Michael, et al.
Published: (2024)

Performance Modeling of Data Storage Systems using Generative Models
by: Al-Maeeni, Abdalaziz Rashid, et al.
Published: (2023)

Tanz der Dinge/Things that dance
Published: (2021)