| Main Authors: | Zhang, Zongpu, Dash, Pranab, Hu, Y. Charlie, Xu, Qiang, Li, Jian, Guan, Haibing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.02135 |
Similar Items
RTP-LLM: High-Performance Alibaba LLM Inference Engine
by: Tan, Boyu, et al.
Published: (2026)
Energy-Efficient Computation with DVFS using Deep Reinforcement Learning for Multi-Task Systems in Edge Computing
by: Li, Xinyi, et al.
Published: (2024)
Dissecting CXL Memory Performance at Scale: Analysis, Modeling, and Optimization
by: Liu, Jinshu, et al.
Published: (2024)
AIOS: LLM Agent Operating System
by: Mei, Kai, et al.
Published: (2024)
LLM as a System Service on Mobile Devices
by: Yin, Wangsong, et al.
Published: (2024)
VeriLocc: End-to-End Cross-Architecture Register Allocation via LLM
by: Jin, Lesheng, et al.
Published: (2025)
FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference
by: Du, Hongchao, et al.
Published: (2025)
SSV: Sparse Speculative Verification for Efficient LLM Inference
by: Wang, Zhibin, et al.
Published: (2026)
GoCkpt: Gradient-Assisted Multi-Step overlapped Checkpointing for Efficient LLM Training
by: Zhang, Keyao, et al.
Published: (2025)
Potential of WebAssembly for Embedded Systems
by: Wallentowitz, Stefan, et al.
Published: (2024)
OBASE: Object-Based Address-Space Engineering to Improve Memory Tiering
by: Banakar, Vinay, et al.
Published: (2026)
Getting a Handle on Unmanaged Memory
by: Wanninger, Nick, et al.
Published: (2024)
Scaling Inter-procedural Dataflow Analysis on the Cloud
by: Sun, Zewen, et al.
Published: (2024)
Tutti: Making SSD-Backed KV Cache Practical for Long-Context LLM Serving
by: Qiu, Shi, et al.
Published: (2026)
Quine: Realizing LLM Agents as Native POSIX Processes
by: Ke, Hao
Published: (2026)
Horizon-LM: A RAM-Centric Architecture for LLM Training
by: Yuan, Zhengqing, et al.
Published: (2026)
Cerebrum (AIOS SDK): A Platform for Agent Development, Deployment, Distribution, and Discovery
by: Rama, Balaji, et al.
Published: (2025)
Tidying Up the Address Space
by: Banakar, Vinay, et al.
Published: (2025)
Flare: Anomaly Diagnostics for Divergent LLM Training in GPU Clusters of Thousand-Plus Scale
by: Cui, Weihao, et al.
Published: (2025)
MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection
by: Huang, Zhengxiang, et al.
Published: (2025)
Scalable and Accurate Application-Level Crash-Consistency Testing via Representative Testing
by: Gu, Yile, et al.
Published: (2025)
Assessing FIFO and Round Robin Scheduling: Effects on Data Pipeline Performance and Energy Usage
by: Choudhury, Malobika Roy, et al.
Published: (2024)
Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores
by: Wu, Chenpeng, et al.
Published: (2025)
Principled Performance Tunability in Operating System Kernels
by: Chen, Zhongjie, et al.
Published: (2025)
WebAssembly on Resource-Constrained IoT Devices: Performance, Efficiency, and Portability
by: Has, Mislav, et al.
Published: (2025)
ThunderAgent: A Simple, Fast and Program-Aware Agentic Inference System
by: Kang, Hao, et al.
Published: (2026)
Semantic Scheduling for LLM Inference
by: Hua, Wenyue, et al.
Published: (2025)
Valve: Production Online-Offline Inference Colocation with Jointly-Bounded Preemption Latency and Rate
by: Liu, Fangyue, et al.
Published: (2026)
ASC-Hook: fast and transparent system call hook for Arm
by: Shen, Yang, et al.
Published: (2024)
Compiling Away the Overhead of Race Detection
by: Paznikov, Alexey, et al.
Published: (2025)
Sockeye: a language for analyzing hardware documentation
by: Fiedler, Ben, et al.
Published: (2025)
Futureproof Static Memory Planning
by: Lamprakos, Christos, et al.
Published: (2025)
vNV-Heap: An Ownership-Based Virtually Non-Volatile Heap for Embedded Systems
by: Gerber, Markus Elias, et al.
Published: (2025)
Safe and usable kernel extensions with Rex
by: Jia, Jinghao, et al.
Published: (2025)
Towards Agentic OS: An LLM Agent Framework for Linux Schedulers
by: Zheng, Yusheng, et al.
Published: (2025)
Clove: Object-Level CXL Memory Management in Managed Runtimes
by: Son, Sam, et al.
Published: (2026)
Decoupling Vector Data and Index Storage for Space Efficiency
by: Ren, Yuanming, et al.
Published: (2026)
Towards High-Goodput LLM Serving with Prefill-decode Multiplexing
by: Chen, Yukang, et al.
Published: (2025)
Revitalising the Single Batch Environment: A 'Quest' to Achieve Fairness and Efficiency
by: Manna, Supriya, et al.
Published: (2023)
Vmem: A Lightweight Hot-Upgradable Memory Management for In-production Cloud Environment
by: Zheng, Hao, et al.
Published: (2025)