| Main Authors: | Kim, Yoochan, Kim, Kihyun, Cho, Yonghyeon, Kim, Jinwoo, Khan, Awais, Kang, Ki-Dong, An, Baik-Song, Cha, Myung-Hoon, Kim, Hong-Yeon, Kim, Youngjae |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.05861 |
Similar Items
Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading
by: Kim, Kihyun, et al.
Published: (2025)
Shared Disk KV Cache Management for Efficient Multi-Instance Inference in RAG-Powered LLMs
by: Lee, Hyungwoo, et al.
Published: (2025)
SpotVista: Availability-Aware Recommendation System for Reliable and Cost-Efficient Multi-Node Spot Instances
by: Kim, Taeyoon, et al.
Published: (2026)
KubePACS: Kubernetes Cluster Using Performant, Highly Available, and Cost Efficient Spot Instances
by: Kim, Taeyoon, et al.
Published: (2026)
FCDP: Fully Cached Data Parallel for Communication-Avoiding Large-Scale Training
by: Park, Gyeongseo, et al.
Published: (2026)
AFLL: Real-time Load Stabilization for MMO Game Servers Based on Circular Causality Learning
by: Kang, Shinsuk, et al.
Published: (2026)
Ding-Dong Ditch: Peeking Into Spot Instance Availability
by: Kim, Kyumin, et al.
Published: (2026)
FedCostAware: Enabling Cost-Aware Federated Learning on the Cloud
by: Sinha, Aditya, et al.
Published: (2025)
Iterative Thresholding and Projection Algorithms and Model-Based Deep Neural Networks for Sparse LQR Control Design
by: Cho, Myung
Published: (2022)
Proof of Cloud: Data Center Execution Assurance for Confidential VMs
by: Rezabek, Filip, et al.
Published: (2025)
GraNNDis: Efficient Unified Distributed Training Framework for Deep GNNs on Large Clusters
by: Song, Jaeyong, et al.
Published: (2023)
Convergence Analysis of Federated Learning Methods Using Backward Error Analysis
by: Lim, Jinwoo, et al.
Published: (2025)
The SAP Cloud Infrastructure Dataset: A Reality Check of Scheduling and Placement of VMs in Cloud Computing
by: Uhlig, Arno, et al.
Published: (2025)
Optimizing CPU Cache Utilization in Cloud VMs with Accurate Cache Abstraction
by: Tofigh, Mani, et al.
Published: (2025)
SpotKube: Cost-Optimal Microservices Deployment with Cluster Autoscaling and Spot Pricing
by: Edirisinghe, Dasith, et al.
Published: (2024)
TensorSocket: Shared Data Loading for Deep Learning Training
by: Robroek, Ties, et al.
Published: (2024)
COMPASS: A Compiler Framework for Resource-Constrained Crossbar-Array Based In-Memory Deep Learning Accelerators
by: Park, Jihoon, et al.
Published: (2025)
Learning Where It Matters: Geometric Anchoring for Robust Preference Alignment
by: Cho, Youngjae, et al.
Published: (2026)
OMEGA: A Low-Latency GNN Serving System for Large Graphs
by: Kim, Geon-Woo, et al.
Published: (2025)
Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation
by: Kim, Joon Ha, et al.
Published: (2026)
HE2C: A Holistic Approach for Allocating Latency-Sensitive AI Tasks across Edge-Cloud
by: Kim, Minseo, et al.
Published: (2024)
A Case Study of API Design for Interoperability and Security of the Internet of Things
by: Kim, Dongha, et al.
Published: (2024)
Déjà Vu: Efficient Video-Language Query Engine with Learning-based Inter-Frame Computation Reuse
by: Hwang, Jinwoo, et al.
Published: (2025)
OASIS: Object-based Analytics Storage for Intelligent SQL Query Offloading in Scientific Tabular Workloads
by: Hwang, Soon, et al.
Published: (2025)
Revisiting Early-Learning Regularization When Federated Learning Meets Noisy Labels
by: Kim, Taehyeon, et al.
Published: (2024)
Characterizing Compute-Communication Overlap in GPU-Accelerated Distributed Deep Learning: Performance and Power Implications
by: Lee, Seonho, et al.
Published: (2025)
Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?
by: Kim, Taeyoon, et al.
Published: (2026)
DFLOP: A Data-driven Framework for Multimodal LLM Training Pipeline Optimization
by: An, Hyeonjun, et al.
Published: (2026)
Traversal Learning: A Lossless And Efficient Distributed Learning Framework
by: Batbaatar, Erdenebileg, et al.
Published: (2025)
PIM-SHERPA: Software Method for On-device LLM Inference by Resolving PIM Memory Attribute and Layout Inconsistencies
by: Lee, Sunjung, et al.
Published: (2026)
Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony
by: Wang, Shaoyu, et al.
Published: (2025)
DUAL-BLADE: Dual-Path NVMe-Direct KV-Cache Offloading for Edge LLM Inference
by: Jeong, Bodon, et al.
Published: (2026)
Eva: Cost-Efficient Cloud-Based Cluster Scheduling
by: Chang, Tzu-Tao, et al.
Published: (2025)
TraCT: Disaggregated LLM Serving with CXL Shared Memory KV Cache at Rack-Scale
by: Yoon, Dongha, et al.
Published: (2025)
ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference
by: Oh, Hyungjun, et al.
Published: (2024)
Flex-MIG: Enabling Distributed Execution on MIG
by: Kim, Myeongsu, et al.
Published: (2025)
MPI-over-CXL: Enhancing Communication Efficiency in Distributed HPC Systems
by: Kwon, Miryeong, et al.
Published: (2025)
AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping
by: Park, Seongyeon, et al.
Published: (2024)
HeteroSwitch: Characterizing and Taming System-Induced Data Heterogeneity in Federated Learning
by: Kim, Gyudong, et al.
Published: (2024)
Contextual Chain: Single-State Ledger Design for Mobile/IoT Networks with Frequent Partitions
by: Kim, Song-Ju
Published: (2026)