:: Library Catalog

Bibliographic Details
Main Authors:	Jali, Neharika; Qu, Guannan; Wang, Weina; Joshi, Gauri
Format:	Preprint
Published:	2024
Subjects:	Machine Learning Performance
Online Access:	https://arxiv.org/abs/2402.01147

Similar Items

Natural Policy Gradient for Average Reward Non-Stationary RL
by: Jali, Neharika, et al.
Published: (2025)

Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning
by: Jali, Neharika, et al.
Published: (2026)

ProxRouter: Proximity-Weighted LLM Query Routing for Improved Robustness to Outliers
by: Patel, Shivam, et al.
Published: (2025)

Erasure Coded Neural Network Inference via Fisher Averaging
by: Jhunjhunwala, Divyansh, et al.
Published: (2024)

The Transient Cost of Learning in Queueing Systems
by: Freund, Daniel, et al.
Published: (2023)

An Upper Bound on the M/M/k Queue With Deterministic Setup Times
by: Williams, Jalani, et al.
Published: (2025)

GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference
by: Ziller, Thomas, et al.
Published: (2026)

Tabular and Deep Reinforcement Learning for Gittins Index
by: Dhankhar, Harshit, et al.
Published: (2024)

MQ-GNN: A Multi-Queue Pipelined Architecture for Scalable and Efficient GNN Training
by: Ullah, Irfan, et al.
Published: (2026)

Greener Deep Reinforcement Learning: Analysis of Energy and Carbon Efficiency Across Atari Benchmarks
by: Gardner, Jason, et al.
Published: (2025)

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
by: Zheng, Wenhao, et al.
Published: (2025)

SAfEPaTh: A System-Level Approach for Efficient Power and Thermal Estimation of Convolutional Neural Network Accelerator
by: Chen, Yukai, et al.
Published: (2024)

DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing
by: Wang, Liangyu, et al.
Published: (2025)

Offline Reinforcement-Learning-Based Power Control for Application-Agnostic Energy Efficiency
by: Raj, Akhilesh, et al.
Published: (2026)

Efficient GPU implementation of randomized SVD and its applications
by: Struski, Łukasz, et al.
Published: (2021)

Anatomizing Deep Learning Inference in Web Browsers
by: Wang, Qipeng, et al.
Published: (2024)

Cloud Computing Energy Consumption Prediction Based on Kernel Extreme Learning Machine Algorithm Improved by Vector Weighted Average Algorithm
by: Wang, Yuqing, et al.
Published: (2025)

Heavy-traffic Optimality of Skip-the-Longest-Queues in Heterogeneous Service Systems
by: Luo, Yishun, et al.
Published: (2025)

Blending Learning to Rank and Dense Representations for Efficient and Effective Cascades
by: Nardini, Franco Maria, et al.
Published: (2025)

Asymptotically Optimal Scheduling of Multiple Parallelizable Job Classes
by: Berg, Benjamin, et al.
Published: (2024)

Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach
by: Zhang, Yijia, et al.
Published: (2024)

MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache
by: Xue, Leyang, et al.
Published: (2024)

Efficient Graph Knowledge Distillation from GNNs to Kolmogorov--Arnold Networks via Self-Attention Dynamic Sampling
by: Cui, Can, et al.
Published: (2025)

A Practical Two-Stage Framework for GPU Resource and Power Prediction in Heterogeneous HPC Systems
by: Oztop, Beste, et al.
Published: (2026)

ALERT: Accurate Learning for Energy and Timeliness
by: Wan, Chengcheng, et al.
Published: (2019)

VDTuner: Automated Performance Tuning for Vector Data Management Systems
by: Yang, Tiannuo, et al.
Published: (2024)

Forecasting GPU Performance for Deep Learning Training and Inference
by: Lee, Seonho, et al.
Published: (2024)

A Structure-Aware Framework for Learning Device Placements on Computation Graphs
by: Duan, Shukai, et al.
Published: (2024)

FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models
by: Shao, Zishan, et al.
Published: (2025)

PrETi: Predicting Execution Time in Early Stage with LLVM and Machine Learning
by: Xu, Risheng, et al.
Published: (2025)

PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU
by: McInroe, Trevor, et al.
Published: (2025)

oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation
by: Li, Jianhui, et al.
Published: (2023)

Machine Learning Models for Reinforced Concrete Pipes Condition Prediction: The State-of-the-Art Using Artificial Neural Networks and Multiple Linear Regression in a Wisconsin Case Study
by: Mohammadagha, Mohsen, et al.
Published: (2025)

R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
by: Fu, Tianyu, et al.
Published: (2025)

An Efficient Hybrid Sparse Attention with CPU-GPU Parallelism for Long-Context Inference
by: Yao, Feiyu, et al.
Published: (2026)

CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference
by: Xu, Guanyu, et al.
Published: (2025)

ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory
by: Wang, Liangyu, et al.
Published: (2025)

CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines
by: Sun, Wenbo, et al.
Published: (2024)

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels
by: Wang, Han, et al.
Published: (2026)

Distilled Neural Networks for Efficient Learning to Rank
by: Nardini, F. M., et al.
Published: (2022)