| Main Authors: | Jali, Neharika, Qu, Guannan, Wang, Weina, Joshi, Gauri |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.01147 |
Similar Items
Natural Policy Gradient for Average Reward Non-Stationary RL
by: Jali, Neharika, et al.
Published: (2025)
Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning
by: Jali, Neharika, et al.
Published: (2026)
ProxRouter: Proximity-Weighted LLM Query Routing for Improved Robustness to Outliers
by: Patel, Shivam, et al.
Published: (2025)
Erasure Coded Neural Network Inference via Fisher Averaging
by: Jhunjhunwala, Divyansh, et al.
Published: (2024)
The Transient Cost of Learning in Queueing Systems
by: Freund, Daniel, et al.
Published: (2023)
An Upper Bound on the M/M/k Queue With Deterministic Setup Times
by: Williams, Jalani, et al.
Published: (2025)
GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference
by: Ziller, Thomas, et al.
Published: (2026)
Tabular and Deep Reinforcement Learning for Gittins Index
by: Dhankhar, Harshit, et al.
Published: (2024)
MQ-GNN: A Multi-Queue Pipelined Architecture for Scalable and Efficient GNN Training
by: Ullah, Irfan, et al.
Published: (2026)
Greener Deep Reinforcement Learning: Analysis of Energy and Carbon Efficiency Across Atari Benchmarks
by: Gardner, Jason, et al.
Published: (2025)
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
by: Zheng, Wenhao, et al.
Published: (2025)
SAfEPaTh: A System-Level Approach for Efficient Power and Thermal Estimation of Convolutional Neural Network Accelerator
by: Chen, Yukai, et al.
Published: (2024)
DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing
by: Wang, Liangyu, et al.
Published: (2025)
Offline Reinforcement-Learning-Based Power Control for Application-Agnostic Energy Efficiency
by: Raj, Akhilesh, et al.
Published: (2026)
Efficient GPU implementation of randomized SVD and its applications
by: Struski, Łukasz, et al.
Published: (2021)
Anatomizing Deep Learning Inference in Web Browsers
by: Wang, Qipeng, et al.
Published: (2024)
Cloud Computing Energy Consumption Prediction Based on Kernel Extreme Learning Machine Algorithm Improved by Vector Weighted Average Algorithm
by: Wang, Yuqing, et al.
Published: (2025)
Heavy-traffic Optimality of Skip-the-Longest-Queues in Heterogeneous Service Systems
by: Luo, Yishun, et al.
Published: (2025)
Blending Learning to Rank and Dense Representations for Efficient and Effective Cascades
by: Nardini, Franco Maria, et al.
Published: (2025)
Asymptotically Optimal Scheduling of Multiple Parallelizable Job Classes
by: Berg, Benjamin, et al.
Published: (2024)
Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach
by: Zhang, Yijia, et al.
Published: (2024)
MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache
by: Xue, Leyang, et al.
Published: (2024)
Efficient Graph Knowledge Distillation from GNNs to Kolmogorov--Arnold Networks via Self-Attention Dynamic Sampling
by: Cui, Can, et al.
Published: (2025)
A Practical Two-Stage Framework for GPU Resource and Power Prediction in Heterogeneous HPC Systems
by: Oztop, Beste, et al.
Published: (2026)
ALERT: Accurate Learning for Energy and Timeliness
by: Wan, Chengcheng, et al.
Published: (2019)
VDTuner: Automated Performance Tuning for Vector Data Management Systems
by: Yang, Tiannuo, et al.
Published: (2024)
Forecasting GPU Performance for Deep Learning Training and Inference
by: Lee, Seonho, et al.
Published: (2024)
A Structure-Aware Framework for Learning Device Placements on Computation Graphs
by: Duan, Shukai, et al.
Published: (2024)
FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models
by: Shao, Zishan, et al.
Published: (2025)
PrETi: Predicting Execution Time in Early Stage with LLVM and Machine Learning
by: Xu, Risheng, et al.
Published: (2025)
PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU
by: McInroe, Trevor, et al.
Published: (2025)
oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation
by: Li, Jianhui, et al.
Published: (2023)
Machine Learning Models for Reinforced Concrete Pipes Condition Prediction: The State-of-the-Art Using Artificial Neural Networks and Multiple Linear Regression in a Wisconsin Case Study
by: Mohammadagha, Mohsen, et al.
Published: (2025)
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
by: Fu, Tianyu, et al.
Published: (2025)
An Efficient Hybrid Sparse Attention with CPU-GPU Parallelism for Long-Context Inference
by: Yao, Feiyu, et al.
Published: (2026)
CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference
by: Xu, Guanyu, et al.
Published: (2025)
ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory
by: Wang, Liangyu, et al.
Published: (2025)
CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines
by: Sun, Wenbo, et al.
Published: (2024)
KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels
by: Wang, Han, et al.
Published: (2026)
Distilled Neural Networks for Efficient Learning to Rank
by: Nardini, F. M., et al.
Published: (2022)