| Main Authors: | Le, Truong-Thanh; La, Hoang-Loc; Taherkordi, Amir; Eliassen, Frank; Ha, Phuong Hoai; Guan, Peiyuan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.00549 |
Related records
Kernel-Level Energy-Efficient Neural Architecture Search for Tabular Dataset
By: La, Hoang-Loc, et al.
Published: (2025)
Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs
By: Jain, Rishabh, et al.
Published: (2024)
Performance of Confidential Computing GPUs
By: Ibarra, Antonio Martínez, et al.
Published: (2025)
Ecomap: Sustainability-Driven Optimization of Multi-Tenant DNN Execution on Edge Servers
By: Paramanayakam, Varatheepan, et al.
Published: (2025)
Fast Entropy Decoding for Sparse MVM on GPUs
By: Schätzle, Emil, et al.
Published: (2026)
Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips
By: Dagli, Ismet, et al.
Published: (2023)
Motion-to-Motion Latency Measurement Framework for Connected and Autonomous Vehicle Teleoperation
By: Provost, François, et al.
Published: (2025)
SGDRC: Software-Defined Dynamic Resource Control for Concurrent DNN Inference on NVIDIA GPUs
By: Zhang, Yongkang, et al.
Published: (2024)
Benchmarking GPUs on SVBRDF Extractor Model
By: Kandel, Narayan, et al.
Published: (2023)
A high-performance and portable implementation of the SISSO method for CPUs and GPUs
By: Eibl, Sebastian, et al.
Published: (2025)
Automated PMC-based Power Modeling Methodology for Modern Mobile GPUs
By: Dash, Pranab, et al.
Published: (2024)
Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes
By: Hendria, Willy Fitra
Published: (2026)
CarbonCP: Carbon-Aware DNN Partitioning with Conformal Prediction for Sustainable Edge Intelligence
By: Ke, Hongyu, et al.
Published: (2024)
oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation
By: Li, Jianhui, et al.
Published: (2023)
AFarePart: Accuracy-aware Fault-resilient Partitioner for DNN Edge Accelerators
By: Debnath, Mukta, et al.
Published: (2025)
How to Rent GPUs on a Budget
By: Li, Zhouzi, et al.
Published: (2024)
EDAN: Towards Understanding Memory Parallelism and Latency Sensitivity in HPC
By: Shen, Siyuan, et al.
Published: (2025)
Latency and Privacy-Aware Resource Allocation in Vehicular Edge Computing
By: Ahmadvand, Hossein, et al.
Published: (2025)
Opening the Black Box: Performance Estimation during Code Generation for GPUs
By: Ernst, Dominik, et al.
Published: (2021)
Long-term Monitoring of Kernel and Hardware Events to Understand Latency Variance
By: Zhou, Fang, et al.
Published: (2026)
PrETi: Predicting Execution Time in Early Stage with LLVM and Machine Learning
By: Xu, Risheng, et al.
Published: (2025)
FRSZ2 for In-Register Block Compression Inside GMRES on GPUs
By: Grützmacher, Thomas, et al.
Published: (2024)
DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs
By: Liu, Jiahui, et al.
Published: (2024)
Time is Not Compute: Scaling Laws for Wall-Clock Constrained Training on Consumer GPUs
By: Liu, Yi
Published: (2026)
SLM-Bench: A Comprehensive Benchmark of Small Language Models on Environmental Impacts--Extended Version
By: Pham, Nghiem Thanh, et al.
Published: (2025)
LLMPerf: GPU Performance Modeling meets Large Language Models
By: Nguyen, Khoi N. M., et al.
Published: (2025)
Latency Based Tiling
By: Cashman, Jack
Published: (2025)
A Latency-Constrained, Gated Recurrent Unit (GRU) Implementation in the Versal AI Engine
By: Sapkas, M., et al.
Published: (2025)
CARINA: Carbon-Aware Execution of Recurrent Industrial Analytics
By: Farooq, Muhammad Umar
Published: (2026)
Accurate and Scalable Many-Node Simulation
By: Eyerman, Stijn, et al.
Published: (2024)
Enhancing Tropical Cyclone Path Forecasting with an Improved Transformer Network
By: Van Thanh, Nguyen, et al.
Published: (2025)
Variational autoencoder-based neural network model compression
By: Cheng, Liang, et al.
Published: (2024)
Characterizing and Understanding HGNN Training on GPUs
By: Han, Dengke, et al.
Published: (2024)
An Experimental Study of Low-Latency Video Streaming over 5G
By: Khan, Imran, et al.
Published: (2024)
An Interpretable Latency Model for Speculative Decoding in LLM Serving
By: Kong, Linghao, et al.
Published: (2026)
Pushing the Envelope of LLM Inference on AI-PC and Intel GPUs
By: Georganas, Evangelos, et al.
Published: (2025)
GROMACS Unplugged: How Power Capping and Frequency Shapes Performance on GPUs
By: Afzal, Ayesha, et al.
Published: (2025)
RAVE: RISC-V Analyzer of Vector Executions, a QEMU tracing plugin
By: Vizcaino, Pablo, et al.
Published: (2024)
Feature Optimization for Time Series Forecasting via Novel Randomized Uphill Climbing
By: Van Thanh, Nguyen
Published: (2025)
ZERNIPAX: A Fast and Accurate Zernike Polynomial Calculator in Python
By: Elmacioglu, Yigit Gunsur, et al.
Published: (2024)