Library Catalog

Bibliographic Details
Main Authors:	Le, Truong-Thanh; La, Hoang-Loc; Taherkordi, Amir; Eliassen, Frank; Ha, Phuong Hoai; Guan, Peiyuan
Format:	Preprint
Published:	2026
Subjects:	Performance
Online Access:	https://arxiv.org/abs/2603.00549
Related Records

Kernel-Level Energy-Efficient Neural Architecture Search for Tabular Dataset
By: La, Hoang-Loc, et al.
Published: (2025)

Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs
By: Jain, Rishabh, et al.
Published: (2024)

Performance of Confidential Computing GPUs
By: Ibarra, Antonio Martínez, et al.
Published: (2025)

Ecomap: Sustainability-Driven Optimization of Multi-Tenant DNN Execution on Edge Servers
By: Paramanayakam, Varatheepan, et al.
Published: (2025)

Fast Entropy Decoding for Sparse MVM on GPUs
By: Schätzle, Emil, et al.
Published: (2026)

Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips
By: Dagli, Ismet, et al.
Published: (2023)

Motion-to-Motion Latency Measurement Framework for Connected and Autonomous Vehicle Teleoperation
By: Provost, François, et al.
Published: (2025)

SGDRC: Software-Defined Dynamic Resource Control for Concurrent DNN Inference on NVIDIA GPUs
By: Zhang, Yongkang, et al.
Published: (2024)

Benchmarking GPUs on SVBRDF Extractor Model
By: Kandel, Narayan, et al.
Published: (2023)

A high-performance and portable implementation of the SISSO method for CPUs and GPUs
By: Eibl, Sebastian, et al.
Published: (2025)

Automated PMC-based Power Modeling Methodology for Modern Mobile GPUs
By: Dash, Pranab, et al.
Published: (2024)

Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes
By: Hendria, Willy Fitra
Published: (2026)

CarbonCP: Carbon-Aware DNN Partitioning with Conformal Prediction for Sustainable Edge Intelligence
By: Ke, Hongyu, et al.
Published: (2024)

oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation
By: Li, Jianhui, et al.
Published: (2023)

AFarePart: Accuracy-aware Fault-resilient Partitioner for DNN Edge Accelerators
By: Debnath, Mukta, et al.
Published: (2025)

How to Rent GPUs on a Budget
By: Li, Zhouzi, et al.
Published: (2024)

EDAN: Towards Understanding Memory Parallelism and Latency Sensitivity in HPC
By: Shen, Siyuan, et al.
Published: (2025)

Latency and Privacy-Aware Resource Allocation in Vehicular Edge Computing
By: Ahmadvand, Hossein, et al.
Published: (2025)

Opening the Black Box: Performance Estimation during Code Generation for GPUs
By: Ernst, Dominik, et al.
Published: (2021)

Long-term Monitoring of Kernel and Hardware Events to Understand Latency Variance
By: Zhou, Fang, et al.
Published: (2026)

PrETi: Predicting Execution Time in Early Stage with LLVM and Machine Learning
By: Xu, Risheng, et al.
Published: (2025)

FRSZ2 for In-Register Block Compression Inside GMRES on GPUs
By: Grützmacher, Thomas, et al.
Published: (2024)

DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs
By: Liu, Jiahui, et al.
Published: (2024)

Time is Not Compute: Scaling Laws for Wall-Clock Constrained Training on Consumer GPUs
By: Liu, Yi
Published: (2026)

SLM-Bench: A Comprehensive Benchmark of Small Language Models on Environmental Impacts--Extended Version
By: Pham, Nghiem Thanh, et al.
Published: (2025)

LLMPerf: GPU Performance Modeling meets Large Language Models
By: Nguyen, Khoi N. M., et al.
Published: (2025)

Latency Based Tiling
By: Cashman, Jack
Published: (2025)

A Latency-Constrained, Gated Recurrent Unit (GRU) Implementation in the Versal AI Engine
By: Sapkas, M., et al.
Published: (2025)

CARINA: Carbon-Aware Execution of Recurrent Industrial Analytics
By: Farooq, Muhammad Umar
Published: (2026)

Accurate and Scalable Many-Node Simulation
By: Eyerman, Stijn, et al.
Published: (2024)

Enhancing Tropical Cyclone Path Forecasting with an Improved Transformer Network
By: Van Thanh, Nguyen, et al.
Published: (2025)

Variational autoencoder-based neural network model compression
By: Cheng, Liang, et al.
Published: (2024)

Characterizing and Understanding HGNN Training on GPUs
By: Han, Dengke, et al.
Published: (2024)

An Experimental Study of Low-Latency Video Streaming over 5G
By: Khan, Imran, et al.
Published: (2024)

An Interpretable Latency Model for Speculative Decoding in LLM Serving
By: Kong, Linghao, et al.
Published: (2026)

Pushing the Envelope of LLM Inference on AI-PC and Intel GPUs
By: Georganas, Evangelos, et al.
Published: (2025)

GROMACS Unplugged: How Power Capping and Frequency Shapes Performance on GPUs
By: Afzal, Ayesha, et al.
Published: (2025)

RAVE: RISC-V Analyzer of Vector Executions, a QEMU tracing plugin
By: Vizcaino, Pablo, et al.
Published: (2024)

Feature Optimization for Time Series Forecasting via Novel Randomized Uphill Climbing
By: Van Thanh, Nguyen
Published: (2025)

ZERNIPAX: A Fast and Accurate Zernike Polynomial Calculator in Python
By: Elmacioglu, Yigit Gunsur, et al.
Published: (2024)