:: Library Catalog

Bibliographic Details
Main Authors:	Iglovikov, Vladimir; Kosarevsky, Dmitry
Format:	Preprint
Published:	2026
Subjects:	Performance; Machine Learning
Online Access:	https://arxiv.org/abs/2605.08731

Similar Items

Need for Speed: A Comprehensive Benchmark of JPEG Decoders in Python
by: Iglovikov, Vladimir
Published: (2025)

U-TOE: Universal TinyML On-board Evaluation Toolkit for Low-Power IoT
by: Huang, Zhaolan, et al.
Published: (2023)

Accelerating Sparse Ternary GEMM for Quantized ML on Apple Silicon
by: Lipshitz, Baraq, et al.
Published: (2025)

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels
by: Wang, Han, et al.
Published: (2026)

Hardware-efficient tractable probabilistic inference for TinyML Neurosymbolic AI applications
by: Leslin, Jelin, et al.
Published: (2025)

Steering Pretrained Drafters during Speculative Decoding
by: Berdoz, Frédéric, et al.
Published: (2025)

Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework
by: Estevez, Melissa, et al.
Published: (2025)

An Interpretable Latency Model for Speculative Decoding in LLM Serving
by: Kong, Linghao, et al.
Published: (2026)

msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML
by: Huang, Zhaolan, et al.
Published: (2025)

The Next 700 ML-Enabled Compiler Optimizations
by: VenkataKeerthy, S., et al.
Published: (2023)

SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data
by: Lautrup, Anton Danholt, et al.
Published: (2024)

Benchmarking GPUs on SVBRDF Extractor Model
by: Kandel, Narayan, et al.
Published: (2023)

Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads
by: Karami, Rachid, et al.
Published: (2024)

CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines
by: Sun, Wenbo, et al.
Published: (2024)

Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion
by: Nasr-Esfahany, Arash, et al.
Published: (2025)

Greener Deep Reinforcement Learning: Analysis of Energy and Carbon Efficiency Across Atari Benchmarks
by: Gardner, Jason, et al.
Published: (2025)

Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis
by: Werner, Elias, et al.
Published: (2023)

Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling
by: Atmer, Hannah, et al.
Published: (2025)

MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)

Large-Scale Data Parallelization of Product Quantization and Inverted Indexing Using Dask
by: Abraham, Ashley N., et al.
Published: (2026)

Plug-and-Play Performance Estimation for LLM Services without Relying on Labeled Data
by: Wang, Can, et al.
Published: (2024)

Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments
by: Fursin, Grigori
Published: (2024)

Development and Comparative Evaluation of Three Artificial Intelligence Models (NLP, LLM, JEPA) for Predicting Triage in Emergency Departments: A 7-Month Retrospective Proof-of-Concept
by: Lansiaux, Edouard, et al.
Published: (2025)

Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making
by: Amujo, Oluyemi Enoch, et al.
Published: (2024)

Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
by: Anurin, Andrey, et al.
Published: (2024)

Accelerating Diffusion LLMs via Adaptive Parallel Decoding
by: Israel, Daniel, et al.
Published: (2025)

Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers
by: Huang, Zhaolan, et al.
Published: (2025)

VDTuner: Automated Performance Tuning for Vector Data Management Systems
by: Yang, Tiannuo, et al.
Published: (2024)

Efficient Solving of Large Single Input Superstate Decomposable Markovian Decision Process
by: Mahjoub, Youssef Ait El, et al.
Published: (2025)

Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs
by: Peng, Hongwu, et al.
Published: (2023)

LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics
by: Liu, Jiashuo, et al.
Published: (2025)

TrainMover: An Interruption-Resilient Runtime for ML Training
by: Lao, ChonLam, et al.
Published: (2024)

ModeSwitch-LLM: A Lightweight Phase-Aware Controller for Cross-Mode LLM Inference on a Single GPU
by: Sunesh, Aman, et al.
Published: (2026)

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
by: Liu, Zirui, et al.
Published: (2024)

Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes
by: Hendria, Willy Fitra
Published: (2026)

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
by: Zheng, Wenhao, et al.
Published: (2025)

Machine Learning Methods for Evaluating Public Crisis: Meta-Analysis
by: Okpala, Izunna, et al.
Published: (2023)

Systematic Evaluation of Optimization Techniques for Long-Context Language Models
by: Ahmed, Ammar, et al.
Published: (2025)

An MLCommons Scientific Benchmarks Ontology
by: Hawks, Ben, et al.
Published: (2025)

Toward A Formalized Approach for Spike Sorting Algorithms and Hardware Evaluation
by: Zhang, Tim, et al.
Published: (2022)