| Main Authors: | Iglovikov, Vladimir; Kosarevsky, Dmitry |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.08731 |
Similar Items
Need for Speed: A Comprehensive Benchmark of JPEG Decoders in Python
by: Iglovikov, Vladimir
Published: (2025)
U-TOE: Universal TinyML On-board Evaluation Toolkit for Low-Power IoT
by: Huang, Zhaolan, et al.
Published: (2023)
Accelerating Sparse Ternary GEMM for Quantized ML on Apple Silicon
by: Lipshitz, Baraq, et al.
Published: (2025)
KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels
by: Wang, Han, et al.
Published: (2026)
Hardware-efficient tractable probabilistic inference for TinyML Neurosymbolic AI applications
by: Leslin, Jelin, et al.
Published: (2025)
Steering Pretrained Drafters during Speculative Decoding
by: Berdoz, Frédéric, et al.
Published: (2025)
Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework
by: Estevez, Melissa, et al.
Published: (2025)
An Interpretable Latency Model for Speculative Decoding in LLM Serving
by: Kong, Linghao, et al.
Published: (2026)
msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML
by: Huang, Zhaolan, et al.
Published: (2025)
The Next 700 ML-Enabled Compiler Optimizations
by: VenkataKeerthy, S., et al.
Published: (2023)
SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data
by: Lautrup, Anton Danholt, et al.
Published: (2024)
Benchmarking GPUs on SVBRDF Extractor Model
by: Kandel, Narayan, et al.
Published: (2023)
Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads
by: Karami, Rachid, et al.
Published: (2024)
CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines
by: Sun, Wenbo, et al.
Published: (2024)
Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion
by: Nasr-Esfahany, Arash, et al.
Published: (2025)
Greener Deep Reinforcement Learning: Analysis of Energy and Carbon Efficiency Across Atari Benchmarks
by: Gardner, Jason, et al.
Published: (2025)
Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis
by: Werner, Elias, et al.
Published: (2023)
Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling
by: Atmer, Hannah, et al.
Published: (2025)
MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)
Large-Scale Data Parallelization of Product Quantization and Inverted Indexing Using Dask
by: Abraham, Ashley N., et al.
Published: (2026)
Plug-and-Play Performance Estimation for LLM Services without Relying on Labeled Data
by: Wang, Can, et al.
Published: (2024)
Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments
by: Fursin, Grigori
Published: (2024)
Development and Comparative Evaluation of Three Artificial Intelligence Models (NLP, LLM, JEPA) for Predicting Triage in Emergency Departments: A 7-Month Retrospective Proof-of-Concept
by: Lansiaux, Edouard, et al.
Published: (2025)
Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making
by: Amujo, Oluyemi Enoch, et al.
Published: (2024)
Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
by: Anurin, Andrey, et al.
Published: (2024)
Accelerating Diffusion LLMs via Adaptive Parallel Decoding
by: Israel, Daniel, et al.
Published: (2025)
Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers
by: Huang, Zhaolan, et al.
Published: (2025)
VDTuner: Automated Performance Tuning for Vector Data Management Systems
by: Yang, Tiannuo, et al.
Published: (2024)
Efficient Solving of Large Single Input Superstate Decomposable Markovian Decision Process
by: Mahjoub, Youssef Ait El, et al.
Published: (2025)
Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs
by: Peng, Hongwu, et al.
Published: (2023)
LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics
by: Liu, Jiashuo, et al.
Published: (2025)
TrainMover: An Interruption-Resilient Runtime for ML Training
by: Lao, ChonLam, et al.
Published: (2024)
ModeSwitch-LLM: A Lightweight Phase-Aware Controller for Cross-Mode LLM Inference on a Single GPU
by: Sunesh, Aman, et al.
Published: (2026)
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
by: Liu, Zirui, et al.
Published: (2024)
Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes
by: Hendria, Willy Fitra
Published: (2026)
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
by: Zheng, Wenhao, et al.
Published: (2025)
Machine Learning Methods for Evaluating Public Crisis: Meta-Analysis
by: Okpala, Izunna, et al.
Published: (2023)
Systematic Evaluation of Optimization Techniques for Long-Context Language Models
by: Ahmed, Ammar, et al.
Published: (2025)
An MLCommons Scientific Benchmarks Ontology
by: Hawks, Ben, et al.
Published: (2025)
Toward A Formalized Approach for Spike Sorting Algorithms and Hardware Evaluation
by: Zhang, Tim, et al.
Published: (2022)