:: Library Catalog

Bibliographic Details
Main Author:	Kenney, Matthew
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2410.22553

Similar Items

Auditing Sabotage Bench: A Benchmark for Detecting and Fixing Research Sabotage in ML Codebases
by: Gan, Eric, et al.
Published: (2026)

Benchmarking Edge AI Platforms for High-Performance ML Inference
by: Jayanth, Rakshith, et al.
Published: (2024)

RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents
by: Atinafu, Yonas, et al.
Published: (2026)

The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
by: Chung, Jae-Won, et al.
Published: (2025)

Wake Vision: A Tailored Dataset and Benchmark Suite for TinyML Computer Vision Applications
by: Banbury, Colby, et al.
Published: (2024)

Towards Adaptive ML Benchmarks: Web-Agent-Driven Construction, Domain Expansion, and Metric Optimization
by: Jia, Hangyi, et al.
Published: (2025)

FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory
by: Yang, Xiao-Wen, et al.
Published: (2025)

Design Principles for Falsifiable, Replicable and Reproducible Empirical ML Research
by: Vranješ, Daniel, et al.
Published: (2024)

Codenames as a Benchmark for Large Language Models
by: Stephenson, Matthew, et al.
Published: (2024)

HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI
by: Pricope, Tidor-Vlad
Published: (2025)

Pre-Hoc Predictions in AutoML: Leveraging LLMs to Enhance Model Selection and Benchmarking for Tabular datasets
by: Belkhiter, Yannis, et al.
Published: (2025)

AdaptoML-UX: An Adaptive User-centered GUI-based AutoML Toolkit for Non-AI Experts and HCI Researchers
by: Gomaa, Amr, et al.
Published: (2024)

ML-Tool-Bench: Tool-Augmented Planning for ML Tasks
by: Chittepu, Yaswanth, et al.
Published: (2025)

ZeroML: A Next Generation AutoML Language
by: Mahmud, Monirul Islam
Published: (2025)

More Questions than Answers? Lessons from Integrating Explainable AI into a Cyber-AI Tool
by: Suh, Ashley, et al.
Published: (2024)

pAI/MSc: ML Theory Research with Humans on the Loop
by: Abdelmoneum, Mahmoud, et al.
Published: (2026)

AutoML Systems For Medical Imaging
by: Jidney, Tasmia Tahmida, et al.
Published: (2023)

Benchmark Transparency: Measuring the Impact of Data on Evaluation
by: Kovatchev, Venelin, et al.
Published: (2024)

Towards Knowledgeable Deep Research: Framework and Benchmark
by: Liu, Wenxuan, et al.
Published: (2026)

ML-Dev-Bench: Comparative Analysis of AI Agents on ML development workflows
by: Padigela, Harshith, et al.
Published: (2025)

LoCoML: A Framework for Real-World ML Inference Pipelines
by: Maddireddy, Kritin, et al.
Published: (2025)

CubicML: Automated ML for Large ML Systems Co-design with ML Prediction of Performance
by: Wen, Wei, et al.
Published: (2024)

Certified ML Object Detection for Surveillance Missions
by: Belcaid, Mohammed, et al.
Published: (2024)

Implementation of airborne ML models with semantics preservation
by: Valot, Nicolas, et al.
Published: (2025)

SafePickle: Robust and Generic ML Detection of Malicious Pickle-based ML Models
by: Ohayon, Hillel, et al.
Published: (2026)

Integrating Random Forests and Generalized Linear Models for Improved Accuracy and Interpretability
by: Agarwal, Abhineet, et al.
Published: (2023)

Accelerating IoV Intrusion Detection: Benchmarking GPU-Accelerated vs CPU-Based ML Libraries
by: Çolhak, Furkan, et al.
Published: (2025)

LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild
by: Wang, Jiayu, et al.
Published: (2025)

DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent
by: Wu, Tongzhou, et al.
Published: (2026)

Redefining Finance: The Influence of Artificial Intelligence (AI) and Machine Learning (ML)
by: Kumar, Animesh
Published: (2024)

NarraBench: A Comprehensive Framework for Narrative Benchmarking
by: Hamilton, Sil, et al.
Published: (2025)

QMBench: A Research Level Benchmark for Quantum Materials Research
by: Wang, Yanzhen, et al.
Published: (2025)

Multivariate Temporal Regression at Scale: A Three-Pillar Framework Combining ML, XAI, and NLP
by: Francis, Jiztom Kavalakkatt, et al.
Published: (2025)

CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers
by: Pan, Haining, et al.
Published: (2025)

Problem-oriented AutoML in Clustering
by: da Silva, Matheus Camilo, et al.
Published: (2024)

nanoML for Human Activity Recognition
by: Bacellar, Alan T. L., et al.
Published: (2025)

Evaluating the printability of stl files with ML
by: Henn, Janik, et al.
Published: (2025)

Optimizing ML Training with Metagradient Descent
by: Engstrom, Logan, et al.
Published: (2025)

SCUBA: Salesforce Computer Use Benchmark
by: Dai, Yutong, et al.
Published: (2025)

Re$^2$Math: Benchmarking Theorem Retrieval in Research-Level Mathematics
by: Lyu, Zicheng, et al.
Published: (2026)