| Main Author: | Kenney, Matthew |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.22553 |
Similar Items
Auditing Sabotage Bench: A Benchmark for Detecting and Fixing Research Sabotage in ML Codebases
by: Gan, Eric, et al.
Published: (2026)
Benchmarking Edge AI Platforms for High-Performance ML Inference
by: Jayanth, Rakshith, et al.
Published: (2024)
RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents
by: Atinafu, Yonas, et al.
Published: (2026)
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
by: Chung, Jae-Won, et al.
Published: (2025)
Wake Vision: A Tailored Dataset and Benchmark Suite for TinyML Computer Vision Applications
by: Banbury, Colby, et al.
Published: (2024)
Towards Adaptive ML Benchmarks: Web-Agent-Driven Construction, Domain Expansion, and Metric Optimization
by: Jia, Hangyi, et al.
Published: (2025)
FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory
by: Yang, Xiao-Wen, et al.
Published: (2025)
Design Principles for Falsifiable, Replicable and Reproducible Empirical ML Research
by: Vranješ, Daniel, et al.
Published: (2024)
Codenames as a Benchmark for Large Language Models
by: Stephenson, Matthew, et al.
Published: (2024)
HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI
by: Pricope, Tidor-Vlad
Published: (2025)
Pre-Hoc Predictions in AutoML: Leveraging LLMs to Enhance Model Selection and Benchmarking for Tabular datasets
by: Belkhiter, Yannis, et al.
Published: (2025)
AdaptoML-UX: An Adaptive User-centered GUI-based AutoML Toolkit for Non-AI Experts and HCI Researchers
by: Gomaa, Amr, et al.
Published: (2024)
ML-Tool-Bench: Tool-Augmented Planning for ML Tasks
by: Chittepu, Yaswanth, et al.
Published: (2025)
ZeroML: A Next Generation AutoML Language
by: Mahmud, Monirul Islam
Published: (2025)
More Questions than Answers? Lessons from Integrating Explainable AI into a Cyber-AI Tool
by: Suh, Ashley, et al.
Published: (2024)
pAI/MSc: ML Theory Research with Humans on the Loop
by: Abdelmoneum, Mahmoud, et al.
Published: (2026)
AutoML Systems For Medical Imaging
by: Jidney, Tasmia Tahmida, et al.
Published: (2023)
Benchmark Transparency: Measuring the Impact of Data on Evaluation
by: Kovatchev, Venelin, et al.
Published: (2024)
Towards Knowledgeable Deep Research: Framework and Benchmark
by: Liu, Wenxuan, et al.
Published: (2026)
ML-Dev-Bench: Comparative Analysis of AI Agents on ML development workflows
by: Padigela, Harshith, et al.
Published: (2025)
LoCoML: A Framework for Real-World ML Inference Pipelines
by: Maddireddy, Kritin, et al.
Published: (2025)
CubicML: Automated ML for Large ML Systems Co-design with ML Prediction of Performance
by: Wen, Wei, et al.
Published: (2024)
Certified ML Object Detection for Surveillance Missions
by: Belcaid, Mohammed, et al.
Published: (2024)
Implementation of airborne ML models with semantics preservation
by: Valot, Nicolas, et al.
Published: (2025)
SafePickle: Robust and Generic ML Detection of Malicious Pickle-based ML Models
by: Ohayon, Hillel, et al.
Published: (2026)
Integrating Random Forests and Generalized Linear Models for Improved Accuracy and Interpretability
by: Agarwal, Abhineet, et al.
Published: (2023)
Accelerating IoV Intrusion Detection: Benchmarking GPU-Accelerated vs CPU-Based ML Libraries
by: Çolhak, Furkan, et al.
Published: (2025)
LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild
by: Wang, Jiayu, et al.
Published: (2025)
DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent
by: Wu, Tongzhou, et al.
Published: (2026)
Redefining Finance: The Influence of Artificial Intelligence (AI) and Machine Learning (ML)
by: Kumar, Animesh
Published: (2024)
NarraBench: A Comprehensive Framework for Narrative Benchmarking
by: Hamilton, Sil, et al.
Published: (2025)
QMBench: A Research Level Benchmark for Quantum Materials Research
by: Wang, Yanzhen, et al.
Published: (2025)
Multivariate Temporal Regression at Scale: A Three-Pillar Framework Combining ML, XAI, and NLP
by: Francis, Jiztom Kavalakkatt, et al.
Published: (2025)
CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers
by: Pan, Haining, et al.
Published: (2025)
Problem-oriented AutoML in Clustering
by: da Silva, Matheus Camilo, et al.
Published: (2024)
nanoML for Human Activity Recognition
by: Bacellar, Alan T. L., et al.
Published: (2025)
Evaluating the printability of stl files with ML
by: Henn, Janik, et al.
Published: (2025)
Optimizing ML Training with Metagradient Descent
by: Engstrom, Logan, et al.
Published: (2025)
SCUBA: Salesforce Computer Use Benchmark
by: Dai, Yutong, et al.
Published: (2025)
Re$^2$Math: Benchmarking Theorem Retrieval in Research-Level Mathematics
by: Lyu, Zicheng, et al.
Published: (2026)