Bibliographic Details
Main Authors: Kong, Minwei; Jiang, Chonghe; Qu, Ao; Ouyang, Wenbin; Zeng, Zhaoming; Guo, Xiaotong; Li, Zhekai; Li, Junyi; Fan, Yi; Zheng, Xinshou; Jing, Xi; Zhang, Yikai; Liang, Zhiwei; Kim, Seonghoo; Yang, Runqing; Zhou, Zijian; Li, Sirui; Zheng, Han; Ying, Wangyang; Zheng, Ou; Wang, Chonghuan; Zhao, Jinglong; Qin, Hanzhang; Wu, Cathy; Liang, Paul Pu; Zhao, Jinhua; Wang, Hai
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2605.25246
Summary:
  • Large language models (LLMs) are increasingly used for optimization modeling and solver-code generation, yet practical operations research and optimization problems often require a harder capability: designing scalable algorithms that exploit problem structure and outperform direct formulation-and-solve baselines. Existing benchmarks are limited to small or simplified examples far below real-world scale and complexity. We introduce FrontierOR, among the first benchmarks to systematically evaluate LLM-based efficient algorithm design for realistic large-scale optimization problems. FrontierOR includes 180 tasks derived from methodologically diverse papers published in top-tier operations research venues, each with standardized instances and a hidden, expert-verified evaluation suite. We evaluate seven LLMs spanning frontier, cost-effective, and open-source models in both one-shot and test-time evolution settings. The results reveal that frontier models still struggle to move from executable formulations to efficient optimization algorithms: the strongest one-shot model outperforms Gurobi on both solution quality and computational efficiency in only 31% of cases, and even strong coding agents with test-time evolution achieve only 50% on selected hard tasks. FrontierOR establishes a practical evaluation platform for LLM-based optimization algorithm design, enabling future LLMs and agents to be systematically tested on whether they can move beyond correct formulations toward feasible, high-quality, and efficient algorithms. Code and data are publicly released at https://github.com/Minw913/FrontierOR.