Bibliographic Details
Main Author: Lu, Janna
Format: Preprint
Published: 2025
Subjects: Machine Learning, Artificial Intelligence, Computation and Language
Online Access: https://arxiv.org/abs/2507.04562
author Lu, Janna
contents Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, but their ability to forecast future events remains understudied. A year ago, large language models struggled to come close to the accuracy of a human crowd. I evaluate state-of-the-art LLMs on 464 forecasting questions from Metaculus, comparing their performance against top forecasters. Frontier models achieve Brier scores that ostensibly surpass the human crowd but still significantly underperform a group of experts.
format Preprint
id arxiv_https___arxiv_org_abs_2507_04562
institution arXiv
publishDate 2025
record_format arxiv
title Evaluating LLMs on Real-World Forecasting Against Expert Forecasters
topic Machine Learning
Artificial Intelligence
Computation and Language
url https://arxiv.org/abs/2507.04562