Staff View: Evaluating LLMs on Real-World Forecasting Against Expert Forecasters :: Library Catalog

Bibliographic Details
Main Author:	Lu, Janna
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2507.04562

_version_	1866908478323294208
author	Lu, Janna
author_facet	Lu, Janna
contents	Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, but their ability to forecast future events remains understudied. A year ago, large language models struggled to come close to the accuracy of a human crowd. I evaluate state-of-the-art LLMs on 464 forecasting questions from Metaculus, comparing their performance against top forecasters. Frontier models achieve Brier scores that ostensibly surpass the human crowd but still significantly underperform a group of experts.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_04562
institution	arXiv
publishDate	2025
record_format	arxiv
title	Evaluating LLMs on Real-World Forecasting Against Expert Forecasters
topic	Machine Learning; Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2507.04562
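
The abstract in the contents field compares models and human forecasters by Brier score. As a minimal sketch, assuming binary questions and the common mean-squared-error form of the score (the record does not say which variant the paper uses), the computation might look like the following. The function name `brier_score` and the sample probabilities are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect forecast, and always answering 0.5
    on binary questions yields 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast is required")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data: three resolved binary questions (1 = happened, 0 = did not).
outcomes = [1, 0, 1]
model_probs = [0.8, 0.3, 0.6]    # illustrative model forecasts
crowd_probs = [0.7, 0.4, 0.5]    # illustrative crowd forecasts

model_score = brier_score(model_probs, outcomes)
crowd_score = brier_score(crowd_probs, outcomes)
print(f"model: {model_score:.4f}, crowd: {crowd_score:.4f}")
```

Under this definition a forecaster "surpasses" another when its score is lower on the same set of resolved questions.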

Similar Items