Staff View: Revisiting the Othello World Model Hypothesis :: Library Catalog

Bibliographic Details
Main Authors:	Yuan, Yifei; Søgaard, Anders
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2503.04421

_version_	1866929745087692800
author	Yuan, Yifei; Søgaard, Anders
author_facet	Yuan, Yifei; Søgaard, Anders
contents	Li et al. (2023) used the Othello board game as a test case for the ability of GPT-2 to induce world models, and were followed up by Nanda et al. (2023b). We briefly discuss the original experiments, expanding them to include more language models with more comprehensive probing. Specifically, we analyze sequences of Othello board states and train the model to predict the next move based on previous moves. We evaluate seven language models (GPT-2, T5, Bart, Flan-T5, Mistral, LLaMA-2, and Qwen2.5) on the Othello task and conclude that these models not only learn to play Othello, but also induce the Othello board layout. We find that all models achieve up to 99% accuracy in unsupervised grounding and exhibit high similarity in the board features they learned. This provides considerably stronger evidence for the Othello World Model Hypothesis than previous works.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_04421
institution	arXiv
publishDate	2025
record_format	arxiv
title	Revisiting the Othello World Model Hypothesis
topic	Computation and Language
url	https://arxiv.org/abs/2503.04421

Similar Items