Bibliographic Details
Main Authors: Jia, Huaiyu, Zhou, Luofeng, Zhang, Wentao, Cong, Lin William, Li, Siguang, Sun, Shuo
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2604.20421
contents Prediction markets are markets for trading claims on future events, such as presidential elections, and their prices provide continuously updated signals of collective beliefs. On decentralized platforms such as Polymarket, the market lifecycle spans market creation, token registration, trading, oracle interaction, dispute, and final settlement, yet the corresponding data are fragmented across heterogeneous off-chain and on-chain sources. We present the first continuously maintained dataset suite covering the full lifecycle of decentralized prediction markets, built on Polymarket. To address the challenges of large-scale cross-source integration, incomplete linkage, and continuous synchronization, we build a unified relational data system that integrates three canonical layers (market metadata, fill-level trading records, and oracle-resolution events) through identifier resolution, on-chain recovery, and incremental updates. The resulting dataset spans October 2020 to March 2026 and comprises more than 770 thousand market records, over 943 million fill records, and nearly 2 million oracle events. We describe the data model, collection pipeline, and consistency mechanisms that make the dataset reproducible and extensible, and we demonstrate its utility through descriptive analyses of market activity and two downstream case studies: NBA outcome calibration and CPI expectation reconstruction.
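The abstract names identifier resolution and incremental updates as the integration mechanisms but does not spell out how the three layers connect. The following is a minimal sketch of one way such linkage could work, assuming a hypothetical SQLite schema in which a token registry maps outcome tokens to markets, fills are keyed by their on-chain (tx_hash, log_index) pair, and oracle events are keyed by market. All table, column, function, and event names are illustrative assumptions, not the dataset's actual schema or pipeline.

```python
import sqlite3

# Hypothetical three-layer schema; every table, column, and event name here is
# an illustrative assumption, not the published schema of the dataset.
SCHEMA = """
CREATE TABLE markets (                  -- layer 1: market metadata
    market_id TEXT PRIMARY KEY,
    question  TEXT NOT NULL
);
CREATE TABLE tokens (                   -- token registry: outcome token -> market
    token_id  TEXT PRIMARY KEY,
    market_id TEXT NOT NULL REFERENCES markets(market_id),
    outcome   TEXT NOT NULL
);
CREATE TABLE fills (                    -- layer 2: fill-level trading records
    tx_hash      TEXT NOT NULL,
    log_index    INTEGER NOT NULL,
    block_number INTEGER NOT NULL,
    token_id     TEXT NOT NULL,
    price        REAL NOT NULL,
    size         REAL NOT NULL,
    PRIMARY KEY (tx_hash, log_index)    -- on-chain natural key: replays are no-ops
);
CREATE TABLE oracle_events (            -- layer 3: oracle-resolution events
    market_id    TEXT NOT NULL REFERENCES markets(market_id),
    block_number INTEGER NOT NULL,
    event        TEXT NOT NULL          -- hypothetical labels: 'propose', 'dispute', 'settle'
);
"""


def resume_block(conn: sqlite3.Connection) -> int:
    """Incremental-sync watermark: highest block already stored (0 if empty)."""
    (block,) = conn.execute("SELECT COALESCE(MAX(block_number), 0) FROM fills").fetchone()
    return block


def ingest_fills(conn: sqlite3.Connection, batch: list[tuple]) -> int:
    """Insert a batch of fills, skipping rows already stored; return rows added.

    Fetching can restart at resume_block() inclusive: fills from that block that
    were already ingested are dropped by the primary key instead of duplicated.
    """
    before = conn.total_changes
    conn.executemany("INSERT OR IGNORE INTO fills VALUES (?, ?, ?, ?, ?, ?)", batch)
    return conn.total_changes - before


def linked_fills(conn: sqlite3.Connection) -> list[tuple]:
    """Identifier resolution: fill -> token -> market, plus a settled flag."""
    return conn.execute(
        """
        SELECT m.market_id, t.outcome, f.price, f.size,
               EXISTS (SELECT 1 FROM oracle_events o
                       WHERE o.market_id = m.market_id AND o.event = 'settle') AS settled
        FROM fills f
        JOIN tokens t ON t.token_id = f.token_id
        JOIN markets m ON m.market_id = t.market_id
        ORDER BY f.block_number, f.log_index
        """
    ).fetchall()


# Usage with toy data (not drawn from the dataset).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO markets VALUES ('m1', 'Will team A win?')")
conn.executemany("INSERT INTO tokens VALUES (?, ?, ?)",
                 [("t_yes", "m1", "Yes"), ("t_no", "m1", "No")])
batch = [("0xaa", 0, 100, "t_yes", 0.62, 50.0),
         ("0xab", 1, 101, "t_no", 0.40, 20.0)]
print(ingest_fills(conn, batch))  # 2
print(ingest_fills(conn, batch))  # 0: the replayed batch adds nothing
print(resume_block(conn))         # 101
conn.execute("INSERT INTO oracle_events VALUES ('m1', 120, 'settle')")
print(linked_fills(conn))  # [('m1', 'Yes', 0.62, 50.0, 1), ('m1', 'No', 0.4, 20.0, 1)]
```

Keying fills on their on-chain natural key, rather than only accepting blocks strictly above the watermark, means a sync can safely re-read the watermark block: fills already stored are ignored, while late-arriving fills from that same block are still added.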
institution arXiv
title Unlocking the Forecasting Economy: A Suite of Datasets for the Full Lifecycle of Prediction Market: [Experiments & Analysis]
topic Machine Learning