Bibliographic Details
Main Authors: Yuan, Dehao, Farnan, Tyler, Tesliuc, Stefan, Bergman, Doron L, Wu, Yulun, Liu, Xiaoyu, Liu, Minghui, Montgomery, James, Nguyen, Nam H, Bruss, C. Bayan, Huang, Furong
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2601.03149
author Yuan, Dehao
Farnan, Tyler
Tesliuc, Stefan
Bergman, Doron L
Wu, Yulun
Liu, Xiaoyu
Liu, Minghui
Montgomery, James
Nguyen, Nam H
Bruss, C. Bayan
Huang, Furong
contents Strict privacy regulations limit access to real transaction data, slowing open research in financial AI. Synthetic data can bridge this gap, but existing generators do not jointly achieve behavioral diversity and logical groundedness. Rule-driven simulators rely on hand-crafted workflows and shallow stochasticity, which miss the richness of human behavior. Learning-based generators such as GANs capture correlations, yet they often violate hard financial constraints and still require training on private data. We introduce PersonaLedger, a generation engine that uses a large language model conditioned on rich user personas to produce diverse transaction streams, coupled with an expert-configurable programmatic engine that maintains correctness. The LLM and engine interact in a closed loop: after each event, the engine updates the user state, enforces financial rules, and returns a context-aware "nextprompt" that guides the LLM toward feasible next actions. With this engine, we create a public dataset of 30 million transactions from 23,000 users and a benchmark suite with two tasks: illiquidity classification and identity theft segmentation. PersonaLedger offers the community a rich, realistic, and privacy-preserving resource, complete with code, rules, and generation logs, that supports rigorous, reproducible evaluation of forecasting and anomaly detection models and aims to accelerate innovation in financial AI.
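The closed loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `UserState`, `stub_llm`, `apply_event`, `next_prompt`, and `simulate` are hypothetical names, the "LLM" is a deterministic stand-in, and the single no-overdraft rule is an assumed example of a hard financial constraint.

```python
from dataclasses import dataclass, field


@dataclass
class UserState:
    """Hypothetical per-user ledger state tracked by the rule engine."""
    balance: float
    history: list = field(default_factory=list)


def stub_llm(persona: str, prompt: str) -> dict:
    """Stand-in for the persona-conditioned LLM (an assumption, not the paper's model).

    It proposes the same purchase regardless of the prompt, so the engine's
    rejection path can be exercised deterministically.
    """
    return {"type": "purchase", "amount": 40.0, "merchant": "grocery"}


def apply_event(state: UserState, event: dict) -> bool:
    """Enforce one illustrative hard rule: a purchase may not overdraw the balance."""
    if event["type"] == "purchase" and event["amount"] > state.balance:
        return False  # rejected: state is left unchanged
    if event["type"] == "purchase":
        state.balance -= event["amount"]
    state.history.append(event)
    return True


def next_prompt(state: UserState, accepted: bool) -> str:
    """Build the context-aware follow-up prompt from the updated state."""
    note = "" if accepted else " The last action was infeasible and was rejected."
    return f"Balance is {state.balance:.2f}.{note} Propose the next feasible action."


def simulate(persona: str, state: UserState, steps: int) -> UserState:
    """Alternate between the generator and the rule engine for a fixed number of steps."""
    prompt = next_prompt(state, accepted=True)
    for _ in range(steps):
        event = stub_llm(persona, prompt)      # generator proposes an event
        accepted = apply_event(state, event)   # engine validates and updates state
        prompt = next_prompt(state, accepted)  # engine returns the next prompt
    return state
```

Starting from a balance of 100.0 and running four steps, the sketch accepts two purchases of 40.0 and rejects the remaining two, leaving a balance of 20.0. In a real system the stub would be replaced by a model call, and the rule set would be supplied by domain experts.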
format Preprint
id arxiv_https___arxiv_org_abs_2601_03149
institution arXiv
publishDate 2026
record_format arxiv
title PersonaLedger: Generating Realistic Financial Transactions with Persona Conditioned LLMs and Rule Grounded Feedback
topic Machine Learning
url https://arxiv.org/abs/2601.03149