Bibliographic Details
Main Authors: Verhagen, Mark D., Stroebl, Benedikt, Liu, Tiffany, Liu, Lydia T., Salganik, Matthew J.
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2507.03027
contents For over a century, life course researchers have faced a choice between two dominant methodological approaches: qualitative methods that analyze rich data but are constrained to small samples, and quantitative survey-based methods that study larger populations but sacrifice data richness for scale. Two recent technological developments now enable us to imagine a hybrid approach that combines some of the depth of the qualitative approach with the scale of quantitative methods. The first development is the steady rise of "complex log data," behavioral data that is logged for purposes other than research but that can be repurposed to construct rich accounts of people's lives. The second is the emergence of large language models (LLMs) with exceptional pattern recognition capabilities on plain text. In this paper, we take a necessary step toward creating this hybrid approach by developing a flexible procedure to transform complex log data into a textual representation of an individual's life trajectory across multiple domains, over time, and in context. We call this data representation a "book of life." We illustrate the feasibility of our approach by writing over 100 million books of life covering many different facets of life, over time and placed in social context using Dutch population-scale registry data. We open source the book of life toolkit (BOLT), and invite the research community to explore the many potential applications of this approach.
format Preprint
id arxiv_https___arxiv_org_abs_2507_03027
institution arXiv
publishDate 2025
record_format arxiv
title The Book of Life approach: Enabling richness and scale for life course research
topic Computation and Language