Staff View: The Cursive Transformer :: Library Catalog

Bibliographic Details
Main Authors:	Greydanus, Sam; Wimpee, Zachary
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2504.00051

_version_	1866908292387700736
author	Greydanus, Sam; Wimpee, Zachary
author_facet	Greydanus, Sam; Wimpee, Zachary
contents	Transformers trained on tokenized text, audio, and images can generate high-quality autoregressive samples. But handwriting data, represented as sequences of pen coordinates, remains underexplored. We introduce a novel tokenization scheme that converts pen stroke offsets to polar coordinates, discretizes them into bins, and then turns them into sequences of tokens with which to train a standard GPT model. This allows us to capture complex stroke distributions without using any specialized architectures (e.g., the mixture density network or the self-advancing ASCII attention head from Graves 2014). With just 3,500 handwritten words and a few simple data augmentations, we are able to train a model that can generate realistic cursive handwriting. Our approach is simpler and more performant than previous RNN-based methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_00051
institution	arXiv
publishDate	2025
record_format	arxiv
title	The Cursive Transformer
topic	Machine Learning; Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2504.00051
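
The tokenization summarized in the contents field (pen stroke offsets converted to polar coordinates, discretized into bins, then emitted as tokens) can be illustrated with a minimal sketch. This is not the authors' implementation: the bin counts, the maximum radius, the interleaved token layout, and the function names `tokenize_offsets` and `detokenize` are all assumptions chosen for illustration, and pen-up/pen-down handling is omitted.

```python
import math

# Hypothetical settings, not taken from the paper.
N_THETA_BINS = 8   # number of angle bins over [0, 2*pi)
N_R_BINS = 4       # number of radius bins over [0, MAX_R]
MAX_R = 1.0        # offsets longer than this are clipped into the last bin


def tokenize_offsets(offsets):
    """Map (dx, dy) pen offsets to a flat list of integer tokens.

    Each offset yields two tokens: an angle token in [0, N_THETA_BINS)
    followed by a radius token in [N_THETA_BINS, N_THETA_BINS + N_R_BINS).
    """
    tokens = []
    for dx, dy in offsets:
        r = math.hypot(dx, dy)
        theta = math.atan2(dy, dx) % (2 * math.pi)  # wrap angle to [0, 2*pi)
        theta_bin = min(int(theta / (2 * math.pi) * N_THETA_BINS), N_THETA_BINS - 1)
        r_bin = min(int(r / MAX_R * N_R_BINS), N_R_BINS - 1)
        tokens.append(theta_bin)
        tokens.append(N_THETA_BINS + r_bin)  # shift radius ids past angle ids
    return tokens


def detokenize(tokens):
    """Approximate inverse: map each token pair back to its bin-center offset."""
    offsets = []
    for theta_tok, r_tok in zip(tokens[0::2], tokens[1::2]):
        theta = (theta_tok + 0.5) * 2 * math.pi / N_THETA_BINS
        r = (r_tok - N_THETA_BINS + 0.5) * MAX_R / N_R_BINS
        offsets.append((r * math.cos(theta), r * math.sin(theta)))
    return offsets
```

The resulting integer sequence is the kind of input a standard GPT-style model could be trained on. Decoding recovers only the bin centers, so the bin counts trade stroke precision against vocabulary size.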

Similar Items