Staff View: Revisiting N-Gram Models: Their Impact in Modern Neural Networks for Handwritten Text Recognition

Bibliographic Details
Main Authors:	Tarride, Solène; Kermorvant, Christopher
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Computation and Language
Online Access:	https://arxiv.org/abs/2404.19317

_version_	1866917654274506752
author	Tarride, Solène; Kermorvant, Christopher
author_facet	Tarride, Solène; Kermorvant, Christopher
contents	In recent advances in automatic text recognition (ATR), deep neural networks have demonstrated the ability to implicitly capture language statistics, potentially reducing the need for traditional language models. This study directly addresses whether explicit language models, specifically n-gram models, still contribute to the performance of state-of-the-art deep learning architectures in the field of handwriting recognition. We evaluate two prominent neural network architectures, PyLaia and DAN, with and without the integration of explicit n-gram language models. Our experiments on three datasets - IAM, RIMES, and NorHand v2 - at both line and page level, investigate optimal parameters for n-gram models, including their order, weight, smoothing methods and tokenization level. The results show that incorporating character or subword n-gram models significantly improves the performance of ATR models on all datasets, challenging the notion that deep learning models alone are sufficient for optimal performance. In particular, the combination of DAN with a character language model outperforms current benchmarks, confirming the value of hybrid approaches in modern document analysis systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_19317
institution	arXiv
publishDate	2024
record_format	arxiv
title	Revisiting N-Gram Models: Their Impact in Modern Neural Networks for Handwritten Text Recognition
topic	Computer Vision and Pattern Recognition; Computation and Language
url	https://arxiv.org/abs/2404.19317
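
The abstract describes combining a neural recognizer with an explicit n-gram language model and tuning its order, weight, smoothing and tokenization level. The sketch below illustrates that general idea only. It is not the authors' pipeline or the PyLaia/DAN code: the class CharNGramLM, the function rescore, the parameter lm_weight, the toy corpus and the optical scores are all hypothetical, and add-k smoothing stands in for whatever smoothing methods the paper evaluates. It trains a character-level n-gram model and rescores candidate transcriptions by adding the weighted language-model log-probability to the recognizer's log-probability.

```python
import math
from collections import Counter


class CharNGramLM:
    """Character n-gram model with add-k smoothing (an illustrative stand-in)."""

    def __init__(self, order=3, k=0.1):
        self.order = order          # n-gram order (3 = trigram)
        self.k = k                  # add-k smoothing constant
        self.ngrams = Counter()     # counts of (context + next char)
        self.contexts = Counter()   # counts of context alone
        self.vocab = set()

    def _pad(self, text):
        # Pad with start symbols so the first characters have a full context.
        return ["<s>"] * (self.order - 1) + list(text) + ["</s>"]

    def fit(self, lines):
        for line in lines:
            chars = self._pad(line)
            self.vocab.update(chars)
            for i in range(self.order - 1, len(chars)):
                context = tuple(chars[i - self.order + 1:i])
                self.ngrams[context + (chars[i],)] += 1
                self.contexts[context] += 1
        return self

    def log_prob(self, text):
        chars = self._pad(text)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen characters
        total = 0.0
        for i in range(self.order - 1, len(chars)):
            context = tuple(chars[i - self.order + 1:i])
            num = self.ngrams[context + (chars[i],)] + self.k
            den = self.contexts[context] + self.k * v
            total += math.log(num / den)
        return total


def rescore(hypotheses, lm, lm_weight=0.5):
    """Return the (text, optical_log_prob) pair with the best combined score."""
    return max(hypotheses, key=lambda h: h[1] + lm_weight * lm.log_prob(h[0]))


# Toy usage: the optical scores below are invented for illustration.
corpus = ["the cat sat on the mat", "the hat is on the cat", "that is the cat"]
lm = CharNGramLM(order=3, k=0.1).fit(corpus)
hypotheses = [("the cat", -4.1), ("tbe cat", -4.0)]
best = rescore(hypotheses, lm, lm_weight=0.5)
print(best[0])
```

Setting lm_weight to 0 reduces the selection to the recognizer's own ranking, which gives a simple way to compare results with and without the language model.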
