Staff View: Open-Vocabulary Scene Text Recognition via Pseudo-Image Labeling and Margin Loss :: Library Catalog

Bibliographic Details
Main Authors:	Ren, Xuhua; Shi, Hengcan; Li, Jin
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.07518

_version_	1866917611668766720
author	Ren, Xuhua; Shi, Hengcan; Li, Jin
author_facet	Ren, Xuhua; Shi, Hengcan; Li, Jin
contents	Scene text recognition is an important and challenging task in computer vision. However, most prior work focuses on recognizing pre-defined words, whereas real-world applications contain many out-of-vocabulary (OOV) words. In this paper, we propose a novel open-vocabulary text recognition framework, Pseudo-OCR, to recognize OOV words. The key challenge in this task is the lack of OOV training data. To solve this problem, we first propose a pseudo label generation module that leverages character detection and image inpainting to produce substantial pseudo OOV training data from real-world images. Unlike previous synthetic data, our pseudo OOV data contains real characters and backgrounds to simulate real-world applications. Second, to reduce noise in the pseudo data, we present a semantic checking mechanism that filters the data so that only semantically meaningful samples are kept. Third, we introduce a quality-aware margin loss to improve training with pseudo data. Our loss includes a margin-based part to enhance classification ability and a quality-aware part to penalize low-quality samples in both real and pseudo data. Extensive experiments demonstrate that our approach outperforms the state of the art on eight datasets and ranks first in the ICDAR2022 challenge.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_07518
institution	arXiv
publishDate	2024
record_format	arxiv
title	Open-Vocabulary Scene Text Recognition via Pseudo-Image Labeling and Margin Loss
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2403.07518

Similar Items