Affichage MARC: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Chang, Da, Li, Yu
Format:	Preprint
Publié:	2024
Sujets:	Computer Vision and Pattern Recognition
Accès en ligne:	https://arxiv.org/abs/2404.12734
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

_version_	1866915278726627328
author	Chang, Da Li, Yu
author_facet	Chang, Da Li, Yu
contents	With the rapid development of OCR technology, mixed-scene text recognition has become a key technical challenge. Although deep learning models have achieved significant results in specific scenarios, their generality and stability still need improvement, and the high demand for computing resources affects flexibility. To address these issues, this paper proposes DLoRA-TrOCR, a parameter-efficient hybrid text spotting method based on a pre-trained OCR Transformer. By embedding a weight-decomposed DoRA module in the image encoder and a LoRA module in the text decoder, this method can be efficiently fine-tuned on various downstream tasks. Our method requires no more than 0.7\% trainable parameters, not only accelerating the training efficiency but also significantly improving the recognition accuracy and cross-dataset generalization performance of the OCR system in mixed text scenes. Experiments show that our proposed DLoRA-TrOCR outperforms other parameter-efficient fine-tuning methods in recognizing complex scenes with mixed handwritten, printed, and street text, achieving a CER of 4.02 on the IAM dataset, a F1 score of 94.29 on the SROIE dataset, and a WAR of 86.70 on the STR Benchmark, reaching state-of-the-art performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_12734
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Mixed Text Recognition with Efficient Parameter Fine-Tuning and Transformer Chang, Da Li, Yu Computer Vision and Pattern Recognition With the rapid development of OCR technology, mixed-scene text recognition has become a key technical challenge. Although deep learning models have achieved significant results in specific scenarios, their generality and stability still need improvement, and the high demand for computing resources affects flexibility. To address these issues, this paper proposes DLoRA-TrOCR, a parameter-efficient hybrid text spotting method based on a pre-trained OCR Transformer. By embedding a weight-decomposed DoRA module in the image encoder and a LoRA module in the text decoder, this method can be efficiently fine-tuned on various downstream tasks. Our method requires no more than 0.7\% trainable parameters, not only accelerating the training efficiency but also significantly improving the recognition accuracy and cross-dataset generalization performance of the OCR system in mixed text scenes. Experiments show that our proposed DLoRA-TrOCR outperforms other parameter-efficient fine-tuning methods in recognizing complex scenes with mixed handwritten, printed, and street text, achieving a CER of 4.02 on the IAM dataset, a F1 score of 94.29 on the SROIE dataset, and a WAR of 86.70 on the STR Benchmark, reaching state-of-the-art performance.
title	Mixed Text Recognition with Efficient Parameter Fine-Tuning and Transformer
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2404.12734

Documents similaires