Bibliographic Details
Main Authors: Zhu, Boyuan; Liu, Fagui; Chen, Xi; Tang, Quan
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2401.11704
author Zhu, Boyuan
Liu, Fagui
Chen, Xi
Tang, Quan
contents Scene text detection has recently received significant attention because of its wide range of applications. However, accurately detecting text in complex scenes with multiple scales, orientations, and curvatures remains a challenge. Numerous detection methods adopt the Vatti clipping (VC) algorithm for multiple-instance training to handle arbitrary-shaped text. Yet we identify a bias that results from these approaches, which we call the "shrinked kernel": a decrease in accuracy caused by an output that overly favors the text kernel. In this paper, we propose a new approach named Expand Kernel Network (EK-Net), which uses an expand kernel distance to compensate for this deficiency and includes a three-stage regression to complete instance detection. Moreover, EK-Net not only achieves precise positioning of arbitrary-shaped text but also strikes a trade-off between performance and speed. Evaluation results demonstrate that EK-Net achieves state-of-the-art or competitive performance compared with other advanced methods, e.g., an F-measure of 85.72% at 35.42 FPS on ICDAR 2015 and an F-measure of 85.75% at 40.13 FPS on CTW1500.
format Preprint
id arxiv_https___arxiv_org_abs_2401_11704
institution arXiv
publishDate 2024
record_format arxiv
title EK-Net: Real-time Scene Text Detection with Expand Kernel Distance
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2401.11704