Bibliographic Details
Main Authors: Sammed S. Admuthe, Hemlata P. Channe
Format: Digital resource
Language:
Published: Zenodo 2021
Subjects:
Online Access: https://doi.org/10.5281/zenodo.18515979
_version_ 1866901099923898368
author Sammed S. Admuthe
Hemlata P. Channe
author_facet Sammed S. Admuthe
Hemlata P. Channe
contents A large number of photographs, signatures, and documents are produced, processed, and stored as digital images. PAN cards, Aadhaar cards, passports, and voter IDs are among the most authoritative identity documents. Classifying these documents is an important step in office automation, digital libraries, and other document image analysis applications. Document classification generally focuses on extracting textual data and using it for feature engineering. Many image classification methods exist, but they are generally used for photographic image classification. In the present work, two different techniques are used to classify document images. The first is based on a textual feature extraction method, TF-IDF (term frequency-inverse document frequency). The second is visual classification with a convolutional neural network. The dataset contains 600 document images belonging to 4 categories. Classification with textual features gave an average accuracy of 85.5%, while classification with visual features gave an average accuracy of 93%.
format Digital resource
id zenodo_https___doi_org_10_5281_zenodo_18515979
institution Zenodo
language
publishDate 2021
publisher Zenodo
record_format zenodo
title Document Image Classification using Visual and Textual Features
topic Classification algorithm; Term frequency; Inverse term frequency; Deep learning.
url https://doi.org/10.5281/zenodo.18515979
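
The abstract names TF-IDF as the textual feature extraction method but does not say which variant or classifier was used. The following is a minimal sketch of one common formulation, assuming term frequency normalized by document length and an unsmoothed natural-log inverse document frequency. The function name `tf_idf` and the sample OCR strings are hypothetical and are not taken from the 600-image dataset.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical OCR output for three document images (illustrative only).
ocr_texts = [
    "income tax department permanent account number",
    "republic of india passport",
    "election commission of india identity card",
]
docs = [text.split() for text in ocr_texts]
weights = tf_idf(docs)
```

Under these assumptions, a term that appears in only one document (such as "passport") receives a higher weight than a term shared across documents (such as "india"), which is what makes the resulting vectors usable as input to a text classifier.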