Staff View: Study on Text Classification for Public Administration :: Library Catalog

Bibliographic Details
Main Authors:	Schwaar, Stefanie; Diez, Franziska; Trebing, Michael; Witznick, Nils
Format:	Preprint
Published:	2025
Subjects:	Applications
Online Access:	https://arxiv.org/abs/2504.09111

_version_	1866913789808476160
author	Schwaar, Stefanie; Diez, Franziska; Trebing, Michael; Witznick, Nils
author_facet	Schwaar, Stefanie; Diez, Franziska; Trebing, Michael; Witznick, Nils
contents	In German public administration, incoming messages must be distributed among 45 different offices. Since these messages are often unstructured, the routing system has to be based at least partly on message content. So far, no data have been available for the public service and no pretrained model exists. The data we used were provided by Governikus KG and vary widely in length. Different approaches are known for handling such data with standard methods, such as normalization or segmentation. However, text classification depends heavily on the data structure, and a study for public administration data has been missing so far. We conducted such a study, analyzing different classification techniques based on segmentation, normalization and feature selection. We used several methods, namely neural networks, random forest, logistic regression, SVM classifier and SVAE. The comparison shows that, for the given public service data, a classification accuracy above 80% can be reached based on cross-validation. We further show that normalization is preferable, while the difference to the segmentation approach depends mainly on the choice of algorithm.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_09111
institution	arXiv
publishDate	2025
record_format	arxiv
title	Study on Text Classification for Public Administration
topic	Applications
url	https://arxiv.org/abs/2504.09111

Similar Items