Bibliographic Details
Main Authors: Schwaar, Stefanie, Diez, Franziska, Trebing, Michael, Witznick, Nils
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2504.09111
contents In German public administration, there are 45 different offices to which incoming messages need to be distributed. Since these messages are often unstructured, the routing system must rely at least partly on message content. For public service, no datasets have been available so far and no pretrained model exists. The data we used were provided by Governikus KG and vary greatly in length. Several approaches are known for handling such data with standard methods, such as normalization or segmentation. However, text classification depends heavily on the data structure, and a study of public administration data is currently missing. We conducted such a study, analyzing different classification techniques based on segmentation, normalization, and feature selection. We compared several methods: neural networks, random forest, logistic regression, an SVM classifier, and an SVAE. The comparison shows that, for the given public service data, a classification accuracy above 80% can be reached under cross-validation. We further show that normalization is preferable, while the difference from the segmentation approach depends mainly on the choice of algorithm.
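The abstract contrasts length normalization with segmentation for messages of very different length, judged by cross-validated accuracy. The following Python sketch illustrates that kind of comparison under stated assumptions: "normalization" is read as dividing term counts by message length, "segmentation" as fixed-size token chunks whose predictions are combined by majority vote, and a hypothetical nearest-centroid classifier stands in for the neural nets, random forest, logistic regression, SVM and SVAE actually studied. All names and parameters (`normalized_bow`, `segment`, `seg_size`, `NearestCentroid`, `cross_val_accuracy`) are illustrative and not taken from the paper.

```python
from collections import Counter, defaultdict
import math


def tokenize(text: str) -> list[str]:
    # Hypothetical, deliberately simple tokenizer: lowercase + whitespace split.
    return text.lower().split()


def normalized_bow(text: str) -> dict[str, float]:
    # Assumed reading of "normalization": term counts divided by message length,
    # so that short and long messages produce comparable feature vectors.
    tokens = tokenize(text)
    total = len(tokens) or 1
    return {tok: c / total for tok, c in Counter(tokens).items()}


def segment(text: str, size: int) -> list[str]:
    # Assumed reading of "segmentation": consecutive chunks of at most `size` tokens.
    tokens = tokenize(text)
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)] or [""]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class NearestCentroid:
    # Hypothetical stand-in classifier; the study itself compares neural nets,
    # random forest, logistic regression, SVM and SVAE.
    def fit(self, feats: list[dict[str, float]], labels: list[str]) -> "NearestCentroid":
        sums: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
        counts = Counter(labels)
        for f, y in zip(feats, labels):
            for tok, v in f.items():
                sums[y][tok] += v
        # Centroid = mean feature vector of each office's training examples.
        self.centroids = {y: {t: v / counts[y] for t, v in s.items()} for y, s in sums.items()}
        return self

    def predict(self, f: dict[str, float]) -> str:
        # Office whose centroid is most cosine-similar to the message features.
        return max(self.centroids, key=lambda y: cosine(f, self.centroids[y]))


def cross_val_accuracy(texts: list[str], labels: list[str], k: int = 5,
                       mode: str = "normalize", seg_size: int = 50) -> float:
    # k-fold cross-validation with deterministic interleaved folds (assumes k <= len(texts)).
    n = len(texts)
    correct = 0
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        held_out = set(test_idx)
        train = [i for i in range(n) if i not in held_out]
        if mode == "normalize":
            # One length-normalized feature vector per whole message.
            clf = NearestCentroid().fit([normalized_bow(texts[i]) for i in train],
                                        [labels[i] for i in train])
            preds = [clf.predict(normalized_bow(texts[i])) for i in test_idx]
        else:
            # Every segment inherits its message's label; a message is classified
            # by majority vote over its segments' predictions.
            pairs = [(normalized_bow(s), labels[i]) for i in train
                     for s in segment(texts[i], seg_size)]
            clf = NearestCentroid().fit([p[0] for p in pairs], [p[1] for p in pairs])
            preds = [Counter(clf.predict(normalized_bow(s))
                             for s in segment(texts[i], seg_size)).most_common(1)[0][0]
                     for i in test_idx]
        correct += sum(p == labels[i] for p, i in zip(preds, test_idx))
    return correct / n
```

Calling `cross_val_accuracy(texts, labels, k=5, mode="segment", seg_size=50)` on labelled messages returns the fraction of messages assigned to the correct office across all folds; switching `mode` between `"normalize"` and `"segment"` mirrors the comparison described in the abstract, although real outcomes would depend on the classifier and features chosen.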
id arxiv_https___arxiv_org_abs_2504_09111
institution arXiv
title Study on Text Classification for Public Administration
topic Applications