| Main Authors: | Uthayasooriyar, Benno; Ly, Antoine; Vermet, Franck; Corro, Caio |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computation and Language |
| Online Access: | https://arxiv.org/abs/2412.09341 |
| _version_ | 1866909425477877760 |
|---|---|
| author | Uthayasooriyar, Benno; Ly, Antoine; Vermet, Franck; Corro, Caio |
| contents | Generic pre-trained neural networks may struggle to produce good results in specialized domains like finance and insurance. This is due to a domain mismatch between training data and downstream tasks, as in-domain data are often scarce due to privacy constraints. In this work, we compare different pre-training strategies for LayoutLM. We show that using domain-relevant documents improves results on a named-entity recognition (NER) problem using a novel dataset of anonymized insurance-related financial documents called Payslips. Moreover, we show that we can achieve competitive results using a smaller and faster model. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2412_09341 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Training LayoutLM from Scratch for Efficient Named-Entity Recognition in the Insurance Domain |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2412.09341 |