Bibliographic Details
Main Author: Moez, Catherine
Format: Preprint
Published: 2024
Subjects: Computation and Language; 68T50; I.7
Online Access: https://arxiv.org/abs/2411.00964
author Moez, Catherine
author_facet Moez, Catherine
contents Text analysis tools have grown increasingly sophisticated over the last decade, and researchers must now choose between two options. State-of-the-art models offer high performance but can be highly opaque in their operations and computationally intensive to run. The frequent alternative is older, manually crafted textual scoring tools, which are transparent and easy to apply but can suffer from limited performance. I present an alternative that combines the strengths of both: lexicons created with minimal researcher input from generic (pretrained) word embeddings. Presenting a number of conceptual lexicons produced from FastText and GloVe (6B) vector representations of words, I argue that embedding-based lexicons meet the need for transparent yet high-performance text-measurement tools.
format Preprint
id arxiv_https___arxiv_org_abs_2411_00964
institution arXiv
publishDate 2024
record_format arxiv
title Generic Embedding-Based Lexicons for Transparent and Reproducible Text Scoring
topic Computation and Language
68T50
I.7
url https://arxiv.org/abs/2411.00964