Staff View: Generic Embedding-Based Lexicons for Transparent and Reproducible Text Scoring :: Library Catalog

Bibliographic Details
Main Author:	Moez, Catherine
Format:	Preprint
Published:	2024
Subjects:	Computation and Language 68T50 I.7
Online Access:	https://arxiv.org/abs/2411.00964

_version_	1866915002600914944
author	Moez, Catherine
author_facet	Moez, Catherine
contents	With text analysis tools becoming increasingly sophisticated over the last decade, researchers now face a decision of whether to use state-of-the-art models that provide high performance but that can be highly opaque in their operations and computationally intensive to run. The alternative, frequently, is to rely on older, manually crafted textual scoring tools that are transparently and easily applied, but can suffer from limited performance. I present an alternative that combines the strengths of both: lexicons created with minimal researcher inputs from generic (pretrained) word embeddings. Presenting a number of conceptual lexicons produced from FastText and GloVe (6B) vector representations of words, I argue that embedding-based lexicons respond to a need for transparent yet high-performance text measuring tools.
format	Preprint
id	arxiv_https___arxiv_org_abs_2411_00964
institution	arXiv
publishDate	2024
record_format	arxiv
title	Generic Embedding-Based Lexicons for Transparent and Reproducible Text Scoring
topic	Computation and Language 68T50 I.7
url	https://arxiv.org/abs/2411.00964

Similar Items