Bibliographic Details
Main Authors:	Carslaw, Iona; Milton, Sivan; Navarre, Nicolas; Qing, Ciyang; Uegaki, Wataru
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2506.14064

_version_	1866913898029907968
author	Carslaw, Iona; Milton, Sivan; Navarre, Nicolas; Qing, Ciyang; Uegaki, Wataru
author_facet	Carslaw, Iona; Milton, Sivan; Navarre, Nicolas; Qing, Ciyang; Uegaki, Wataru
contents	For linguists, embedded clauses have been of special interest because of their intricate distribution of syntactic and semantic features. Yet, current research relies on schematically created language examples to investigate these constructions, missing out on statistical information and naturally-occurring examples that can be gained from large language corpora. Thus, we present a methodological approach for detecting and annotating naturally-occurring examples of English embedded clauses in large-scale text data using constituency parsing and a set of parsing heuristics. Our tool has been evaluated on our dataset Golden Embedded Clause Set (GECS), which includes hand-annotated examples of naturally-occurring English embedded clause sentences. Finally, we present a large-scale dataset of naturally-occurring English embedded clauses which we have extracted from the open-source corpus Dolma using our extraction tool.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_14064
institution	arXiv
publishDate	2025
record_format	arxiv
title	Automatic Extraction of Clausal Embedding Based on Large-Scale English Text Data
topic	Computation and Language
url	https://arxiv.org/abs/2506.14064
