

Bibliographic Details
Main Author:	HIDAKA, Shinsuke
Format:	Digital resource
Language:	Kyrgyz
Published:	Zenodo 2026
Subjects:	Turkic languages discourse corpus ELAN Central Asia corpus linguistics
Online Access:	https://doi.org/10.5281/zenodo.19327769

Table of Contents:

The Central Asian Turkic Discourse Corpus (CATDiC) is a pilot corpus of discourse data from five Turkic languages:

- Uzbek
- Kazakh
- Turkmen
- Kyrgyz
- Uyghur

Each dataset consists of approximately 2 minutes and 15 seconds of annotated dialogue. The corpus includes:

- Video recordings (.mp4)
- Audio recordings (.wav)
- ELAN annotation files (.eaf)
- Structured metadata (metadata.csv, speakers.csv)

However, Turkmen data is excluded due to consent restrictions.

## Annotation

The corpus is annotated using ELAN with multiple tiers. Each tier name consists of an annotation label and a speaker identifier, separated by the "@" symbol. The part before "@" indicates the type of annotation, and the part after "@" indicates the speaker ID. Speaker identifiers vary across recordings and are anonymized. Detailed speaker information is provided in the metadata.

The annotation layers include:

- transcription
- Japanese translation
- segmentation
- morpheme-level annotation
- glossing
- discourse annotation (DA_dim, DA_func)
- annotation indices

## Notes

This is a pilot dataset. Some annotation layers may be incomplete or exploratory.

Sensitive information has been anonymized. Proper names in the recordings have been masked in both audio and transcription where necessary.

## Usage

CATDiC is intended for:

- Discourse analysis
- Turkic linguistics
- Comparative studies
- Annotation research

## License

This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

## Data Format

Data files are provided as compressed archives (.zip). Please extract them before use.

## Note

Metadata files (metadata.csv, speakers.csv) have been added in this version.
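
## Example: reading tier names

The tier naming convention described under Annotation (annotation label, then "@", then speaker ID) can be split programmatically. The following is a minimal sketch, not part of the dataset: it assumes each tier name contains at most one "@", and the tier names and speaker IDs in `SAMPLE_EAF` (`transcription@S1`, `S2`, and so on) are hypothetical placeholders, since the record does not list the actual names. It relies only on the fact that ELAN .eaf files are XML with `TIER` elements carrying a `TIER_ID` attribute.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily trimmed EAF content; real files contain many more elements.
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="transcription@S1"/>
  <TIER TIER_ID="DA_dim@S1"/>
  <TIER TIER_ID="transcription@S2"/>
</ANNOTATION_DOCUMENT>
"""


def split_tier_name(tier_id):
    """Split 'label@speaker' into (label, speaker).

    Tier names without '@' are returned with speaker set to None.
    """
    label, sep, speaker = tier_id.partition("@")
    if not sep:
        return tier_id, None
    return label, speaker


def tiers_by_speaker(eaf_xml):
    """Group annotation labels by speaker ID for one EAF document (given as a string)."""
    root = ET.fromstring(eaf_xml)
    grouped = defaultdict(list)
    for tier in root.iter("TIER"):
        label, speaker = split_tier_name(tier.get("TIER_ID", ""))
        grouped[speaker].append(label)
    return dict(grouped)


if __name__ == "__main__":
    for speaker, labels in tiers_by_speaker(SAMPLE_EAF).items():
        print(speaker, labels)
```

For a file extracted from one of the archives, `ET.parse(path).getroot()` could replace `ET.fromstring(...)`. The speaker IDs obtained this way would then be looked up in speakers.csv, whose column layout is not described in this record.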
