Staff View: A Labeled Spanish Twitter Dataset for Binary Cyberbullying Detection :: Library Catalog


Bibliographic Details
Main Authors:	Cumba-Armijos, Cumba-Armijos; Riofrío-Luzcando, Diego; RODRIGUEZ ARBOLEDA, VERONICA ELIZABETH; Carrión Jumbo, Joe
Format:	Digital resource
Language:
Published:	Zenodo 2022
Online Access:	https://doi.org/10.5281/zenodo.18466670

_version_	1866902205429186560
author	Cumba-Armijos, Cumba-Armijos; Riofrío-Luzcando, Diego; RODRIGUEZ ARBOLEDA, VERONICA ELIZABETH; Carrión Jumbo, Joe
author_facet	Cumba-Armijos, Cumba-Armijos; Riofrío-Luzcando, Diego; RODRIGUEZ ARBOLEDA, VERONICA ELIZABETH; Carrión Jumbo, Joe
contents	<p>This dataset contains a Spanish-language Twitter corpus labeled for <strong>binary cyberbullying detection</strong>. It was collected through the Twitter API with a Spanish-language filter and a geographic focus on Ecuador, and was then <strong>manually annotated</strong> to support supervised learning experiments in hate-speech and bullying detection and related NLP tasks.</p> <p>The dataset is distributed as a single semicolon-separated CSV file (<code>CorpusBullying.csv</code>) with three fields: a unique tweet identifier (<code>ID</code>), the cleaned tweet text (<code>SpanishTweet</code>), and a binary label (<code>Label</code>), where <strong>1 indicates bullying/cyberbullying content</strong> (e.g., insults, severe verbal aggression, discriminatory attacks) and <strong>0 indicates non-bullying content</strong>. The tweet text in the file has already been preprocessed (lowercased, with links, user mentions, special characters, and Spanish stop words removed) so that it can be used directly in machine learning pipelines.</p> <p>The corpus contains <strong>83,400</strong> labeled tweets: <strong>16,247</strong> bullying instances (about 19.5%) and <strong>67,153</strong> non-bullying instances (about 80.5%). It can be used to benchmark text classification models (e.g., CNN, RNN, or Transformer architectures), to study strategies for handling class imbalance, and to compare feature-based and deep learning approaches to cyberbullying detection in Spanish.</p>
format	Digital resource
id	zenodo_https___doi_org_10_5281_zenodo_18466670
institution	Zenodo
language
publishDate	2022
publisher	Zenodo
record_format	zenodo
title	A Labeled Spanish Twitter Dataset for Binary Cyberbullying Detection
url	https://doi.org/10.5281/zenodo.18466670
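The record describes the file layout (semicolon-separated, with the fields ID, SpanishTweet and Label) and a class split of 16,247 bullying to 67,153 non-bullying tweets. The Python sketch below shows one way to load a file in that layout and derive inverse-frequency class weights for the imbalance. It is not the authors' code: it assumes a UTF-8 file with a header row naming the three fields, and the helpers load_corpus, balanced_class_weights and clean_tweet, as well as the small STOPWORDS set, are hypothetical. clean_tweet only approximates the described cleaning for new, unseen text; the distributed tweets are already preprocessed and do not need it.

```python
# Minimal sketch (not the dataset authors' code). Assumes CorpusBullying.csv is
# UTF-8, has a header row "ID;SpanishTweet;Label", and Label is 0 or 1.
import csv
import io
import re
from collections import Counter


def load_corpus(f):
    """Read (ID, SpanishTweet, Label) tuples from a semicolon-separated file object."""
    reader = csv.DictReader(f, delimiter=";")
    return [(row["ID"], row["SpanishTweet"], int(row["Label"])) for row in reader]


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of each class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Hypothetical stop-word list for illustration only; the record does not say
# which Spanish stop-word list the authors removed.
STOPWORDS = {"el", "la", "los", "las", "de", "que", "y", "en", "un", "una", "es"}


def clean_tweet(text):
    """Approximate the described cleaning, for new text only (the corpus is already clean)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"[^\w\s]", " ", text)       # remove special characters (Unicode \w keeps á, ñ)
    return " ".join(t for t in text.split() if t not in STOPWORDS)


if __name__ == "__main__":
    # Tiny in-memory sample in the assumed layout; these are not real corpus rows.
    sample = "ID;SpanishTweet;Label\n1;texto de ejemplo agresivo;1\n2;buen dia amigos;0\n3;lindo partido;0\n"
    rows = load_corpus(io.StringIO(sample))
    labels = [label for _, _, label in rows]
    print(Counter(labels))
    print(balanced_class_weights(labels))
    print(clean_tweet("@user Mira esto: https://t.co/x ¡Qué día en la playa!"))
```

For the real file, open it with open("CorpusBullying.csv", encoding="utf-8", newline="") (the csv module documentation recommends newline="") and pass the handle to load_corpus. With the counts stated in the record, the weights come out to roughly 2.57 for label 1 and 0.62 for label 0, which matches the "balanced" class-weight heuristic used by scikit-learn.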
