Staff View: :: Library Catalog

Bibliographic Details
Main Author:	Liu, Tairan
Format:	Preprint
Published:	2025
Subjects:	Computation and Language Applications
Online Access:	https://arxiv.org/abs/2506.02425

_version_	1866916774965936128
author	Liu, Tairan
author_facet	Liu, Tairan
contents	Textbooks play a critical role in shaping children's understanding of the world. While previous studies have identified gender inequality in individual countries' textbooks, few have examined the issue cross-culturally. This study applies natural language processing methods to quantify gender inequality in English textbooks from 22 countries across 7 cultural spheres. Metrics include character count, firstness (which gender is mentioned first), and TF-IDF word associations by gender. The analysis also identifies gender patterns in proper names appearing in TF-IDF word lists, tests whether large language models can distinguish between gendered word lists, and uses GloVe embeddings to examine how closely keywords associate with each gender. Results show consistent overrepresentation of male characters in terms of count, firstness, and named entities. All regions exhibit gender inequality, with the Latin cultural sphere showing the least disparity.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_02425
institution	arXiv
publishDate	2025
record_format	arxiv
title	Gender Inequality in English Textbooks Around the World: an NLP Approach
topic	Computation and Language Applications
url	https://arxiv.org/abs/2506.02425