Bibliographic Details
Main Authors: Saldana, Daniella Alexandra Crysti Vargas; Cueva, Freddy Herrera
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2602.19659
Table of Contents:
  • We study whether latent motivation signals in short Spanish admission responses predict engagement and performance in an early quantum computing pathway run by QuantumHub Peru. We analyze N=241 applicants' open responses and link them to outcomes from two selective modules: Module 1 (secondary; mathematics and computing foundations; n=23) and Module 2 (secondary + early undergraduate; quantum fundamentals; n=36, including M1 continuers). To ensure baseline comparability, the M2 university entrance exam matched the difficulty of the M1 final. Final grades followed the program's official cohort-specific weightings (attendance/assignments/exam), which we retain to preserve ecological validity. Methodologically, we model text with Latent Dirichlet Allocation (LDA, k=8) and, for robustness, with sentence embeddings from a small multilingual language model, EmbeddingGemma-300M, projected via UMAP and clustered with HDBSCAN. This combination leverages the transparency of bag-of-words topics and the semantic richness of small language model embeddings. Descriptively, curiosity/learning topics show higher grades and attendance than technology/career-oriented topics; inferential tests are underpowered (e.g., linear R² ≈ 0.03; logistic pseudo-R² ≈ 0.04), so effect-size estimates should be viewed as preliminary rather than confirmatory. Embedding-based clustering yields seven clusters with 11.2% noise and modest agreement with LDA (ARI=0.068; NMI=0.163). Results suggest that brief motivation responses encode promising signals that could support early mentoring in rigorous STEM pipelines, while highlighting the need for larger, pre-registered studies.
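
The agreement figures quoted in the abstract (ARI and NMI) compare two labelings of the same responses: the dominant LDA topic and the embedding-based cluster. The sketch below shows one way such scores can be computed from scratch. It is illustrative only and not the authors' code: the function names and toy labels are hypothetical, the NMI uses arithmetic-mean normalization of the two entropies (an assumption; the abstract does not state the variant used), and HDBSCAN noise points (conventionally labeled -1) are simply treated as one more cluster, which may differ from how the study handled them.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two equal-length labelings with at least 2 items")
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))   # contingency table cells
    rows = Counter(labels_a)                   # cluster sizes in labeling A
    cols = Counter(labels_b)                   # cluster sizes in labeling B
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level pair agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:                  # degenerate case: trivial labelings
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(labels_a, labels_b):
    """NMI with arithmetic-mean normalization (assumed variant)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty labelings")
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)
    mutual_info = sum(
        (c / n) * log(c * n / (rows[i] * cols[j])) for (i, j), c in cells.items()
    )
    entropy_a = -sum((c / n) * log(c / n) for c in rows.values())
    entropy_b = -sum((c / n) * log(c / n) for c in cols.values())
    denom = (entropy_a + entropy_b) / 2
    if denom == 0:                             # both labelings have a single cluster
        return 1.0
    return mutual_info / denom


# Hypothetical toy labels: dominant LDA topic vs. embedding cluster (-1 = noise).
lda_topics = [0, 0, 1, 1, 2, 2, 2, 3]
embed_clusters = [0, 0, 1, -1, 1, 2, 2, -1]
ari = adjusted_rand_index(lda_topics, embed_clusters)
nmi = normalized_mutual_info(lda_topics, embed_clusters)
print(f"ARI={ari:.3f}  NMI={nmi:.3f}")
```

Both scores are 1 for identical partitions (up to relabeling). ARI is near 0, and can go negative, when agreement is no better than chance, which is why values such as ARI=0.068 indicate that the two methods group the responses quite differently. In practice, scikit-learn's `adjusted_rand_score` and `normalized_mutual_info_score` provide the same quantities.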