Bibliographic Details
Main Authors: Mandia, Sandeep, Singh, Kuldeep, Mitharwal, Rajendra, Mushtaq, Faisel, Janu, Dimpal
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2502.10813
Abstract: The COVID-19 pandemic and the availability of the internet have recently boosted online learning. However, monitoring student engagement in online learning is difficult for teachers. In this context, timely automatic classification of student engagement can help teachers make adaptive adjustments to meet students' needs. This paper proposes EngageFormer, a transformer-based architecture with sequence pooling that uses the video modality for engagement classification. The proposed architecture computes three views from the input video and processes them in parallel with transformer encoders; a global encoder then processes the representations from each encoder, and finally a multilayer perceptron (MLP) predicts the engagement level. A learning-centered affective state dataset is curated from existing open-source databases. The proposed method achieved accuracies of 63.9%, 56.73%, 99.16%, 65.67%, and 74.89% on the Dataset for Affective States in E-Environments (DAiSEE), the Bahcesehir University Multimodal Affective Database-1 (BAUM-1), the Yawning Detection Dataset (YawDD), the University of Texas at Arlington Real-Life Drowsiness Dataset (UTA-RLDD), and the curated learning-centered affective state dataset, respectively. The results on BAUM-1, DAiSEE, and YawDD represent state-of-the-art performance, indicating that the proposed model classifies affective states on these datasets more accurately than previously reported methods. The results on UTA-RLDD, which involves two-class classification, serve as a baseline for future research. Together, these results provide a reference point against which future work can compare and improve.
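The abstract outlines a pipeline: three views computed from the video, parallel transformer encoders with sequence pooling, a global encoder over the per-view representations, and an MLP head. The pure-Python sketch below illustrates only that data flow; it is not the authors' EngageFormer implementation. Several assumptions are not stated in the abstract. The input is a sequence of pre-extracted per-frame feature vectors. The three views are obtained by subsampling that sequence at strides 1, 2, and 4, which is one possible reading of "variable frequency" in the title. Each encoder is a single self-attention layer. Sequence pooling is taken to be attention-weighted averaging. All weights are random and untrained. Every name (EngageFormerSketch, make_views, SequencePool, EncoderSketch) is hypothetical.

```python
import math
import random

# Hypothetical sketch of the abstract's data flow: three views -> parallel
# encoders with sequence pooling -> global encoder -> MLP. Weights are random
# and untrained; all names are illustrative, not from the paper.


def matmul(a, b):
    """Multiply an (n x k) by a (k x m) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(v):
    """Numerically stable softmax over a list of scores."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    total = sum(e)
    return [x / total for x in e]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class EncoderSketch:
    """One single-head self-attention layer plus a residual connection.
    A real transformer encoder also has multiple heads, a feed-forward block,
    and layer normalization; they are omitted here for brevity."""

    def __init__(self, dim, rng):
        self.dim = dim
        self.wq, self.wk, self.wv = (rand_matrix(dim, dim, rng) for _ in range(3))

    def __call__(self, x):  # x: seq_len rows of dim features
        q, k, v = matmul(x, self.wq), matmul(x, self.wk), matmul(x, self.wv)
        scale = 1.0 / math.sqrt(self.dim)
        out = []
        for xi, qi in zip(x, q):
            weights = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
            attended = [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(self.dim)]
            out.append([a + b for a, b in zip(xi, attended)])
        return out


class SequencePool:
    """Attention-weighted pooling (assumed form): score each token with a learned
    projection, softmax the scores over the sequence, return the weighted sum."""

    def __init__(self, dim, rng):
        self.g = rand_matrix(dim, 1, rng)

    def __call__(self, x):
        weights = softmax([row[0] for row in matmul(x, self.g)])
        return [sum(w * tok[d] for w, tok in zip(weights, x)) for d in range(len(x[0]))]


def make_views(frame_features, strides=(1, 2, 4)):
    """Assumption: each view is the per-frame feature sequence subsampled at a
    different temporal stride, i.e. a different frame frequency."""
    return [frame_features[::s] for s in strides]


class EngageFormerSketch:
    def __init__(self, dim=8, num_views=3, num_classes=4, hidden=16, seed=0):
        rng = random.Random(seed)
        self.view_encoders = [EncoderSketch(dim, rng) for _ in range(num_views)]
        self.view_pools = [SequencePool(dim, rng) for _ in range(num_views)]
        self.global_encoder = EncoderSketch(dim, rng)
        self.global_pool = SequencePool(dim, rng)
        self.w1 = rand_matrix(dim, hidden, rng)
        self.w2 = rand_matrix(hidden, num_classes, rng)

    def __call__(self, views):
        # Encode and pool each view independently (conceptually in parallel).
        pooled = [pool(enc(v)) for v, enc, pool in
                  zip(views, self.view_encoders, self.view_pools)]
        # The global encoder treats the pooled view vectors as a short sequence.
        fused = self.global_pool(self.global_encoder(pooled))
        # Two-layer MLP head with ReLU, then softmax over engagement classes.
        hidden = [max(0.0, h) for h in matmul([fused], self.w1)[0]]
        return softmax(matmul([hidden], self.w2)[0])


rng = random.Random(42)
frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]  # 16 frames x 8 features
probs = EngageFormerSketch()(make_views(frames))
print(len(probs), round(sum(probs), 6))
```

The four output classes are an arbitrary illustrative choice; passing num_classes=2 would mirror the two-class UTA-RLDD setting mentioned in the abstract. Pooling each view to a single vector before the global encoder keeps the global sequence length equal to the number of views, regardless of how many frames each view contains.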
Institution: arXiv
Title: Transformer-Driven Modeling of Variable Frequency Features for Classifying Student Engagement in Online Learning
Topic: Computer Vision and Pattern Recognition