Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Mandia, Sandeep; Singh, Kuldeep; Mitharwal, Rajendra; Mushtaq, Faisel; Janu, Dimpal
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.10813

_version_	1866917924415995904
author	Mandia, Sandeep; Singh, Kuldeep; Mitharwal, Rajendra; Mushtaq, Faisel; Janu, Dimpal
author_facet	Mandia, Sandeep; Singh, Kuldeep; Mitharwal, Rajendra; Mushtaq, Faisel; Janu, Dimpal
contents	The COVID-19 pandemic and the availability of the internet have recently boosted online learning. However, monitoring student engagement in online learning is a difficult task for teachers. Timely, automatic classification of student engagement can help teachers make adaptive adjustments to meet students' needs. This paper proposes EngageFormer, a transformer-based architecture with sequence pooling that uses the video modality for engagement classification. The architecture computes three views from the input video and processes them in parallel with transformer encoders; a global encoder then processes the representation from each encoder, and finally a multilayer perceptron (MLP) predicts the engagement level. A learning-centered affective state dataset is curated from existing open-source databases. The proposed method achieved accuracies of 63.9%, 56.73%, 99.16%, 65.67%, and 74.89% on the Dataset for Affective States in E-Environments (DAiSEE), the Bahcesehir University Multimodal Affective Database-1 (BAUM-1), the Yawning Detection Dataset (YawDD), the University of Texas at Arlington Real-Life Drowsiness Dataset (UTA-RLDD), and the curated learning-centered affective state dataset, respectively. The results on the BAUM-1, DAiSEE, and YawDD datasets demonstrate state-of-the-art performance, indicating the superiority of the proposed model in classifying affective states on these datasets. The results on the UTA-RLDD dataset, which involves two-class classification, serve as a baseline for future research. Together, these results provide a point of reference against which future work can compare and improve.
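The abstract describes the EngageFormer data flow only at a high level (three views, per-view transformer encoders, a global encoder, sequence pooling, an MLP head). The following is a minimal NumPy sketch of that flow under stated assumptions; it is not the authors' implementation. Assumptions: per-frame feature vectors are already extracted; the three "views" are taken to be temporal subsamplings at strides 1, 2, and 4 (one possible reading of "variable frequency" in the title); each encoder is a single pre-norm, single-head layer with random, untrained weights; and sequence pooling is taken to be the attention-weighted pooling (SeqPool) popularized by Compact Convolutional Transformers. All function names, dimensions, and the four-class output are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean, unit variance (no learned affine).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def init_encoder(d, d_ff, rng):
    # Hypothetical single-head encoder layer parameters (random, untrained).
    s = 1.0 / np.sqrt(d)
    return {
        "wq": rng.normal(0, s, (d, d)), "wk": rng.normal(0, s, (d, d)),
        "wv": rng.normal(0, s, (d, d)), "wo": rng.normal(0, s, (d, d)),
        "w1": rng.normal(0, s, (d, d_ff)),
        "w2": rng.normal(0, 1.0 / np.sqrt(d_ff), (d_ff, d)),
    }

def encoder_layer(x, p):
    # x: (L, d). Pre-norm self-attention and feed-forward, each with a residual.
    h = layer_norm(x)
    q, k, v = h @ p["wq"], h @ p["wk"], h @ p["wv"]
    attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))  # (L, L), rows sum to 1
    x = x + (attn @ v) @ p["wo"]
    h = layer_norm(x)
    return x + np.maximum(h @ p["w1"], 0.0) @ p["w2"]

def seq_pool(x, w):
    # SeqPool-style pooling (assumed): score each token, softmax over the
    # sequence, return the weighted sum. Works for any sequence length L.
    a = softmax(x @ w, axis=0)  # (L,)
    return a @ x                # (d,)

def make_views(frames, strides=(1, 2, 4)):
    # frames: (T, d). Each view samples the clip at a different temporal stride.
    return [frames[::s] for s in strides]

def init_model(d=32, d_ff=64, n_views=3, n_classes=4, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "view_enc": [init_encoder(d, d_ff, rng) for _ in range(n_views)],
        "view_pool": [rng.normal(0, 1 / np.sqrt(d), d) for _ in range(n_views)],
        "global_enc": init_encoder(d, d_ff, rng),
        "global_pool": rng.normal(0, 1 / np.sqrt(d), d),
        "mlp_w1": rng.normal(0, 1 / np.sqrt(d), (d, d_ff)),
        "mlp_w2": rng.normal(0, 1 / np.sqrt(d_ff), (d_ff, n_classes)),
    }

def forward(frames, m, strides=(1, 2, 4)):
    views = make_views(frames, strides)
    # Per-view encoder, then pool each view to a single d-dim vector.
    pooled = [seq_pool(encoder_layer(v, enc), w)
              for v, enc, w in zip(views, m["view_enc"], m["view_pool"])]
    # Global encoder over the sequence of view summaries, then pool again.
    g = encoder_layer(np.stack(pooled), m["global_enc"])  # (n_views, d)
    z = seq_pool(g, m["global_pool"])
    # Two-layer MLP head produces class probabilities (engagement levels).
    logits = np.maximum(z @ m["mlp_w1"], 0.0) @ m["mlp_w2"]
    return softmax(logits)

if __name__ == "__main__":
    frames = np.random.default_rng(1).normal(size=(64, 32))  # synthetic features
    probs = forward(frames, init_model())
    print(probs.shape, int(probs.argmax()))
```

Because sequence pooling reduces a sequence of any length to one vector, the three views can have different numbers of frames (here 64, 32, and 16) without padding, which is one reason pooling of this kind suits a multi-rate design. With random weights the predicted class is meaningless; only the tensor shapes and data flow are illustrated.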
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_10813
institution	arXiv
publishDate	2025
record_format	arxiv
title	Transformer-Driven Modeling of Variable Frequency Features for Classifying Student Engagement in Online Learning
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2502.10813

Similar Items