Staff View: Towards fairer public transit: Real-time tensor-based multimodal fare evasion and fraud detection :: Library Catalog

Bibliographic Details
Main Authors:	Wauyo, Peter; Bwiza, Dalia; Murara, Alain; Mugume, Edwin; Umuhoza, Eric
Format:	Preprint
Published:	2025
Subjects:	Software Engineering
Online Access:	https://arxiv.org/abs/2510.02165

_version_	1866916985698254848
author	Wauyo, Peter; Bwiza, Dalia; Murara, Alain; Mugume, Edwin; Umuhoza, Eric
author_facet	Wauyo, Peter; Bwiza, Dalia; Murara, Alain; Mugume, Edwin; Umuhoza, Eric
contents	This research introduces a multimodal system designed to detect fraud and fare evasion in public transportation by analyzing closed-circuit television (CCTV) and audio data. The proposed solution uses the Vision Transformer for Video (ViViT) model for video feature extraction and the Audio Spectrogram Transformer (AST) for audio analysis. The system implements a Tensor Fusion Network (TFN) architecture that explicitly models unimodal and bimodal interactions through a 2-fold Cartesian product. This advanced fusion technique captures complex cross-modal dynamics between visual behaviors (e.g., tailgating, unauthorized access) and audio cues (e.g., fare transaction sounds). The system was trained and tested on a custom dataset, achieving an accuracy of 89.5%, precision of 87.2%, and recall of 84.0% in detecting fraudulent activities, significantly outperforming early fusion baselines and exceeding the 75% recall rates typically reported in state-of-the-art transportation fraud detection systems. Our ablation studies demonstrate that the tensor fusion approach provides a 7.0% improvement in the F1 score and an 8.8% boost in recall compared to traditional concatenation methods. The solution supports real-time detection, enabling public transport operators to reduce revenue loss, improve passenger safety, and ensure operational compliance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_02165
institution	arXiv
publishDate	2025
record_format	arxiv
title	Towards fairer public transit: Real-time tensor-based multimodal fare evasion and fraud detection
topic	Software Engineering
url	https://arxiv.org/abs/2510.02165
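
Note on the fusion step described in the abstract: a 2-fold Cartesian product over two modalities is commonly realized in Tensor Fusion Networks by appending a constant 1 to each embedding and taking the outer product, so that a single tensor holds the unimodal terms and every pairwise video-audio interaction. The sketch below illustrates that general idea only; it is not the authors' implementation. The function name `tensor_fusion`, the toy embedding sizes, and the use of NumPy are assumptions made for illustration, and in the described system the inputs would be ViViT and AST embeddings rather than hand-written vectors.

```python
import numpy as np


def tensor_fusion(video_emb: np.ndarray, audio_emb: np.ndarray) -> np.ndarray:
    """Bimodal tensor fusion sketch (hypothetical helper, not from the paper).

    Appends a constant 1 to each embedding and returns the outer product,
    a matrix of shape (len(video_emb) + 1, len(audio_emb) + 1).
    """
    v = np.concatenate(([1.0], video_emb))  # [1, v_1, ..., v_m]
    a = np.concatenate(([1.0], audio_emb))  # [1, a_1, ..., a_n]
    return np.outer(v, a)


# Toy embeddings standing in for video and audio features (assumed sizes).
video_emb = np.array([0.5, -1.0, 2.0])
audio_emb = np.array([3.0, 4.0])

fused = tensor_fusion(video_emb, audio_emb)

# Layout of the fused matrix:
#   fused[0, 0]   constant 1
#   fused[1:, 0]  unimodal video terms
#   fused[0, 1:]  unimodal audio terms
#   fused[1:, 1:] bimodal video x audio interaction terms
features = fused.ravel()  # flattened vector a downstream classifier could consume
```

Plain concatenation of the same toy inputs would yield 5 features; the fused representation has (3 + 1) x (2 + 1) = 12, the extra entries being the pairwise interaction terms.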

Similar Items