:: Library Catalog

Bibliographic Details
Main Authors:	Feng, Tiantian; Avramidis, Kleanthis; Xu, Anfeng; Wang, Deqi; Booth, Brandon M.; Narayanan, Shrikanth
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2603.10888

Similar Items

VoxCog: Towards End-to-End Multilingual Cognitive Impairment Classification through Dialectal Knowledge
by: Feng, Tiantian, et al.
Published: (2026)

Emotion-Aligned Contrastive Learning Between Images and Music
by: Stewart, Shanti, et al.
Published: (2023)

Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling
by: Feng, Tiantian, et al.
Published: (2024)

Joint ASR and Speaker Role Tagging with Serialized Output Training
by: Xu, Anfeng, et al.
Published: (2025)

Audio-visual child-adult speaker classification in dyadic interactions
by: Xu, Anfeng, et al.
Published: (2023)

Toward Fully-End-to-End Listened Speech Decoding from EEG Signals
by: Lee, Jihwan, et al.
Published: (2024)

ChildVox: A Speech, Audio, and Large Audio-Language Model Benchmark in Understanding and Characterizing Sound across Childhood
by: Feng, Tiantian, et al.
Published: (2026)

End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions
by: Xu, Anfeng, et al.
Published: (2026)

Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
by: Feng, Tiantian, et al.
Published: (2024)

PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models
by: Feng, Tiantian, et al.
Published: (2023)

Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
by: Feng, Tiantian, et al.
Published: (2025)

ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation
by: Feng, Tiantian, et al.
Published: (2024)

Affect Decoding in Phonated and Silent Speech Production from Surface EMG
by: Pistrosch, Simon, et al.
Published: (2026)

TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality
by: Feng, Tiantian, et al.
Published: (2024)

Towards Interpretable Framework for Neural Audio Codecs via Sparse Autoencoders: A Case Study on Accent Information
by: Wang, Shih-Heng, et al.
Published: (2026)

Examining Test-Time Adaptation for Personalized Child Speech Recognition
by: Shi, Zhonghao, et al.
Published: (2024)

Developing a Top-tier Framework in Naturalistic Conditions Challenge for Categorized Emotion Prediction: From Speech Foundation Models and Learning Objective to Data Augmentation and Engineering Choices
by: Feng, Tiantian, et al.
Published: (2025)

WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables
by: Lin, Zhaojiang, et al.
Published: (2025)

Early Detection of Coffee Leaf Rust Through Convolutional Neural Networks Trained on Low-Resolution Images
by: Cabrera, Angelly, et al.
Published: (2024)

VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs
by: Zhang, Hezhao, et al.
Published: (2026)

Articulatory Feature Prediction from Surface EMG during Speech Production
by: Lee, Jihwan, et al.
Published: (2025)

Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe
by: Feng, Tiantian, et al.
Published: (2025)

Trade-offs Between Capacity and Robustness in Neural Audio Codecs for Adversarially Robust Speech Recognition
by: Prescott, Jordan, et al.
Published: (2026)

Exploring Speech Foundation Models for Speaker Diarization Across Lifespan
by: Xu, Anfeng, et al.
Published: (2026)

Who Said What WSW 2.0? Enhanced Automated Analysis of Preschool Classroom Speech
by: Sun, Anchen, et al.
Published: (2025)

Understanding Stress, Burnout, and Behavioral Patterns in Medical Residents Using Large-scale Longitudinal Wearable Recordings
by: Feng, Tiantian, et al.
Published: (2024)

TILES-2018 Sleep Benchmark Dataset: A Longitudinal Wearable Sleep Data Set of Hospital Workers for Modeling and Understanding Sleep Behaviors
by: Feng, Tiantian, et al.
Published: (2025)

Knowledge-guided EEG Representation Learning
by: Kommineni, Aditya, et al.
Published: (2024)

Evaluating Atypical Gaze Patterns through Vision Models: The Case of Cortical Visual Impairment
by: Avramidis, Kleanthis, et al.
Published: (2024)

Speech2rtMRI: Speech-Guided Diffusion Model for Real-time MRI Video of the Vocal Tract during Speech
by: Nguyen, Hong, et al.
Published: (2024)

Encoding Emotion Through Self-Supervised Eye Movement Reconstruction
by: Ma, Marcus, et al.
Published: (2026)

Neural Codecs as Biosignal Tokenizers
by: Avramidis, Kleanthis, et al.
Published: (2025)

Phone Duration Modeling for Speaker Age Estimation in Children
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2021)

The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data
by: Baird, Alice, et al.
Published: (2024)

Looking Into the Past: Eye Movements Characterize Elements of Autobiographical Recall in Interviews with Holocaust Survivors
by: Zhou, Emily, et al.
Published: (2026)

voice2mode: Phonation Mode Classification in Singing using Self-Supervised Speech Models
by: Justus, Aju Ani, et al.
Published: (2026)

Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox
by: Pang, Jiacheng, et al.
Published: (2026)

A long-form single-speaker real-time MRI speech dataset and benchmark
by: Foley, Sean, et al.
Published: (2025)

Informed Bootstrap Augmentation Improves EEG Decoding
by: Jeong, Woojae, et al.
Published: (2025)

VoxGuard: Evaluating User and Attribute Privacy in Speech via Membership Inference Attacks
by: Tsaprazlis, Efthymios, et al.
Published: (2025)