| Main Authors: | Feng, Tiantian, Avramidis, Kleanthis, Xu, Anfeng, Wang, Deqi, Booth, Brandon M, Narayanan, Shrikanth |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.10888 |
Similar Items
VoxCog: Towards End-to-End Multilingual Cognitive Impairment Classification through Dialectal Knowledge
by: Feng, Tiantian, et al.
Published: (2026)
Emotion-Aligned Contrastive Learning Between Images and Music
by: Stewart, Shanti, et al.
Published: (2023)
Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling
by: Feng, Tiantian, et al.
Published: (2024)
Joint ASR and Speaker Role Tagging with Serialized Output Training
by: Xu, Anfeng, et al.
Published: (2025)
Audio-visual child-adult speaker classification in dyadic interactions
by: Xu, Anfeng, et al.
Published: (2023)
Toward Fully-End-to-End Listened Speech Decoding from EEG Signals
by: Lee, Jihwan, et al.
Published: (2024)
ChildVox: A Speech, Audio, and Large Audio-Language Model Benchmark in Understanding and Characterizing Sound across Childhood
by: Feng, Tiantian, et al.
Published: (2026)
End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions
by: Xu, Anfeng, et al.
Published: (2026)
Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
by: Feng, Tiantian, et al.
Published: (2024)
PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models
by: Feng, Tiantian, et al.
Published: (2023)
Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
by: Feng, Tiantian, et al.
Published: (2025)
ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation
by: Feng, Tiantian, et al.
Published: (2024)
Affect Decoding in Phonated and Silent Speech Production from Surface EMG
by: Pistrosch, Simon, et al.
Published: (2026)
TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality
by: Feng, Tiantian, et al.
Published: (2024)
Towards Interpretable Framework for Neural Audio Codecs via Sparse Autoencoders: A Case Study on Accent Information
by: Wang, Shih-Heng, et al.
Published: (2026)
Examining Test-Time Adaptation for Personalized Child Speech Recognition
by: Shi, Zhonghao, et al.
Published: (2024)
Developing a Top-tier Framework in Naturalistic Conditions Challenge for Categorized Emotion Prediction: From Speech Foundation Models and Learning Objective to Data Augmentation and Engineering Choices
by: Feng, Tiantian, et al.
Published: (2025)
WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables
by: Lin, Zhaojiang, et al.
Published: (2025)
Early Detection of Coffee Leaf Rust Through Convolutional Neural Networks Trained on Low-Resolution Images
by: Cabrera, Angelly, et al.
Published: (2024)
VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs
by: Zhang, Hezhao, et al.
Published: (2026)
Articulatory Feature Prediction from Surface EMG during Speech Production
by: Lee, Jihwan, et al.
Published: (2025)
Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe
by: Feng, Tiantian, et al.
Published: (2025)
Trade-offs Between Capacity and Robustness in Neural Audio Codecs for Adversarially Robust Speech Recognition
by: Prescott, Jordan, et al.
Published: (2026)
Exploring Speech Foundation Models for Speaker Diarization Across Lifespan
by: Xu, Anfeng, et al.
Published: (2026)
Who Said What WSW 2.0? Enhanced Automated Analysis of Preschool Classroom Speech
by: Sun, Anchen, et al.
Published: (2025)
Understanding Stress, Burnout, and Behavioral Patterns in Medical Residents Using Large-scale Longitudinal Wearable Recordings
by: Feng, Tiantian, et al.
Published: (2024)
TILES-2018 Sleep Benchmark Dataset: A Longitudinal Wearable Sleep Data Set of Hospital Workers for Modeling and Understanding Sleep Behaviors
by: Feng, Tiantian, et al.
Published: (2025)
Knowledge-guided EEG Representation Learning
by: Kommineni, Aditya, et al.
Published: (2024)
Evaluating Atypical Gaze Patterns through Vision Models: The Case of Cortical Visual Impairment
by: Avramidis, Kleanthis, et al.
Published: (2024)
Speech2rtMRI: Speech-Guided Diffusion Model for Real-time MRI Video of the Vocal Tract during Speech
by: Nguyen, Hong, et al.
Published: (2024)
Encoding Emotion Through Self-Supervised Eye Movement Reconstruction
by: Ma, Marcus, et al.
Published: (2026)
Neural Codecs as Biosignal Tokenizers
by: Avramidis, Kleanthis, et al.
Published: (2025)
Phone Duration Modeling for Speaker Age Estimation in Children
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2021)
The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data
by: Baird, Alice, et al.
Published: (2024)
Looking Into the Past: Eye Movements Characterize Elements of Autobiographical Recall in Interviews with Holocaust Survivors
by: Zhou, Emily, et al.
Published: (2026)
voice2mode: Phonation Mode Classification in Singing using Self-Supervised Speech Models
by: Justus, Aju Ani, et al.
Published: (2026)
Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox
by: Pang, Jiacheng, et al.
Published: (2026)
A long-form single-speaker real-time MRI speech dataset and benchmark
by: Foley, Sean, et al.
Published: (2025)
Informed Bootstrap Augmentation Improves EEG Decoding
by: Jeong, Woojae, et al.
Published: (2025)
VoxGuard: Evaluating User and Attribute Privacy in Speech via Membership Inference Attacks
by: Tsaprazlis, Efthymios, et al.
Published: (2025)