| Main Authors: | Heigold, Georg, Variani, Ehsan, Bagby, Tom, Allauzen, Cyril, Ma, Ji, Kumar, Shankar, Riley, Michael |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.07143 |
Similar Items
Benchmarking LLMs on the Massive Sound Embedding Benchmark (MSEB)
by: Allauzen, Cyril, et al.
Published: (2026)
MAEB: Massive Audio Embedding Benchmark
by: Assadi, Adnan El, et al.
Published: (2026)
Cross-lingual, Character-Level Neural Morphological Tagging
by: Cotterell, Ryan, et al.
Published: (2017)
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
by: Huang, W. Ronny, et al.
Published: (2024)
ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
by: Wang, He, et al.
Published: (2025)
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
by: Fan, Ruchao, et al.
Published: (2024)
Mind the Shift: Using Delta SSL Embeddings to Enhance Child ASR
by: Wang, Zilai, et al.
Published: (2026)
Leveraging Zipformer Model for Effective Language Identification in Code-Switched Child-Directed Speech
by: Shankar, Lavanya, et al.
Published: (2025)
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
by: Sakshi, S, et al.
Published: (2024)
Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights
by: Yang, Hao, et al.
Published: (2024)
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark
by: Wang, Dingdong, et al.
Published: (2025)
BASS: Benchmarking Audio LMs for Musical Structure and Semantic Reasoning
by: Jang, Min, et al.
Published: (2026)
Tutorial: $φ$-Transductions in OpenFst via the Gallic Semiring
by: Cognetta, Marco, et al.
Published: (2025)
A* shortest string decoding for non-idempotent semirings
by: Gorman, Kyle, et al.
Published: (2022)
WavLink: Compact Audio-Text Embeddings with a Global Whisper Token
by: Kumar, Gokul Karthik, et al.
Published: (2026)
Noise Supervised Contrastive Learning and Feature-Perturbed for Anomalous Sound Detection
by: Huang, Shun, et al.
Published: (2025)
Are Sounds Sound for Phylogenetic Reconstruction?
by: Häuser, Luise, et al.
Published: (2024)
MIDI-to-Tab: Guitar Tablature Inference via Masked Language Modeling
by: Edwards, Drew, et al.
Published: (2024)
TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics
by: Lin, Yi-Cheng, et al.
Published: (2025)
Lombard Speech Synthesis for Any Voice with Controllable Style Embeddings
by: Akti, Seymanur, et al.
Published: (2026)
Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models
by: Su, Yuchen, et al.
Published: (2026)
Thinking with Sound: Audio Chain-of-Thought Enables Multimodal Reasoning in Large Audio-Language Models
by: Xiong, Zhen, et al.
Published: (2025)
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds
by: Ghosh, Sreyan, et al.
Published: (2024)
Omni-Embed-Audio: Leveraging Multimodal LLMs for Robust Audio-Text Retrieval
by: Yoo, HaeJun, et al.
Published: (2026)
Spelling Correction through Rewriting of Non-Autoregressive ASR Lattices
by: Velikovich, Leonid, et al.
Published: (2024)
PRiSM: Benchmarking Phone Realization in Speech Models
by: Bharadwaj, Shikhar, et al.
Published: (2026)
TASTE-Streaming: Towards Streamable Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
by: Tseng, Liang-Hsuan, et al.
Published: (2026)
BRACE: A Benchmark for Robust Audio Caption Quality Evaluation
by: Guo, Tianyu, et al.
Published: (2025)
SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions
by: Baali, Massa, et al.
Published: (2025)
DELULU: Discriminative Embedding Learning Using Latent Units for Speaker-Aware Self-Trained Speech Foundational Model
by: Baali, Massa, et al.
Published: (2025)
Knowing What to Stress: A Discourse-Conditioned Text-to-Speech Benchmark
by: Turetzky, Arnon, et al.
Published: (2026)
PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech
by: Menta, Venkata Pushpak Teja
Published: (2026)
PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark
by: Kalahroodi, Mohammad Javad Ranjbar, et al.
Published: (2026)
Multilingual Source Tracing of Speech Deepfakes: A First Benchmark
by: Xuan, Xi, et al.
Published: (2025)
A Calculus-Based Framework for Determining Vocabulary Size in End-to-End ASR
by: Kopparapu, Sunil Kumar
Published: (2026)
Do Music Preferences Reflect Cultural Values? A Cross-National Analysis Using Music Embedding and World Values Survey
by: Kim, Yongjae, et al.
Published: (2025)
HumMusQA: A Human-written Music Understanding QA Benchmark Dataset
by: Weck, Benno, et al.
Published: (2026)
PROFASR-BENCH: A Benchmark for Context-Conditioned ASR in High-Stakes Professional Speech
by: Piskala, Deepak Babu
Published: (2025)
VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents
by: Hu, Jiliang, et al.
Published: (2025)
Selective Attention Merging for low resource tasks: A case study of Child ASR
by: Shankar, Natarajan Balaji, et al.
Published: (2025)