Bibliographic Details
Main Authors: Lin, Yi-Cheng, Chen, Yu-Hua, Dong, Jia-Kai, Huang, Yueh-Hsuan, Chen, Szu-Chi, Chen, Yu-Chen, Chen, Chih-Yao, Lin, Yu-Jung, Chen, Yu-Ling, Chen, Zih-Yu, Tsai, I-Ning, Wang, Hsiu-Hsuan, Chung, Ho-Lam, Lu, Ke-Han, Lee, Hung-yi
Format: Preprint
Published: 2025
Subjects: Audio and Speech Processing; Computation and Language; Machine Learning; Sound
Online Access: https://arxiv.org/abs/2509.26329
contents Large audio-language models are advancing rapidly, yet most evaluations emphasize speech or globally sourced sounds, overlooking culturally distinctive cues. This gap raises a critical question: can current models generalize to localized, non-semantic audio that communities instantly recognize but outsiders do not? To address this, we present TAU (Taiwan Audio Understanding), a benchmark of everyday Taiwanese "soundmarks." TAU is built through a pipeline combining curated sources, human editing, and LLM-assisted question generation, producing 702 clips and 1,794 multiple-choice items that cannot be solved by transcripts alone. Experiments show that state-of-the-art LALMs, including Gemini 2.5 and Qwen2-Audio, perform far below local humans. TAU demonstrates the need for localized benchmarks to reveal cultural blind spots, guide more equitable multimodal evaluation, and ensure models serve communities beyond the global mainstream.
format Preprint
id arxiv_https___arxiv_org_abs_2509_26329
institution arXiv
publishDate 2025
record_format arxiv
title TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics
topic Audio and Speech Processing
Computation and Language
Machine Learning
Sound