Staff View: PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Yuwen; Qian, Xinyuan; Zhang, Tian-Hao; Gao, Jiaran; Pan, Yuchen; Wang, Xin; Pan, Zhou; Wei, Chen; Wang, Yiming
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.03531

_version_	1866917187782967296
author	Wang, Yuwen; Qian, Xinyuan; Zhang, Tian-Hao; Gao, Jiaran; Pan, Yuchen; Wang, Xin; Pan, Zhou; Wei, Chen; Wang, Yiming
author_facet	Wang, Yuwen; Qian, Xinyuan; Zhang, Tian-Hao; Gao, Jiaran; Pan, Yuchen; Wang, Xin; Pan, Zhou; Wei, Chen; Wang, Yiming
contents	Large Audio-Language Models (LALMs) have demonstrated strong performance in audio understanding and generation. Yet our extensive benchmarking reveals that their behavior is largely generic (e.g., summarizing spoken content) and fails to adequately support personalized question answering (e.g., summarizing what my best friend says). In contrast, humans condition their interpretation and decision-making on each individual's personal context. To bridge this gap, we formalize the task of Personalized LALMs (PALM) for recognizing personal concepts and reasoning within personal context. Moreover, we create the first benchmark (PALM-Bench) to foster methodological advances in PALM and enable structured evaluation on several tasks across multi-speaker scenarios. Our extensive experiments on representative open-source LALMs show that existing training-free prompting and supervised fine-tuning strategies, while yielding improvements, remain limited in modeling personalized knowledge and transferring it robustly across tasks. Data and code will be released.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_03531
institution	arXiv
publishDate	2026
record_format	arxiv
title	PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models
topic	Computation and Language
url	https://arxiv.org/abs/2601.03531

Similar Items