Bibliographic Details
Main Authors: Wang, Yuwen, Qian, Xinyuan, Zhang, Tian-Hao, Gao, Jiaran, Pan, Yuchen, Wang, Xin, Pan, Zhou, Wei, Chen, Wang, Yiming
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2601.03531
author Wang, Yuwen
Qian, Xinyuan
Zhang, Tian-Hao
Gao, Jiaran
Pan, Yuchen
Wang, Xin
Pan, Zhou
Wei, Chen
Wang, Yiming
contents Large Audio-Language Models (LALMs) have demonstrated strong performance in audio understanding and generation. Yet our extensive benchmarking reveals that their behavior is largely generic (e.g., summarizing spoken content) and fails to adequately support personalized question answering (e.g., summarizing what my best friend says). In contrast, humans condition their interpretation and decision-making on each individual's personal context. To bridge this gap, we formalize the task of Personalized LALMs (PALM) for recognizing personal concepts and reasoning within personal context. Moreover, we create the first benchmark (PALM-Bench) to foster methodological advances in PALM and enable structured evaluation on several tasks across multi-speaker scenarios. Our extensive experiments on representative open-source LALMs show that existing training-free prompting and supervised fine-tuning strategies, while yielding improvements, remain limited in modeling personalized knowledge and transferring it robustly across tasks. Data and code will be released.
format Preprint
id arxiv_https___arxiv_org_abs_2601_03531
institution arXiv
publishDate 2026
record_format arxiv
title PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models
topic Computation and Language
url https://arxiv.org/abs/2601.03531