Staff View: Localizing and Editing Knowledge in Large Audio-Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Chung, Sung Kyun; Dong, Jiaheng; Hu, Qiuchi; Huang, Gongping; Jia, Hong; Dang, Ting
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2603.14343

_version_	1866914395239481344
author	Chung, Sung Kyun; Dong, Jiaheng; Hu, Qiuchi; Huang, Gongping; Jia, Hong; Dang, Ting
author_facet	Chung, Sung Kyun; Dong, Jiaheng; Hu, Qiuchi; Huang, Gongping; Jia, Hong; Dang, Ting
contents	Large Audio-Language Models (LALMs) have shown strong performance in speech understanding, making speech a natural interface for accessing factual information. Yet they are trained on static corpora and may encode incorrect facts. Existing model editing methods localize and update facts in text-only LLMs, but do not account for continuous speech representations, or where knowledge is stored across acoustic or language modules, or their cross-modal module. We construct the first audio benchmark for knowledge localization and editing in LALMs and propose a speech-driven locate-then-edit framework. First, we use speech-aware causal tracing to localize layers and modules that support factual retrieval and then apply editing at identified sites. Experiments show that factual knowledge is jointly encoded in audio and text modules, and that audio editing yields more effective updates than text editing or fine-tuning, enabling fine-grained knowledge control in speech AI systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_14343
institution	arXiv
publishDate	2026
record_format	arxiv
title	Localizing and Editing Knowledge in Large Audio-Language Models
topic	Machine Learning
url	https://arxiv.org/abs/2603.14343