Staff View: What if I ask in alia lingua? Measuring Functional Similarity Across Languages :: Library Catalog

Bibliographic Details
Main Authors:	Mishra, Debangan; Rastogi, Arihant; Negi, Agyeya; Goel, Shashwat; Kumaraguru, Ponnurangam
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2509.04032

_version_	1866915530313564160
author	Mishra, Debangan; Rastogi, Arihant; Negi, Agyeya; Goel, Shashwat; Kumaraguru, Ponnurangam
author_facet	Mishra, Debangan; Rastogi, Arihant; Negi, Agyeya; Goel, Shashwat; Kumaraguru, Ponnurangam
contents	How similar are model outputs across languages? In this work, we study this question using a recently proposed model similarity metric $κ_p$ applied to 20 languages and 47 subjects in GlobalMMLU. Our analysis reveals that a model's responses become increasingly consistent across languages as its size and capability grow. Interestingly, models exhibit greater cross-lingual consistency within themselves than agreement with other models prompted in the same language. These results highlight not only the value of $κ_p$ as a practical tool for evaluating multilingual reliability, but also its potential to guide the development of more consistent multilingual systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_04032
institution	arXiv
publishDate	2025
record_format	arxiv
title	What if I ask in alia lingua? Measuring Functional Similarity Across Languages
topic	Computation and Language; Machine Learning
url	https://arxiv.org/abs/2509.04032