Bibliographic Details
Main Authors: Mishra, Debangan; Rastogi, Arihant; Negi, Agyeya; Goel, Shashwat; Kumaraguru, Ponnurangam
Format: Preprint
Published: 2025
Subjects: Computation and Language; Machine Learning
Online Access: https://arxiv.org/abs/2509.04032
author Mishra, Debangan
Rastogi, Arihant
Negi, Agyeya
Goel, Shashwat
Kumaraguru, Ponnurangam
contents How similar are model outputs across languages? In this work, we study this question using a recently proposed model similarity metric $\kappa_p$ applied to 20 languages and 47 subjects in GlobalMMLU. Our analysis reveals that a model's responses become increasingly consistent across languages as its size and capability grow. Interestingly, models exhibit greater cross-lingual consistency within themselves than agreement with other models prompted in the same language. These results highlight not only the value of $\kappa_p$ as a practical tool for evaluating multilingual reliability, but also its potential to guide the development of more consistent multilingual systems.
format Preprint
id arxiv_https___arxiv_org_abs_2509_04032
institution arXiv
publishDate 2025
record_format arxiv
title What if I ask in alia lingua? Measuring Functional Similarity Across Languages
topic Computation and Language
Machine Learning
url https://arxiv.org/abs/2509.04032