Bibliographic Details
Main Authors: Min, Zeping; Wang, Xinshang
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2501.16650
Abstract: We introduce a novel index, the Distribution of Cosine Similarity (DOCS), for quantitatively assessing the similarity between weight matrices in Large Language Models (LLMs), aiming to facilitate the analysis of their complex architectures. Leveraging DOCS, our analysis uncovers intriguing patterns in the latest open-source LLMs: adjacent layers frequently exhibit high weight similarity and tend to form clusters, suggesting depth-wise functional specialization. Additionally, we prove that DOCS is theoretically effective in quantifying similarity for orthogonal matrices, a crucial aspect given the prevalence of orthogonal initializations in LLMs. This research contributes to a deeper understanding of LLM architecture and behavior, offering tools with potential implications for developing more efficient and interpretable models.
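The record does not give the DOCS formula. As a rough illustration of comparing two weight matrices through the cosine similarities of their columns, the stdlib-only Python sketch below computes, for each column of one matrix, the largest absolute cosine similarity to any column of the other, and averages these maxima in both directions. The function names (`columns`, `cosine`, `max_abs_cosines`, `docs_like_score`) and the mean-based summary are hypothetical simplifications, not the authors' method; consult the paper for how DOCS actually summarizes the distribution. The matrices here are random stand-ins, not real LLM layer weights.

```python
import math
import random
from statistics import mean

Matrix = list[list[float]]  # row-major: m[i][j] is row i, column j


def columns(m: Matrix) -> list[list[float]]:
    """Return the columns of a row-major matrix as lists."""
    return [list(col) for col in zip(*m)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def max_abs_cosines(x: Matrix, y: Matrix) -> list[float]:
    """For each column of x, the largest |cosine| to any column of y.

    In this sketch, this list plays the role of the 'distribution' of
    cosine similarities that gets summarized into one number.
    """
    y_cols = columns(y)
    return [max(abs(cosine(u, v)) for v in y_cols) for u in columns(x)]


def docs_like_score(x: Matrix, y: Matrix) -> float:
    """Hypothetical summary: mean of per-column maxima, in both directions.

    NOTE: an illustrative simplification, not the DOCS formula from the paper.
    """
    return (mean(max_abs_cosines(x, y)) + mean(max_abs_cosines(y, x))) / 2


if __name__ == "__main__":
    rng = random.Random(0)
    rows, cols = 64, 16
    w = [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    # Reorder and sign-flip the columns of w, i.e. multiply w on the right
    # by a signed permutation matrix (which is an orthogonal matrix).
    perm = list(range(cols))
    rng.shuffle(perm)
    w_perm = [[-row[p] if p % 2 else row[p] for p in perm] for row in w]

    # An unrelated random matrix of the same shape.
    other = [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    print(f"w vs. permuted w: {docs_like_score(w, w_perm):.4f}")
    print(f"w vs. unrelated:  {docs_like_score(w, other):.4f}")
```

Because each column is matched to its best counterpart by absolute cosine similarity, this sketch scores a column-permuted, sign-flipped copy of a matrix at about 1, while an unrelated random matrix scores much lower. Applying it to real models would mean loading corresponding weight matrices from different layers, which this sketch does not attempt.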
Institution: arXiv
Title: DOCS: Quantifying Weight Similarity for Deeper Insights into Large Language Models
Topics: Computation and Language; Artificial Intelligence