Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Molinari, Marco; Shao, Victor; Imeneo, Luca; Mikolajczak, Mateusz; Tregubiak, Vladimir; Pandey, Abhimanyu; Pereira, Sebastian Kuznetsov Ryder Torres
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Machine Learning; General Economics; Economics
Online Access:	https://arxiv.org/abs/2412.02605

_version_	1866910965740601344
author	Molinari, Marco; Shao, Victor; Imeneo, Luca; Mikolajczak, Mateusz; Tregubiak, Vladimir; Pandey, Abhimanyu; Pereira, Sebastian Kuznetsov Ryder Torres
author_facet	Molinari, Marco; Shao, Victor; Imeneo, Luca; Mikolajczak, Mateusz; Tregubiak, Vladimir; Pandey, Abhimanyu; Pereira, Sebastian Kuznetsov Ryder Torres
contents	Determining company similarity is a vital task in finance, underpinning risk management, hedging, and portfolio diversification. Practitioners often rely on sector and industry classifications such as SIC and GICS codes to gauge similarity, the former being used by the U.S. Securities and Exchange Commission (SEC) and the latter widely used by the investment community. Since these classifications lack granularity and need regular updating, clusters of embeddings of company descriptions have been proposed as a potential alternative, but the lack of interpretability in token embeddings poses a significant barrier to adoption in high-stakes contexts. Sparse Autoencoders (SAEs) have shown promise in enhancing the interpretability of Large Language Models (LLMs) by decomposing LLM activations into interpretable features. Moreover, SAEs capture an LLM's internal representation of a company description, whereas embeddings capture semantic similarity alone. We apply SAEs to company descriptions and obtain meaningful clusters of equities. We benchmark SAE features against SIC codes, industry codes, and embeddings. Our results demonstrate that SAE features surpass sector classifications and embeddings in capturing fundamental company characteristics. This is evidenced by their superior performance in correlating logged monthly returns (a proxy for similarity) and in generating higher Sharpe ratios in co-integration trading strategies, which underscores deeper fundamental similarities among companies. Finally, we verify the interpretability of our clusters and demonstrate that sparse features form simple and interpretable explanations for our clusters.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_02605
institution	arXiv
publishDate	2024
record_format	arxiv
title	Interpretable Company Similarity with Sparse Autoencoders
topic	Computation and Language; Machine Learning; General Economics; Economics
url	https://arxiv.org/abs/2412.02605

Similar Items