Staff View: LOLAMEME: Logic, Language, Memory, Mechanistic Framework :: Library Catalog

Bibliographic Details
Main Authors:	Desai, Jay; Guo, Xiaobo; Sengamedu, Srinivasan H.
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2406.02592

_version_	1866914824457289728
author	Desai, Jay; Guo, Xiaobo; Sengamedu, Srinivasan H.
author_facet	Desai, Jay; Guo, Xiaobo; Sengamedu, Srinivasan H.
contents	The performance of Large Language Models has achieved superhuman breadth with unprecedented depth. At the same time, the language models are mostly black-box models, and the underlying mechanisms for performance have been evaluated using synthetic or mechanistic schemes. We extend current mechanistic schemes to incorporate Logic, memory, and nuances of Language such as latent structure. The proposed framework is called LOLAMEME, and we provide two instantiations of LOLAMEME: the LoLa and MeMe languages. We then consider two generative language model architectures: transformer-based GPT-2 and convolution-based Hyena. We propose the hybrid architecture THEX and use the LOLAMEME framework to compare the three architectures. THEX outperforms GPT-2 and Hyena on select tasks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_02592
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	LOLAMEME: Logic, Language, Memory, Mechanistic Framework Desai, Jay; Guo, Xiaobo; Sengamedu, Srinivasan H. Machine Learning; Artificial Intelligence; Computation and Language The performance of Large Language Models has achieved superhuman breadth with unprecedented depth. At the same time, the language models are mostly black-box models, and the underlying mechanisms for performance have been evaluated using synthetic or mechanistic schemes. We extend current mechanistic schemes to incorporate Logic, memory, and nuances of Language such as latent structure. The proposed framework is called LOLAMEME, and we provide two instantiations of LOLAMEME: the LoLa and MeMe languages. We then consider two generative language model architectures: transformer-based GPT-2 and convolution-based Hyena. We propose the hybrid architecture THEX and use the LOLAMEME framework to compare the three architectures. THEX outperforms GPT-2 and Hyena on select tasks.
title	LOLAMEME: Logic, Language, Memory, Mechanistic Framework
topic	Machine Learning; Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2406.02592

Similar Items