Bibliographic Details
Main Authors: Mishra, Abhijit; Shukla, Shreya; Torres, Jose; Gwizdka, Jacek; Roychowdhury, Shounak
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2410.07507
_version_ 1866917114678345728
author Mishra, Abhijit
Shukla, Shreya
Torres, Jose
Gwizdka, Jacek
Roychowdhury, Shounak
contents Decoding and expressing brain activity in a comprehensible form is a challenging frontier in AI. This paper presents Thought2Text, which uses instruction-tuned Large Language Models (LLMs) fine-tuned with EEG data to achieve this goal. The approach involves three stages: (1) training an EEG encoder for visual feature extraction, (2) fine-tuning LLMs on image and text data, enabling multimodal description generation, and (3) further fine-tuning on EEG embeddings to generate text directly from EEG during inference. Experiments on a public EEG dataset collected for six subjects with image stimuli and text captions demonstrate the efficacy of multimodal LLMs (LLaMA-v3, Mistral-v0.3, Qwen2.5), validated using traditional language generation evaluation metrics, as well as fluency and adequacy measures. This approach marks a significant advancement towards portable, low-cost "thoughts-to-text" technology with potential applications in both neuroscience and natural language processing.
format Preprint
id arxiv_https___arxiv_org_abs_2410_07507
institution arXiv
publishDate 2024
record_format arxiv
title Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs)
topic Computation and Language
url https://arxiv.org/abs/2410.07507