Bibliographic Details
Main Authors: Yang, Lewen, Zhou, Xuanyu, Fan, Juao, Xie, Xinyi, Zhu, Shengxin
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2411.18021
contents Over the past few decades, Artificial Intelligence (AI) has progressed from the initial machine learning stage to the deep learning stage, and now to the stage of foundation models. Foundation models are characterized by pre-training, transfer learning, and self-supervised learning, and pre-trained models can be fine-tuned and applied to various downstream tasks. Within this framework, models such as Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT) have greatly advanced natural language processing (NLP), and many later models build on BERT. By pre-training with a masked language model, BERT overcame the limitation of purely one-way (unidirectional) language modeling. It uses the context on both sides of a masked word to predict that word, which improves the model's feature extraction ability. This makes the model useful for downstream tasks, especially specialized applications, because a model with a bidirectional encoder can better capture domain knowledge. We therefore aim to clarify how this technology has evolved and improved model performance across NLP tasks in the context of foundation models, and to show its importance in capturing context information and improving performance on downstream tasks. This article analyzes one-way and bidirectional models based on GPT and BERT and compares their differences according to each model's purpose. It also briefly analyzes BERT and the improvements made by several BERT-based models, and compares the models' performance on the Stanford Question Answering Dataset (SQuAD) and the General Language Understanding Evaluation (GLUE) benchmark.
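The contrast the abstract draws between one-way and bidirectional context can be illustrated with a small sketch. It is not taken from the paper: the function names (`causal_mask`, `bidirectional_mask`, `visible_context`) and the example sentence are hypothetical, and the one-way case is simplified to "a position sees only what lies to its left".

```python
def causal_mask(n):
    """One-way (GPT-style) visibility: position i may see only positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Bidirectional (BERT-style) visibility: every position may see every other."""
    return [[True] * n for _ in range(n)]


def visible_context(tokens, position, mask):
    """Return the tokens available as context when predicting `position`."""
    return [
        tok
        for j, tok in enumerate(tokens)
        if mask[position][j] and j != position
    ]


# Hypothetical example sentence with one masked word at index 2.
tokens = ["the", "cat", "[MASK]", "on", "the", "mat"]
target = 2

left_only = visible_context(tokens, target, causal_mask(len(tokens)))
both_sides = visible_context(tokens, target, bidirectional_mask(len(tokens)))

print(left_only)   # ['the', 'cat']
print(both_sides)  # ['the', 'cat', 'on', 'the', 'mat']
```

In this toy setting the one-way model must guess the masked word from "the cat" alone, while the bidirectional encoder can also use "on the mat", which is the extra context the abstract credits for better feature extraction.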
format Preprint
id arxiv_https___arxiv_org_abs_2411_18021
institution arXiv
publishDate 2024
record_format arxiv
title Can bidirectional encoder become the ultimate winner for downstream applications of foundation models?
topic Computation and Language
url https://arxiv.org/abs/2411.18021