Staff View: ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers :: Library Catalog

Bibliographic Details
Main Authors:	Zheng, Chen; Sun, Ke; Tang, Da; Ma, Yukun; Zhang, Yuyu; Xi, Chenguang; Zhou, Xun
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2401.02072

_version_	1866916080734175232
author	Zheng, Chen; Sun, Ke; Tang, Da; Ma, Yukun; Zhang, Yuyu; Xi, Chenguang; Zhou, Xun
author_facet	Zheng, Chen; Sun, Ke; Tang, Da; Ma, Yukun; Zhang, Yuyu; Xi, Chenguang; Zhou, Xun
contents	Large Language Models (LLMs) such as ChatGPT and LLaMA encounter limitations in domain-specific tasks: they often lack depth and accuracy in specialized areas and exhibit a decrease in general capabilities when fine-tuned, particularly in the analysis ability of small-sized models. To address these gaps, we introduce ICE-GRT, which uses Reinforcement Learning from Human Feedback (RLHF) grounded in Proximal Policy Optimization (PPO) and demonstrates remarkable ability in in-domain scenarios without compromising general task performance. Our exploration of ICE-GRT highlights its understanding and reasoning ability: it not only generates robust answers but also provides detailed analyses of the reasons behind them. This capability marks a significant progression beyond the scope of Supervised Fine-Tuning models. The success of ICE-GRT depends on several crucial factors, including Appropriate Data, Reward Size Scaling, KL-Control, and Advantage Normalization. The ICE-GRT model exhibits state-of-the-art performance in domain-specific tasks and across 12 general language tasks against LLMs of equivalent and even larger size, highlighting the effectiveness of our approach. We provide a comprehensive analysis of ICE-GRT, underscoring the significant advancements it brings to the field of LLMs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_02072
institution	arXiv
publishDate	2024
record_format	arxiv
title	ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers
topic	Computation and Language
url	https://arxiv.org/abs/2401.02072

Similar Items