Staff View: Making Language Model a Hierarchical Classifier :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Yihong; Jiang, Zhonglin; Xi, Ningyuan; Zhao, Yue; Gu, Qingqing; Chen, Xiyuan; Wu, Hao; Xu, Sheng; Zhou, Hange; Chen, Yong; Ji, Luo
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.12930

_version_	1866914060313821184
author	Wang, Yihong; Jiang, Zhonglin; Xi, Ningyuan; Zhao, Yue; Gu, Qingqing; Chen, Xiyuan; Wu, Hao; Xu, Sheng; Zhou, Hange; Chen, Yong; Ji, Luo
author_facet	Wang, Yihong; Jiang, Zhonglin; Xi, Ningyuan; Zhao, Yue; Gu, Qingqing; Chen, Xiyuan; Wu, Hao; Xu, Sheng; Zhou, Hange; Chen, Yong; Ji, Luo
contents	Decoder-only language models, such as GPT and LLaMA, generally decode from the last layer. Motivated by humans' capacity for hierarchical thinking, we propose a hierarchical decoder architecture in which different layers decode text simultaneously. Given limited time and computational resources, we adapt a pretrained language model into this hierarchical decoder form. The language heads of the last layer are copied to selected intermediate layers and fine-tuned with different task inputs. Through extensive experiments, we validate that these selected intermediate layers can be adapted to produce meaningful and reasonable content, and that this hierarchical decoder paradigm achieves state-of-the-art performance on multiple tasks, including hierarchical text classification, classification-guided generation, and hierarchical text generation. HdLM outperforms all baselines on WoS, DBpedia, ESconv, EmpatheticDialogues, and several cognitive tests. We also provide a theoretical analysis of the convergence and computational savings of our methodology. This study suggests the possibility of a generalized hierarchical reasoner pretrained from scratch.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_12930
institution	arXiv
publishDate	2025
record_format	arxiv
title	Making Language Model a Hierarchical Classifier
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2507.12930

Similar Items