Bibliographic Details
Main Authors: Wang, Yihong, Jiang, Zhonglin, Xi, Ningyuan, Zhao, Yue, Gu, Qingqing, Chen, Xiyuan, Wu, Hao, Xu, Sheng, Zhou, Hange, Chen, Yong, Ji, Luo
Format: Preprint
Published: 2025
Subjects: Computation and Language, Artificial Intelligence
Online Access: https://arxiv.org/abs/2507.12930
_version_ 1866914060313821184
author Wang, Yihong
Jiang, Zhonglin
Xi, Ningyuan
Zhao, Yue
Gu, Qingqing
Chen, Xiyuan
Wu, Hao
Xu, Sheng
Zhou, Hange
Chen, Yong
Ji, Luo
contents Decoder-only language models, such as GPT and LLaMA, generally decode only from the last layer. Motivated by humans' capacity for hierarchical thinking, we propose a hierarchical decoder architecture in which different layers decode text simultaneously. Because of limited time and computational resources, we adapt a pretrained language model into this form of hierarchical decoder. The language heads of the last layer are copied to selected intermediate layers and fine-tuned with different task inputs. Through thorough experiments, we validate that these selected intermediate layers can be adapted to produce meaningful and reasonable content, and that this hierarchical decoder paradigm achieves state-of-the-art performance on multiple tasks, such as hierarchical text classification, classification-guided generation, and hierarchical text generation. HdLM outperforms all baselines on WoS, DBpedia, ESconv, EmpatheticDialogues, and several cognitive tests. We also provide a thorough theoretical analysis to validate the convergence and computational savings of our methodology. This study suggests the possibility of a generalized hierarchical reasoner pretrained from scratch.
format Preprint
id arxiv_https___arxiv_org_abs_2507_12930
institution arXiv
publishDate 2025
record_format arxiv
title Making Language Model a Hierarchical Classifier
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2507.12930