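Staff View: Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation :: Library Catalog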

Bibliographic Details
Main Authors:	Qu, Zhi; Wang, Yiran; Ding, Chenchen; Tanaka, Hideki; Utiyama, Masao; Watanabe, Taro
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2412.02101

_version_	1866929611651153920
author	Qu, Zhi; Wang, Yiran; Ding, Chenchen; Tanaka, Hideki; Utiyama, Masao; Watanabe, Taro
author_facet	Qu, Zhi; Wang, Yiran; Ding, Chenchen; Tanaka, Hideki; Utiyama, Masao; Watanabe, Taro
contents	Existing multilingual neural machine translation (MNMT) approaches mainly focus on improving models with the encoder-decoder architecture to translate multiple languages. However, decoder-only architecture has been explored less in MNMT due to its underperformance when trained on parallel data solely. In this work, we attribute the issue of the decoder-only architecture to its lack of language transfer capability. Specifically, the decoder-only architecture is insufficient in encoding source tokens with the target language features. We propose dividing the decoding process into two stages so that target tokens are explicitly excluded in the first stage to implicitly boost the transfer capability across languages. Additionally, we impose contrastive learning on translation instructions, resulting in improved performance in zero-shot translation. We conduct experiments on TED-19 and OPUS-100 datasets, considering both training from scratch and fine-tuning scenarios. Experimental results show that, compared to the encoder-decoder architecture, our methods not only perform competitively in supervised translations but also achieve improvements of up to 3.39 BLEU, 6.99 chrF++, 3.22 BERTScore, and 4.81 COMET in zero-shot translations.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_02101
institution	arXiv
publishDate	2024
record_format	arxiv
title	Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation
topic	Computation and Language
url	https://arxiv.org/abs/2412.02101

Similar Items