Vista Equipo: :: Library Catalog

Guardado en:

Detalles Bibliográficos
Autores principales:	Mueller, David, Dredze, Mark, Andrews, Nicholas
Formato:	Preprint
Publicado:	2024
Materias:	Machine Learning
Acceso en línea:	https://arxiv.org/abs/2408.14677
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

_version_	1866913670446972928
author	Mueller, David Dredze, Mark Andrews, Nicholas
author_facet	Mueller, David Dredze, Mark Andrews, Nicholas
contents	Despite the widespread adoption of multi-task training in deep learning, little is understood about how multi-task learning (MTL) affects generalization. Prior work has conjectured that the negative effects of MTL are due to optimization challenges that arise during training, and many optimization methods have been proposed to improve multi-task performance. However, recent work has shown that these methods fail to consistently improve multi-task generalization. In this work, we seek to improve our understanding of these failures by empirically studying how MTL impacts the optimization of tasks, and whether this impact can explain the effects of MTL on generalization. We show that MTL results in a generalization gap (a gap in generalization at comparable training loss) between single-task and multi-task trajectories early into training. However, we find that factors of the optimization trajectory previously proposed to explain generalization gaps in single-task settings cannot explain the generalization gaps between single-task and multi-task models. Moreover, we show that the amount of gradient conflict between tasks is correlated with negative effects to task optimization, but is not predictive of generalization. Our work sheds light on the underlying causes for failures in MTL and, importantly, raises questions about the role of general purpose multi-task optimization algorithms.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_14677
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Can Optimization Trajectories Explain Multi-Task Transfer? Mueller, David Dredze, Mark Andrews, Nicholas Machine Learning Despite the widespread adoption of multi-task training in deep learning, little is understood about how multi-task learning (MTL) affects generalization. Prior work has conjectured that the negative effects of MTL are due to optimization challenges that arise during training, and many optimization methods have been proposed to improve multi-task performance. However, recent work has shown that these methods fail to consistently improve multi-task generalization. In this work, we seek to improve our understanding of these failures by empirically studying how MTL impacts the optimization of tasks, and whether this impact can explain the effects of MTL on generalization. We show that MTL results in a generalization gap (a gap in generalization at comparable training loss) between single-task and multi-task trajectories early into training. However, we find that factors of the optimization trajectory previously proposed to explain generalization gaps in single-task settings cannot explain the generalization gaps between single-task and multi-task models. Moreover, we show that the amount of gradient conflict between tasks is correlated with negative effects to task optimization, but is not predictive of generalization. Our work sheds light on the underlying causes for failures in MTL and, importantly, raises questions about the role of general purpose multi-task optimization algorithms.
title	Can Optimization Trajectories Explain Multi-Task Transfer?
topic	Machine Learning
url	https://arxiv.org/abs/2408.14677

Ejemplares similares