Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Rabanser, Stephan; Rauschmayr, Nathalie; Kulshrestha, Achin; Poklukar, Petra; Jitkrittum, Wittawat; Augenstein, Sean; Wang, Congchao; Tombari, Federico
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.19335

_version_	1866917034588110848
author	Rabanser, Stephan; Rauschmayr, Nathalie; Kulshrestha, Achin; Poklukar, Petra; Jitkrittum, Wittawat; Augenstein, Sean; Wang, Congchao; Tombari, Federico
author_facet	Rabanser, Stephan; Rauschmayr, Nathalie; Kulshrestha, Achin; Poklukar, Petra; Jitkrittum, Wittawat; Augenstein, Sean; Wang, Congchao; Tombari, Federico
contents	Large-scale machine learning models deliver strong performance across a wide range of tasks but come with significant computational and resource constraints. To mitigate these challenges, local smaller models are often deployed alongside larger models, relying on routing and deferral mechanisms to offload complex tasks. However, existing approaches inadequately balance the capabilities of these models, often resulting in unnecessary deferrals or sub-optimal resource usage. In this work we introduce a novel loss function called Gatekeeper for calibrating smaller models in cascade setups. Our approach fine-tunes the smaller model to confidently handle tasks it can perform correctly while deferring complex tasks to the larger model. Moreover, it incorporates a mechanism for managing the trade-off between model performance and deferral accuracy, and is broadly applicable across various tasks and domains without any architectural changes. We evaluate our method on encoder-only, decoder-only, and encoder-decoder architectures. Experiments across image classification, language modeling, and vision-language tasks show that our approach substantially improves deferral performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_19335
institution	arXiv
publishDate	2025
record_format	arxiv
title	Gatekeeper: Improving Model Cascades Through Confidence Tuning
topic	Machine Learning
url	https://arxiv.org/abs/2502.19335
