Bibliographic Details
Main authors: Jiang, Houcheng; Fang, Junfeng; Wu, Jiaxin; Zhang, Tianyu; Gao, Chen; Li, Yong; Wang, Xiang; He, Xiangnan; Deng, Yang
Format: Preprint
Published: 2025
Subjects: Computation and Language; Artificial Intelligence
Online access: https://arxiv.org/abs/2510.07884
author Jiang, Houcheng
Fang, Junfeng
Wu, Jiaxin
Zhang, Tianyu
Gao, Chen
Li, Yong
Wang, Xiang
He, Xiangnan
Deng, Yang
contents Weak-to-strong generalization provides a promising paradigm for scaling large language models (LLMs) by training stronger models on samples from aligned weaker ones, without requiring human feedback or explicit reward modeling. However, its robustness and generalization are hindered by the noise and biases in weak-model outputs, which limit its applicability in practice. To address this challenge, we leverage implicit rewards, which approximate explicit rewards through log-likelihood ratios, and reveal their structural equivalence with Contrastive Decoding (CD), a decoding strategy shown to reduce noise in LLM generation. Building on this connection, we propose Contrastive Weak-to-Strong Generalization (ConG), a framework that employs contrastive decoding between pre- and post-alignment weak models to generate higher-quality samples. This approach enables more reliable capability transfer, denoising, and improved robustness, substantially mitigating the limitations of traditional weak-to-strong methods. Empirical results across different model families confirm consistent improvements, demonstrating the generality and effectiveness of ConG. Taken together, our findings highlight the potential of ConG to advance weak-to-strong generalization and provide a promising pathway toward AGI.
format Preprint
id arxiv_https___arxiv_org_abs_2510_07884
institution arXiv
publishDate 2025
record_format arxiv
title Contrastive Weak-to-strong Generalization
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2510.07884
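
The abstract in this record describes implicit rewards as log-likelihood ratios and links them to contrastive decoding between pre- and post-alignment weak models. The following is a minimal sketch of that general idea on toy next-token distributions, not the authors' implementation. The function names, the `alpha` plausibility cutoff (borrowed from the generic contrastive-decoding recipe), and all probabilities are illustrative assumptions; none of them come from the record.

```python
import math


def contrastive_scores(post_probs, pre_probs, alpha=0.1):
    """Score tokens by log p_post(token) - log p_pre(token).

    post_probs / pre_probs: dicts mapping token -> probability under a
    post-alignment and a pre-alignment weak model (hypothetical toy inputs).
    alpha: assumed plausibility cutoff, relative to the most likely
    post-alignment token; tokens below it are not contrasted at all.
    """
    cutoff = alpha * max(post_probs.values())
    scores = {}
    for token, p_post in post_probs.items():
        if p_post < cutoff:
            continue  # skip implausible tokens so tiny probabilities cannot win
        # Log-likelihood ratio: how much alignment raised this token's probability.
        scores[token] = math.log(p_post) - math.log(pre_probs[token])
    return scores


def pick_token(post_probs, pre_probs, alpha=0.1):
    """Greedy choice over the contrastive scores."""
    scores = contrastive_scores(post_probs, pre_probs, alpha)
    return max(scores, key=scores.get)


# Made-up next-token distributions, for illustration only.
pre = {"sure": 0.50, "sorry": 0.30, "maybe": 0.19, "zzz": 0.01}
post = {"sure": 0.55, "sorry": 0.05, "maybe": 0.38, "zzz": 0.02}

greedy_choice = max(post, key=post.get)      # most likely token after alignment
contrastive_choice = pick_token(post, pre)   # token whose probability alignment raised most
print(greedy_choice, contrastive_choice)
```

In this toy setup, plain greedy decoding follows the post-alignment model's most likely token, while the contrastive score favors the token whose probability grew most relative to the pre-alignment model. The cutoff keeps rare tokens with large but meaningless ratios out of the comparison.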