Staff View: Aligning Compound AI Systems via System-level DPO :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Xiangwen; Zhang, Yibo Jacky; Ding, Zhoujie; Tsai, Katherine; Wu, Haolun; Koyejo, Sanmi
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Multiagent Systems
Online Access:	https://arxiv.org/abs/2502.17721

_version_	1866912945937580032
author	Wang, Xiangwen; Zhang, Yibo Jacky; Ding, Zhoujie; Tsai, Katherine; Wu, Haolun; Koyejo, Sanmi
author_facet	Wang, Xiangwen; Zhang, Yibo Jacky; Ding, Zhoujie; Tsai, Katherine; Wu, Haolun; Koyejo, Sanmi
contents	Compound AI systems, comprising multiple interacting components such as LLMs, foundation models, and external tools, have demonstrated remarkable improvements compared to single models in various tasks. To ensure their effective deployment in real-world applications, aligning these systems with human preferences is crucial. However, aligning the compound system via policy optimization, unlike the alignment of a single model, is challenging for two main reasons: (i) non-differentiable interactions between components make end-to-end gradient-based optimization methods inapplicable, and (ii) system-level preferences cannot be directly transformed into component-level preferences. To address these challenges, we first formulate compound AI systems as Directed Acyclic Graphs (DAGs), explicitly modeling both component interactions and the associated data flows. Building on this formulation, we introduce SysDPO, a framework that extends Direct Preference Optimization (DPO) to enable joint system-level alignment. We propose two variants, SysDPO-Direct and SysDPO-Sampling, tailored to scenarios that differ in whether a system-specific preference dataset is constructed. We empirically demonstrate the effectiveness of our approach across two applications: the joint alignment of a language model and a diffusion model, and the joint alignment of an LLM collaboration system.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_17721
institution	arXiv
publishDate	2025
record_format	arxiv
title	Aligning Compound AI Systems via System-level DPO
topic	Machine Learning; Artificial Intelligence; Multiagent Systems
url	https://arxiv.org/abs/2502.17721

Similar Items