Bibliographic Details
Main Authors: Wang, Xiangwen, Zhang, Yibo Jacky, Ding, Zhoujie, Tsai, Katherine, Wu, Haolun, Koyejo, Sanmi
Format: Preprint
Published: 2025
Subjects: Machine Learning, Artificial Intelligence, Multiagent Systems
Online Access: https://arxiv.org/abs/2502.17721
author Wang, Xiangwen
Zhang, Yibo Jacky
Ding, Zhoujie
Tsai, Katherine
Wu, Haolun
Koyejo, Sanmi
contents Compound AI systems, which combine multiple interacting components such as LLMs, foundation models, and external tools, have shown substantial improvements over single models on a variety of tasks. Aligning these systems with human preferences is crucial for their effective deployment in real-world applications. However, unlike aligning a single model, aligning a compound system via policy optimization is challenging for two main reasons: (i) non-differentiable interactions between components make end-to-end gradient-based optimization methods inapplicable, and (ii) system-level preferences cannot be directly transformed into component-level preferences. To address these challenges, we first formulate compound AI systems as Directed Acyclic Graphs (DAGs) that explicitly model both the interactions between components and the associated data flows. Building on this formulation, we introduce SysDPO, a framework that extends Direct Preference Optimization (DPO) to enable joint system-level alignment. We propose two variants, SysDPO-Direct and SysDPO-Sampling, suited to different scenarios depending on whether a system-specific preference dataset is constructed. We empirically demonstrate the effectiveness of our approach in two applications: the joint alignment of a language model and a diffusion model, and the joint alignment of an LLM collaboration system.
format Preprint
id arxiv_https___arxiv_org_abs_2502_17721
institution arXiv
publishDate 2025
record_format arxiv
title Aligning Compound AI Systems via System-level DPO
topic Machine Learning
Artificial Intelligence
Multiagent Systems
url https://arxiv.org/abs/2502.17721
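The abstract in the contents field describes modeling a compound system as a DAG and extending DPO to the system level, but it does not state the objective. The following is a minimal Python sketch under explicit assumptions, not the authors' implementation. It assumes that the log-probability of a full system output factorizes along the DAG as a sum of per-component conditional log-probabilities, log p(node output | parent outputs), and it applies the standard DPO loss, -log sigmoid(beta * [(log pi(chosen) - log pi_ref(chosen)) - (log pi(rejected) - log pi_ref(rejected))]), to those system-level log-probabilities. The function names `topological_order`, `system_logprob`, and `dpo_loss`, the component names, and all numeric log-probabilities are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical DAG encoding: each component name maps to the components
# whose outputs it consumes (its parents in the DAG).
Dag = Dict[str, List[str]]


def topological_order(parents: Dag) -> List[str]:
    """Return components parents-first; raise ValueError if the graph has a cycle."""
    order: List[str] = []
    done: set = set()
    in_progress: set = set()

    def visit(node: str) -> None:
        if node in done:
            return
        if node in in_progress:
            raise ValueError(f"cycle detected at {node!r}")
        in_progress.add(node)
        for parent in parents[node]:
            visit(parent)
        in_progress.remove(node)
        done.add(node)
        order.append(node)

    for node in parents:
        visit(node)
    return order


def system_logprob(parents: Dag, component_logprobs: Dict[str, float]) -> float:
    """Assumed factorization: log p(system output) is the sum over DAG nodes
    of log p(node output | parent outputs)."""
    return sum(component_logprobs[node] for node in topological_order(parents))


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)) = log(1 + exp(-x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss, applied here to system-level log-probabilities."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return neg_log_sigmoid(margin)


if __name__ == "__main__":
    # Hypothetical two-component system: an LLM writes a prompt that a
    # diffusion model then renders into an image.
    dag = {"llm": [], "diffusion": ["llm"]}
    chosen = system_logprob(dag, {"llm": -2.0, "diffusion": -3.0})
    rejected = system_logprob(dag, {"llm": -2.5, "diffusion": -4.0})
    print(chosen, rejected)
    print(dpo_loss(chosen, rejected, ref_chosen=-5.5, ref_rejected=-6.0))
```

In this toy setup the preference is expressed only on the final system output, yet under the assumed factorization the loss depends on every component's conditional log-probability, so each component's own term can receive a training signal without differentiating through the intermediate outputs passed between components.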