Bibliographic Details
Main Authors: Chu, Lei; Zhao, Yuhuan
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2605.22017
author Chu, Lei
Zhao, Yuhuan
author_facet Chu, Lei
Zhao, Yuhuan
contents Deep generative models have become a promising approach for human motion prediction due to their ability to capture multimodal distributions and represent diverse human behaviors. However, generating predictions that are both diverse and jointly consistent among interacting agents remains challenging. In addition, most existing approaches are primarily evaluated using single-agent (marginal) metrics, which fail to fully reflect the joint dynamics of multi-agent interactions. We propose a diffusion-based framework that improves multi-agent motion prediction by leveraging rich contextual information from historical trajectories. This information is incorporated through a guidance mechanism to enhance the diversity and expressiveness of predicted motions. To further enforce interaction consistency, we introduce an energy-based formulation that refines the joint trajectory distribution while preserving the plausibility of individual trajectories. Extensive experiments on four benchmark datasets demonstrate that our approach consistently outperforms existing methods. Notably, our approach substantially improves both marginal (ADE/FDE) and joint (JADE/JFDE) metrics on ETH/UCY over strong marginal baselines. Compared with prior joint prediction methods, it delivers significant gains in marginal metrics while maintaining competitive joint performance.
format Preprint
id arxiv_https___arxiv_org_abs_2605_22017
institution arXiv
publishDate 2026
record_format arxiv
title Diverse Yet Consistent: Context-Guided Diffusion with Energy-Based Joint Refinement for Multi-Agent Motion Prediction
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2605.22017
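The abstract contrasts marginal (ADE/FDE) with joint (JADE/JFDE) metrics without defining them. The sketch below illustrates the distinction under the commonly used best-of-K definitions; it is an assumption that the paper follows them exactly. The function name `marginal_and_joint_errors` and the array layout are hypothetical and chosen only for illustration.

```python
import numpy as np

def marginal_and_joint_errors(pred, gt):
    """Best-of-K displacement errors (assumed definitions, not taken from the paper).

    pred: array of shape (K, A, T, 2), K sampled joint futures for A agents over T steps.
    gt:   array of shape (A, T, 2), the ground-truth future of each agent.
    """
    # Euclidean distance per sample, agent, and time step: shape (K, A, T).
    dist = np.linalg.norm(pred - gt[None], axis=-1)
    ade = dist.mean(axis=-1)  # average over time, shape (K, A)
    fde = dist[..., -1]       # final time step only, shape (K, A)
    return {
        # Marginal: each agent picks its own best sample, then average over agents.
        "ADE": float(ade.min(axis=0).mean()),
        "FDE": float(fde.min(axis=0).mean()),
        # Joint: average over agents within a sample, then pick the best sample.
        "JADE": float(ade.mean(axis=1).min()),
        "JFDE": float(fde.mean(axis=1).min()),
    }
```

Because the marginal variant lets each agent choose a different sample, it can never exceed the joint variant. A set of samples in which no single sample is good for all agents at once therefore scores well on ADE/FDE but poorly on JADE/JFDE, which is the gap the abstract refers to.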