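Staff View: ControlEvents: Controllable Synthesis of Event Camera Data with Foundational Prior from Image Diffusion Models :: Library Catalog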


Bibliographic Details
Main Authors:	Hu, Yixuan; Xue, Yuxuan; Klenk, Simon; Cremers, Daniel; Pons-Moll, Gerard
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2509.22864

_version_	1866908678433538048
author	Hu, Yixuan; Xue, Yuxuan; Klenk, Simon; Cremers, Daniel; Pons-Moll, Gerard
author_facet	Hu, Yixuan; Xue, Yuxuan; Klenk, Simon; Cremers, Daniel; Pons-Moll, Gerard
contents	In recent years, event cameras have gained significant attention due to their bio-inspired properties, such as high temporal resolution and high dynamic range. However, obtaining large-scale labeled ground-truth data for event-based vision tasks remains challenging and costly. In this paper, we present ControlEvents, a diffusion-based generative model designed to synthesize high-quality event data guided by diverse control signals such as class text labels, 2D skeletons, and 3D body poses. Our key insight is to leverage the diffusion prior from foundation models such as Stable Diffusion, which enables high-quality event data generation with minimal fine-tuning and limited labeled data. Our method streamlines the data generation process and significantly reduces the cost of producing labeled event datasets. We demonstrate the effectiveness of our approach by synthesizing event data for visual recognition, 2D skeleton estimation, and 3D body pose estimation. Our experiments show that the synthesized labeled event data improves model performance on all three tasks. Additionally, our approach can generate events from text labels unseen during training, illustrating the powerful text-based generation capabilities inherited from foundation models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_22864
institution	arXiv
publishDate	2025
record_format	arxiv
title	ControlEvents: Controllable Synthesis of Event Camera Data with Foundational Prior from Image Diffusion Models
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2509.22864

Similar Items