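Staff View: In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Understanding :: Library Catalog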

Bibliographic Details
Main Authors:	Xu, Moucheng, Chatzaroulas, Evangelos, McCutcheon, Luc, Ahad, Abdul, Azeem, Hamzah, Marecki, Janusz, Anwar, Ammar
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2409.15867

_version_	1866910657724547072
author	Xu, Moucheng Chatzaroulas, Evangelos McCutcheon, Luc Ahad, Abdul Azeem, Hamzah Marecki, Janusz Anwar, Ammar
author_facet	Xu, Moucheng Chatzaroulas, Evangelos McCutcheon, Luc Ahad, Abdul Azeem, Hamzah Marecki, Janusz Anwar, Ammar
contents	A Standard Operating Procedure (SOP) defines a low-level, step-by-step written guide for a business software workflow. SOP generation is a crucial step towards automating end-to-end software workflows. Manually creating SOPs can be time-consuming. Recent advancements in large video-language models offer the potential for automating SOP generation by analyzing recordings of human demonstrations. However, current large video-language models face challenges with zero-shot SOP generation. In this work, we first explore in-context learning with video-language models for SOP generation. We then propose an exploration-focused strategy called In-Context Ensemble Learning to aggregate pseudo labels of multiple possible paths of SOPs. The proposed in-context ensemble learning also enables the models to learn beyond their context window limit with an implicit consistency regularisation. We report that in-context learning helps video-language models generate more temporally accurate SOPs, and that the proposed in-context ensemble learning consistently enhances the capabilities of video-language models in SOP generation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_15867
institution	arXiv
publishDate	2024
record_format	arxiv
title	In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Understanding
topic	Artificial Intelligence
url	https://arxiv.org/abs/2409.15867

Similar Items