Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Han, Ning; Zeng, Yawen; Long, Shaohua; Li, Chengqing; Yang, Sijie; Tan, Dun; Dong, Jianfeng; Chen, Jingjing
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.01312

_version_	1866915646871175168
author	Han, Ning; Zeng, Yawen; Long, Shaohua; Li, Chengqing; Yang, Sijie; Tan, Dun; Dong, Jianfeng; Chen, Jingjing
author_facet	Han, Ning; Zeng, Yawen; Long, Shaohua; Li, Chengqing; Yang, Sijie; Tan, Dun; Dong, Jianfeng; Chen, Jingjing
contents	In recent years, significant developments have been made in both video retrieval and video moment retrieval tasks, which respectively retrieve complete videos or moments for a given text query. These advancements have greatly improved user satisfaction during the search process. However, previous work has failed to establish meaningful "interaction" between the retrieval system and the user, and its one-way retrieval paradigm can no longer fully meet the personalization and dynamic needs of at least 80.8% of users. In this paper, we introduce the Interactive Video Corpus Retrieval (IVCR) task, a more realistic setting that enables multi-turn, conversational, and realistic interactions between the user and the retrieval system. To facilitate research on this challenging task, we introduce IVCR-200K, a high-quality, bilingual, multi-turn, conversational, and abstract semantic dataset that supports video retrieval and even moment retrieval. Furthermore, we propose a comprehensive framework based on multi-modal large language models (MLLMs) to help users interact in several modes with more explainable solutions. The extensive experiments demonstrate the effectiveness of our dataset and framework.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_01312
institution	arXiv
publishDate	2025
record_format	arxiv
title	IVCR-200K: A Large-Scale Multi-turn Dialogue Benchmark for Interactive Video Corpus Retrieval
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2512.01312

Similar Items