Bibliographic Details
Main Authors: Zhang, Kun; Li, Jingyu; Li, Zhe; Zhang, Jingjing; Li, Fan; Liu, Yandong; Yan, Rui; Jiang, Zihang; Chen, Nan; Zhang, Lei; Zhang, Yongdong; Mao, Zhendong; Zhou, S. Kevin
Format: Preprint
Published: 2025
Subjects: Information Retrieval; Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2503.01334
author Zhang, Kun
Li, Jingyu
Li, Zhe
Zhang, Jingjing
Li, Fan
Liu, Yandong
Yan, Rui
Jiang, Zihang
Chen, Nan
Zhang, Lei
Zhang, Yongdong
Mao, Zhendong
Zhou, S. Kevin
contents The burgeoning volume of multi-modal data necessitates advanced retrieval paradigms beyond unimodal and cross-modal approaches. Composed Multi-modal Retrieval (CMR) emerges as a pivotal next-generation technology, enabling users to query images or videos by integrating a reference visual input with textual modifications, thereby achieving unprecedented flexibility and precision. This paper provides a comprehensive survey of CMR, covering its fundamental challenges, technical advancements, and applications. CMR is categorized into supervised, zero-shot, and semi-supervised learning paradigms. We discuss key research directions, including data construction, model architecture, and loss optimization in supervised CMR, as well as transformation frameworks and linear integration in zero-shot CMR, and semi-supervised CMR that leverages generated pseudo-triplets while addressing data noise/uncertainty. Additionally, we extensively survey the diverse application landscape of CMR, highlighting its transformative potential in e-commerce, social media, search engines, public security, etc. Seven high impact application scenarios are explored in detail with benchmark data sets and performance analysis. Finally, we further provide new potential research directions with the hope of inspiring exploration in other yet-to-be-explored fields. A curated list of works is available at: https://github.com/kkzhang95/Awesome-Composed-Multi-modal-Retrieval
format Preprint
id arxiv_https___arxiv_org_abs_2503_01334
institution arXiv
publishDate 2025
record_format arxiv
title Composed Multi-modal Retrieval: A Survey of Approaches and Applications
topic Information Retrieval
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2503.01334