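| Main authors: | Zhang, Kun; Li, Jingyu; Li, Zhe; Zhang, Jingjing; Li, Fan; Liu, Yandong; Yan, Rui; Jiang, Zihang; Chen, Nan; Zhang, Lei; Zhang, Yongdong; Mao, Zhendong; Zhou, S. Kevin |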
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Information Retrieval; Computer Vision and Pattern Recognition |
| Online access: | https://arxiv.org/abs/2503.01334 |
| _version_ | 1866912492007981056 |
|---|---|
| author | Zhang, Kun; Li, Jingyu; Li, Zhe; Zhang, Jingjing; Li, Fan; Liu, Yandong; Yan, Rui; Jiang, Zihang; Chen, Nan; Zhang, Lei; Zhang, Yongdong; Mao, Zhendong; Zhou, S. Kevin |
| author_facet | Zhang, Kun; Li, Jingyu; Li, Zhe; Zhang, Jingjing; Li, Fan; Liu, Yandong; Yan, Rui; Jiang, Zihang; Chen, Nan; Zhang, Lei; Zhang, Yongdong; Mao, Zhendong; Zhou, S. Kevin |
| contents | The burgeoning volume of multi-modal data necessitates advanced retrieval paradigms beyond unimodal and cross-modal approaches. Composed Multi-modal Retrieval (CMR) emerges as a pivotal next-generation technology, enabling users to query images or videos by integrating a reference visual input with textual modifications, thereby achieving unprecedented flexibility and precision. This paper provides a comprehensive survey of CMR, covering its fundamental challenges, technical advancements, and applications. CMR is categorized into supervised, zero-shot, and semi-supervised learning paradigms. We discuss key research directions: data construction, model architecture, and loss optimization in supervised CMR; transformation frameworks and linear integration in zero-shot CMR; and semi-supervised CMR, which leverages generated pseudo-triplets while addressing data noise and uncertainty. Additionally, we extensively survey the diverse application landscape of CMR, highlighting its transformative potential in e-commerce, social media, search engines, public security, etc. Seven high-impact application scenarios are explored in detail with benchmark datasets and performance analysis. Finally, we provide potential research directions in the hope of inspiring exploration in yet-to-be-explored fields. A curated list of works is available at: https://github.com/kkzhang95/Awesome-Composed-Multi-modal-Retrieval |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2503_01334 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Composed Multi-modal Retrieval: A Survey of Approaches and Applications |
| topic | Information Retrieval; Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2503.01334 |