Staff View: Bag of Bags: Adaptive Visual Vocabularies for Genizah Join Image Retrieval :: Library Catalog

Bibliographic Details
Main Authors:	Gogawale, Sharva; Grudka, Gal; Vasyutinsky-Shapira, Daria; Ventura, Omer; Kurar-Barakat, Berat; Dershowitz, Nachum
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.08138

_version_	1866913022022254592
author	Gogawale, Sharva; Grudka, Gal; Vasyutinsky-Shapira, Daria; Ventura, Omer; Kurar-Barakat, Berat; Dershowitz, Nachum
author_facet	Gogawale, Sharva; Grudka, Gal; Vasyutinsky-Shapira, Daria; Ventura, Omer; Kurar-Barakat, Berat; Dershowitz, Nachum
contents	A join is a set of manuscript fragments identified as originally emanating from the same manuscript. We study manuscript join retrieval: Given a query image of a fragment, retrieve other fragments originating from the same physical manuscript. We propose Bag of Bags (BoB), an image-level representation that replaces the global-level visual codebook of classical Bag of Words (BoW) with a fragment-specific vocabulary of local visual words. Our pipeline trains a sparse convolutional autoencoder on binarized fragment patches, encodes connected components from each page, clusters the resulting embeddings with per-image k-means, and compares images using set-to-set distances between their local vocabularies. Evaluated on fragments from the Cairo Genizah, the best BoB variant (viz. Chamfer) achieves Hit@1 of 0.78 and MRR of 0.84, compared to 0.74 and 0.80, respectively, for the strongest BoW baseline (BoW-RawPatches-$χ^2$), a 6.1% relative improvement in top-1 accuracy. We furthermore study a mass-weighted BoB-OT variant that incorporates cluster population into prototype matching and present a formal approximation guarantee bounding its deviation from full component-level optimal transport. A two-stage pipeline using a BoW shortlist followed by BoB-OT reranking provides a practical compromise between retrieval strength and computational cost, supporting applicability to larger manuscript collections. The code and dataset are available at https://github.com/TAU-CH/midrash_bob.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_08138
institution	arXiv
publishDate	2026
record_format	arxiv
title	Bag of Bags: Adaptive Visual Vocabularies for Genizah Join Image Retrieval
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2604.08138

Similar Items