Bibliographic Details
Main Authors: Gogawale, Sharva; Grudka, Gal; Vasyutinsky-Shapira, Daria; Ventura, Omer; Kurar-Barakat, Berat; Dershowitz, Nachum
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2604.08138
contents A join is a set of manuscript fragments identified as originally emanating from the same manuscript. We study manuscript join retrieval: Given a query image of a fragment, retrieve other fragments originating from the same physical manuscript. We propose Bag of Bags (BoB), an image-level representation that replaces the global-level visual codebook of classical Bag of Words (BoW) with a fragment-specific vocabulary of local visual words. Our pipeline trains a sparse convolutional autoencoder on binarized fragment patches, encodes connected components from each page, clusters the resulting embeddings with per-image k-means, and compares images using set-to-set distances between their local vocabularies. Evaluated on fragments from the Cairo Genizah, the best BoB variant (viz. Chamfer) achieves Hit@1 of 0.78 and MRR of 0.84, compared to 0.74 and 0.80, respectively, for the strongest BoW baseline (BoW-RawPatches-$\chi^2$), a 6.1% relative improvement in top-1 accuracy. We furthermore study a mass-weighted BoB-OT variant that incorporates cluster population into prototype matching and present a formal approximation guarantee bounding its deviation from full component-level optimal transport. A two-stage pipeline using a BoW shortlist followed by BoB-OT reranking provides a practical compromise between retrieval strength and computational cost, supporting applicability to larger manuscript collections. The code and dataset are available at https://github.com/TAU-CH/midrash_bob.
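The per-image clustering and set-to-set comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the linked repository). It assumes component embeddings are already available as NumPy arrays, uses a plain Lloyd's k-means, and takes one common symmetric form of the Chamfer distance; the function names `kmeans_prototypes` and `chamfer_distance`, the value `k=8`, and the random toy data are all hypothetical.

```python
import numpy as np


def kmeans_prototypes(embeddings, k, n_iter=20, seed=0):
    """Cluster one image's embeddings; return its k local 'visual words'.

    Hypothetical helper: plain Lloyd's k-means, standing in for the
    per-image k-means step mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    k = min(k, len(embeddings))
    # Initialise centers from k distinct embeddings of this image.
    idx = rng.choice(len(embeddings), size=k, replace=False)
    centers = embeddings[idx].astype(float)
    for _ in range(n_iter):
        # Distance from every embedding to every center, shape (n, k).
        d = np.linalg.norm(embeddings[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = embeddings[labels == j]
            if len(members):  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return centers


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two prototype sets (assumed form):
    mean nearest-neighbour distance from a to b plus that from b to a."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return d.min(axis=1).mean() + d.min(axis=0).mean()


# Toy data: fragments A and B share a distribution, C is shifted away.
rng = np.random.default_rng(0)
frag_a = rng.normal(0.0, 1.0, size=(200, 16))
frag_b = rng.normal(0.0, 1.0, size=(180, 16))
frag_c = rng.normal(5.0, 1.0, size=(150, 16))

vocab_a, vocab_b, vocab_c = (
    kmeans_prototypes(x, k=8) for x in (frag_a, frag_b, frag_c)
)

# A smaller distance suggests a more plausible join candidate.
print("A vs B:", chamfer_distance(vocab_a, vocab_b))
print("A vs C:", chamfer_distance(vocab_a, vocab_c))
```

In this sketch each image carries its own small vocabulary, so ranking candidates for a query amounts to sorting the collection by `chamfer_distance` to the query's prototypes. The mass-weighted BoB-OT variant mentioned in the abstract would additionally use cluster populations, which this sketch discards.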
format Preprint
id arxiv_https___arxiv_org_abs_2604_08138
institution arXiv
publishDate 2026
record_format arxiv
title Bag of Bags: Adaptive Visual Vocabularies for Genizah Join Image Retrieval
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2604.08138