:: Library Catalog

Bibliographic Details
Main Authors:	Maurya, Amritansh; Singh, Navjot; Javed, Mohammed; Moured, Omar
Format:	Preprint
Published:	2026
Subjects:	Information Retrieval; Artificial Intelligence; Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2605.20254

Similar Items

RAPTOR: Refined Approach for Product Table Object Recognition
by: Thomas, Eliott, et al.
Published: (2025)

EfficientPosterGen: Semantic-aware Efficient Poster Generation via Token Compression and Accurate Violation Detection
by: Tang, Wenxin, et al.
Published: (2026)

Beyond String Matching: Semantic Evaluation of PDF Table Extraction
by: Horn, Pius, et al.
Published: (2026)

Progressive Multimodal Reasoning via Active Retrieval
by: Dong, Guanting, et al.
Published: (2024)

TalentMine: LLM-Based Extraction and Question-Answering from Multimodal Talent Tables
by: Mannam, Varun, et al.
Published: (2025)

PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval
by: Zou, Qiang, et al.
Published: (2025)

Neurosymbolic Inference On Foundation Models For Remote Sensing Text-to-image Retrieval With Complex Queries
by: Mezzi, Emanuele, et al.
Published: (2025)

Efficient Logic Gate Networks for Video Copy Detection
by: Fojcik, Katarzyna
Published: (2026)

OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
by: Yang, Wei, et al.
Published: (2025)

PHPQ: Pyramid Hybrid Pooling Quantization for Efficient Fine-Grained Image Retrieval
by: Zeng, Ziyun, et al.
Published: (2021)

Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning
by: Zhang, Jinxu
Published: (2024)

A comprehensive multimodal dataset and benchmark for ulcerative colitis scoring in endoscopy
by: Ghatwary, Noha, et al.
Published: (2026)

HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval
by: Askari, Arian, et al.
Published: (2025)

Provenance Analysis of Archaeological Artifacts via Multimodal RAG Systems
by: Zhang, Tuo, et al.
Published: (2025)

Enhancing Subsequent Video Retrieval via Vision-Language Models (VLMs)
by: Duan, Yicheng, et al.
Published: (2025)

Scale Up Composed Image Retrieval Learning via Modification Text Generation
by: Zhou, Yinan, et al.
Published: (2025)

VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings
by: Giahi, Ramin, et al.
Published: (2025)

Unified Interactive Multimodal Moment Retrieval via Cascaded Embedding-Reranking and Temporal-Aware Score Fusion
by: Thanh, Toan Le Ngo, et al.
Published: (2025)

Capability-aware Prompt Reformulation Learning for Text-to-Image Generation
by: Zhan, Jingtao, et al.
Published: (2024)

Very Efficient Listwise Multimodal Reranking for Long Documents
by: Sun, Yiqun, et al.
Published: (2026)

Image and Data Mining in Reticular Chemistry Using GPT-4V
by: Zheng, Zhiling, et al.
Published: (2023)

GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval
by: Wang, Yuting, et al.
Published: (2023)

Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms
by: Wang, Mengru, et al.
Published: (2025)

MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
by: Ju, Yeong-Joon, et al.
Published: (2024)

Benchmark Granularity and Model Robustness for Image-Text Retrieval
by: Hendriksen, Mariya, et al.
Published: (2024)

Smart Routing for Multimodal Video Retrieval: When to Search What
by: Rosa, Kevin Dela
Published: (2025)

A Picture is Worth a Thousand Words? An Empirical Study of Aggregation Strategies for Visual Financial Document Retrieval
by: Lim, Ho Hung, et al.
Published: (2026)

Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
by: Cui, Cheng, et al.
Published: (2026)

Attribute-Aware Implicit Modality Alignment for Text Attribute Person Search
by: Wang, Xin, et al.
Published: (2024)

Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval
by: Wang, Jinpeng, et al.
Published: (2024)

Low-Data Classification of Historical Music Manuscripts: A Few-Shot Learning Approach
by: Shatri, Elona, et al.
Published: (2024)

Benchmarking Document Parsers on Mathematical Formula Extraction from PDFs
by: Horn, Pius, et al.
Published: (2025)

From Historical Tabular Image to Knowledge Graphs: A Provenance-Aware Modular Pipeline
by: Shoilee, Sarah Binta Alam, et al.
Published: (2026)

Pailitao-VL: Unified Embedding and Reranker for Real-Time Multi-Modal Industrial Search
by: Chen, Lei, et al.
Published: (2026)

Video Enriched Retrieval Augmented Generation Using Aligned Video Captions
by: Rosa, Kevin Dela
Published: (2024)

Zero-Shot Image Moderation in Google Ads with LLM-Assisted Textual Descriptions and Cross-modal Co-embeddings
by: Luo, Enming, et al.
Published: (2024)

Accelerating Flood Warnings by 10 Hours: The Power of River Network Topology in AI-enhanced Flood Forecasting
by: Wang, Hongjun, et al.
Published: (2024)

Content-based 3D Image Retrieval and a ColBERT-inspired Re-ranking for Tumor Flagging and Staging
by: Jush, Farnaz Khun, et al.
Published: (2025)

Diffusion Augmented Retrieval: A Training-Free Approach to Interactive Text-to-Image Retrieval
by: Long, Zijun, et al.
Published: (2025)

Your Embedding Model is SMARTer Than You Think
by: Zhang, Jianrui, et al.
Published: (2026)