| Main Authors: | Maurya, Amritansh, Singh, Navjot, Javed, Mohammed, Moured, Omar |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.20254 |
Similar Items
RAPTOR: Refined Approach for Product Table Object Recognition
by: Thomas, Eliott, et al.
Published: (2025)
EfficientPosterGen: Semantic-aware Efficient Poster Generation via Token Compression and Accurate Violation Detection
by: Tang, Wenxin, et al.
Published: (2026)
Beyond String Matching: Semantic Evaluation of PDF Table Extraction
by: Horn, Pius, et al.
Published: (2026)
Progressive Multimodal Reasoning via Active Retrieval
by: Dong, Guanting, et al.
Published: (2024)
TalentMine: LLM-Based Extraction and Question-Answering from Multimodal Talent Tables
by: Mannam, Varun, et al.
Published: (2025)
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval
by: Zou, Qiang, et al.
Published: (2025)
Neurosymbolic Inference On Foundation Models For Remote Sensing Text-to-image Retrieval With Complex Queries
by: Mezzi, Emanuele, et al.
Published: (2025)
Efficient Logic Gate Networks for Video Copy Detection
by: Fojcik, Katarzyna
Published: (2026)
OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
by: Yang, Wei, et al.
Published: (2025)
PHPQ: Pyramid Hybrid Pooling Quantization for Efficient Fine-Grained Image Retrieval
by: Zeng, Ziyun, et al.
Published: (2021)
Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning
by: Zhang, Jinxu
Published: (2024)
A comprehensive multimodal dataset and benchmark for ulcerative colitis scoring in endoscopy
by: Ghatwary, Noha, et al.
Published: (2026)
HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval
by: Askari, Arian, et al.
Published: (2025)
Provenance Analysis of Archaeological Artifacts via Multimodal RAG Systems
by: Zhang, Tuo, et al.
Published: (2025)
Enhancing Subsequent Video Retrieval via Vision-Language Models (VLMs)
by: Duan, Yicheng, et al.
Published: (2025)
Scale Up Composed Image Retrieval Learning via Modification Text Generation
by: Zhou, Yinan, et al.
Published: (2025)
VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings
by: Giahi, Ramin, et al.
Published: (2025)
Unified Interactive Multimodal Moment Retrieval via Cascaded Embedding-Reranking and Temporal-Aware Score Fusion
by: Thanh, Toan Le Ngo, et al.
Published: (2025)
Capability-aware Prompt Reformulation Learning for Text-to-Image Generation
by: Zhan, Jingtao, et al.
Published: (2024)
Very Efficient Listwise Multimodal Reranking for Long Documents
by: Sun, Yiqun, et al.
Published: (2026)
Image and Data Mining in Reticular Chemistry Using GPT-4V
by: Zheng, Zhiling, et al.
Published: (2023)
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval
by: Wang, Yuting, et al.
Published: (2023)
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms
by: Wang, Mengru, et al.
Published: (2025)
MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
by: Ju, Yeong-Joon, et al.
Published: (2024)
Benchmark Granularity and Model Robustness for Image-Text Retrieval
by: Hendriksen, Mariya, et al.
Published: (2024)
Smart Routing for Multimodal Video Retrieval: When to Search What
by: Rosa, Kevin Dela
Published: (2025)
A Picture is Worth a Thousand Words? An Empirical Study of Aggregation Strategies for Visual Financial Document Retrieval
by: Lim, Ho Hung, et al.
Published: (2026)
Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
by: Cui, Cheng, et al.
Published: (2026)
Attribute-Aware Implicit Modality Alignment for Text Attribute Person Search
by: Wang, Xin, et al.
Published: (2024)
Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval
by: Wang, Jinpeng, et al.
Published: (2024)
Low-Data Classification of Historical Music Manuscripts: A Few-Shot Learning Approach
by: Shatri, Elona, et al.
Published: (2024)
Benchmarking Document Parsers on Mathematical Formula Extraction from PDFs
by: Horn, Pius, et al.
Published: (2025)
From Historical Tabular Image to Knowledge Graphs: A Provenance-Aware Modular Pipeline
by: Shoilee, Sarah Binta Alam, et al.
Published: (2026)
Pailitao-VL: Unified Embedding and Reranker for Real-Time Multi-Modal Industrial Search
by: Chen, Lei, et al.
Published: (2026)
Video Enriched Retrieval Augmented Generation Using Aligned Video Captions
by: Rosa, Kevin Dela
Published: (2024)
Zero-Shot Image Moderation in Google Ads with LLM-Assisted Textual Descriptions and Cross-modal Co-embeddings
by: Luo, Enming, et al.
Published: (2024)
Accelerating Flood Warnings by 10 Hours: The Power of River Network Topology in AI-enhanced Flood Forecasting
by: Wang, Hongjun, et al.
Published: (2024)
Content-based 3D Image Retrieval and a ColBERT-inspired Re-ranking for Tumor Flagging and Staging
by: Jush, Farnaz Khun, et al.
Published: (2025)
Diffusion Augmented Retrieval: A Training-Free Approach to Interactive Text-to-Image Retrieval
by: Long, Zijun, et al.
Published: (2025)
Your Embedding Model is SMARTer Than You Think
by: Zhang, Jianrui, et al.
Published: (2026)