:: Library Catalog

Bibliographic Details
Main Authors:	Śanchez, Èric; Molina, Adrià; Terrades, Oriol Ramos
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2409.03911

Similar Items

Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval
by: Molina, Adrià, et al.
Published: (2024)

The OCR Quest for Generalization: Learning to recognize low-resource alphabets with model editing
by: Rodríguez, Adrià Molina, et al.
Published: (2025)

Structured Analysis and Comparison of Alphabets in Historical Handwritten Ciphers
by: Méndez, Martín, et al.
Published: (2024)

Visual Model Checking: Graph-Based Inference of Visual Routines for Image Retrieval
by: Molina, Adrià, et al.
Published: (2026)

An Efficient System for Automatic Map Storytelling -- A Case Study on Historical Maps
by: Liu, Ziyi, et al.
Published: (2024)

Recurrent Few-Shot model for Document Verification
by: Talarmain, Maxime, et al.
Published: (2024)

ISS-Geo142: A Benchmark for Geolocating Astronaut Photography from the International Space Station
by: Srivastava, Vedika, et al.
Published: (2025)

Deep Pulse-Signal Magnification for remote Heart Rate Estimation in Compressed Videos
by: Comas, Joaquim, et al.
Published: (2024)

PhotoBot: Reference-Guided Interactive Photography via Natural Language
by: Limoyo, Oliver, et al.
Published: (2024)

Analyzing Decades-Long Environmental Changes in Namibia Using Archival Aerial Photography and Deep Learning
by: Tadesse, Girmaw Abebe, et al.
Published: (2024)

Before the Shutter: Aesthetic and Actionable Portrait Photography Planning in 3D Scenes
by: Jiang, Ruixiang, et al.
Published: (2026)

PhotoFlow: Agentic 3D Virtual Photography Missions
by: Guo, Jiarui, et al.
Published: (2026)

GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification
by: Bakkali, Souhail, et al.
Published: (2023)

Historical Report Guided Bi-modal Concurrent Learning for Pathology Report Generation
by: Zhang, Ling, et al.
Published: (2025)

TerraQ: Spatiotemporal Question-Answering on Satellite Image Archives
by: Kefalidis, Sergios-Anestis, et al.
Published: (2025)

Signature Forgery Detection: Improving Cross-Dataset Generalization
by: Parracho, Matheus Ramos
Published: (2025)

Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance
by: Zhang, Weiyi, et al.
Published: (2024)

Synthetic dataset of ID and Travel Document
by: Boned, Carlos, et al.
Published: (2024)

An Initial Study of Bird's-Eye View Generation for Autonomous Vehicles using Cross-View Transformers
by: Santos, Felipe Carlos dos, et al.
Published: (2025)

Enhancing Vectorized Map Perception with Historical Rasterized Maps
by: Zhang, Xiaoyu, et al.
Published: (2024)

Data Set Terminology of Deep Learning in Medicine: A Historical Review and Recommendation
by: Walston, Shannon L., et al.
Published: (2024)

Exploring Social Media Image Categorization Using Large Models with Different Adaptation Methods: A Case Study on Cultural Nature's Contributions to People
by: Khaldi, Rohaifa, et al.
Published: (2024)

NeRF-Insert: 3D Local Editing with Multimodal Control Signals
by: Sabat, Benet Oriol, et al.
Published: (2024)

On the Role of Domain Experts in Creating Effective Tutoring Systems
by: Sreedharan, Sarath, et al.
Published: (2025)

A Case Study on Concept Induction for Neuron-Level Interpretability in CNN
by: Sarma, Moumita Sen, et al.
Published: (2026)

Enhancing Community Vision Screening -- AI Driven Retinal Photography for Early Disease Detection and Patient Trust
by: Lei, Xiaofeng, et al.
Published: (2024)

MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens
by: Zheng, Kaizhi, et al.
Published: (2023)

VGTS: Visually Guided Text Spotting for Novel Categories in Historical Manuscripts
by: Hu, Wenbo, et al.
Published: (2023)

Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging
by: Shams, Montasir, et al.
Published: (2025)

ArtFace: Towards Historical Portrait Face Identification via Model Adaptation
by: Poh, Francois, et al.
Published: (2025)

Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
by: Pikabea, Iñigo, et al.
Published: (2025)

GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling
by: Bansal, Hritik, et al.
Published: (2024)

Leveraging Large Models to Evaluate Novel Content: A Case Study on Advertisement Creativity
by: Hou, Zhaoyi Joey, et al.
Published: (2025)

Facial Landmark Visualization and Emotion Recognition Through Neural Networks
by: Juárez-Jiménez, Israel, et al.
Published: (2025)

Optical Music Recognition in Manuscripts from the Ricordi Archive
by: Simonetta, Federico, et al.
Published: (2024)

Neural Restoration of Greening Defects in Historical Autochrome Photographs Based on Purely Synthetic Data
by: Sinha, Saptarshi Neil, et al.
Published: (2025)

Probabilistic Dating of Historical Manuscripts via Evidential Deep Regression on Visual Script Features
by: Chodavarapu, Ranjith
Published: (2026)

Can Vision-Language Models Replace Human Annotators: A Case Study with CelebA Dataset
by: Lu, Haoming, et al.
Published: (2024)

Is Architectural Complexity Always the Answer? A Case Study on SwinIR vs. an Efficient CNN
by: Sutariya, Chandresh, et al.
Published: (2025)

Inter-Class Relational Loss for Small Object Detection: A Case Study on License Plates
by: Ning, Dian, et al.
Published: (2025)