| Main Authors: | Śanchez, Èric, Molina, Adrià, Terrades, Oriol Ramos |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.03911 |
Similar Items
Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval
by: Molina, Adrià, et al.
Published: (2024)
The OCR Quest for Generalization: Learning to recognize low-resource alphabets with model editing
by: Rodríguez, Adrià Molina, et al.
Published: (2025)
Structured Analysis and Comparison of Alphabets in Historical Handwritten Ciphers
by: Méndez, Martín, et al.
Published: (2024)
Visual Model Checking: Graph-Based Inference of Visual Routines for Image Retrieval
by: Molina, Adrià, et al.
Published: (2026)
An Efficient System for Automatic Map Storytelling -- A Case Study on Historical Maps
by: Liu, Ziyi, et al.
Published: (2024)
Recurrent Few-Shot model for Document Verification
by: Talarmain, Maxime, et al.
Published: (2024)
ISS-Geo142: A Benchmark for Geolocating Astronaut Photography from the International Space Station
by: Srivastava, Vedika, et al.
Published: (2025)
Deep Pulse-Signal Magnification for remote Heart Rate Estimation in Compressed Videos
by: Comas, Joaquim, et al.
Published: (2024)
PhotoBot: Reference-Guided Interactive Photography via Natural Language
by: Limoyo, Oliver, et al.
Published: (2024)
Analyzing Decades-Long Environmental Changes in Namibia Using Archival Aerial Photography and Deep Learning
by: Tadesse, Girmaw Abebe, et al.
Published: (2024)
Before the Shutter: Aesthetic and Actionable Portrait Photography Planning in 3D Scenes
by: Jiang, Ruixiang, et al.
Published: (2026)
PhotoFlow: Agentic 3D Virtual Photography Missions
by: Guo, Jiarui, et al.
Published: (2026)
GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification
by: Bakkali, Souhail, et al.
Published: (2023)
Historical Report Guided Bi-modal Concurrent Learning for Pathology Report Generation
by: Zhang, Ling, et al.
Published: (2025)
TerraQ: Spatiotemporal Question-Answering on Satellite Image Archives
by: Kefalidis, Sergios-Anestis, et al.
Published: (2025)
Signature Forgery Detection: Improving Cross-Dataset Generalization
by: Parracho, Matheus Ramos
Published: (2025)
Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance
by: Zhang, Weiyi, et al.
Published: (2024)
Synthetic dataset of ID and Travel Document
by: Boned, Carlos, et al.
Published: (2024)
An Initial Study of Bird's-Eye View Generation for Autonomous Vehicles using Cross-View Transformers
by: Santos, Felipe Carlos dos, et al.
Published: (2025)
Enhancing Vectorized Map Perception with Historical Rasterized Maps
by: Zhang, Xiaoyu, et al.
Published: (2024)
Data Set Terminology of Deep Learning in Medicine: A Historical Review and Recommendation
by: Walston, Shannon L., et al.
Published: (2024)
Exploring Social Media Image Categorization Using Large Models with Different Adaptation Methods: A Case Study on Cultural Nature's Contributions to People
by: Khaldi, Rohaifa, et al.
Published: (2024)
NeRF-Insert: 3D Local Editing with Multimodal Control Signals
by: Sabat, Benet Oriol, et al.
Published: (2024)
On the Role of Domain Experts in Creating Effective Tutoring Systems
by: Sreedharan, Sarath, et al.
Published: (2025)
A Case Study on Concept Induction for Neuron-Level Interpretability in CNN
by: Sarma, Moumita Sen, et al.
Published: (2026)
Enhancing Community Vision Screening -- AI Driven Retinal Photography for Early Disease Detection and Patient Trust
by: Lei, Xiaofeng, et al.
Published: (2024)
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens
by: Zheng, Kaizhi, et al.
Published: (2023)
VGTS: Visually Guided Text Spotting for Novel Categories in Historical Manuscripts
by: Hu, Wenbo, et al.
Published: (2023)
Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging
by: Shams, Montasir, et al.
Published: (2025)
ArtFace: Towards Historical Portrait Face Identification via Model Adaptation
by: Poh, Francois, et al.
Published: (2025)
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
by: Pikabea, Iñigo, et al.
Published: (2025)
GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling
by: Bansal, Hritik, et al.
Published: (2024)
Leveraging Large Models to Evaluate Novel Content: A Case Study on Advertisement Creativity
by: Hou, Zhaoyi Joey, et al.
Published: (2025)
Facial Landmark Visualization and Emotion Recognition Through Neural Networks
by: Juárez-Jiménez, Israel, et al.
Published: (2025)
Optical Music Recognition in Manuscripts from the Ricordi Archive
by: Simonetta, Federico, et al.
Published: (2024)
Neural Restoration of Greening Defects in Historical Autochrome Photographs Based on Purely Synthetic Data
by: Sinha, Saptarshi Neil, et al.
Published: (2025)
Probabilistic Dating of Historical Manuscripts via Evidential Deep Regression on Visual Script Features
by: Chodavarapu, Ranjith
Published: (2026)
Can Vision-Language Models Replace Human Annotators: A Case Study with CelebA Dataset
by: Lu, Haoming, et al.
Published: (2024)
Is Architectural Complexity Always the Answer? A Case Study on SwinIR vs. an Efficient CNN
by: Sutariya, Chandresh, et al.
Published: (2025)
Inter-Class Relational Loss for Small Object Detection: A Case Study on License Plates
by: Ning, Dian, et al.
Published: (2025)