| Main authors: | Hamdi, Laziz; Tamasna, Amine; Paquet, Thierry |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online access: | https://arxiv.org/abs/2604.16099 |
Similar items
FastTab: A Fast Table Recognizer with a Tiny Recursive Module and 1D Transformers
by: Hamdi, Laziz, et al.
Published: (2026)
TableSeq: Unified Generation of Structure, Content, and Layout
by: Hamdi, Laziz, et al.
Published: (2026)
PILOT: A Promptable Interleaved Layout-aware OCR Transformer
by: Hamdi, Laziz, et al.
Published: (2025)
MMIS: Multimodal Dataset for Interior Scene Visual Generation and Recognition
by: Kassab, Hozaifa, et al.
Published: (2024)
Classifying the Unknown: In-Context Learning for Open-Vocabulary Text and Symbol Recognition
by: Simon, Tom, et al.
Published: (2025)
End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music
by: Ríos-Vila, Antonio, et al.
Published: (2024)
Sheet Music Transformer: End-To-End Optical Music Recognition Beyond Monophonic Transcription
by: Ríos-Vila, Antonio, et al.
Published: (2024)
DenVisCoM: Dense Vision Correspondence Mamba for Efficient and Real-time Optical Flow and Stereo Estimation
by: Anand, Tushar, et al.
Published: (2026)
RoadscapesQA: A Multitask, Multimodal Dataset for Visual Question Answering on Indian Roads
by: Iyer, Vijayasri, et al.
Published: (2026)
VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering
by: Li, Yuyi, et al.
Published: (2025)
DetReIDX: A Stress-Test Dataset for Real-World UAV-Based Person Recognition
by: Hambarde, Kailash A., et al.
Published: (2025)
Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition
by: Qiu, Jielin, et al.
Published: (2024)
STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes
by: Ishihara, Keishi, et al.
Published: (2025)
Data-Augmented Multimodal Feature Fusion for Multiclass Visual Recognition of Oral Cancer Lesions
by: Naoum, Joy, et al.
Published: (2025)
GCF: Graph Convolutional Networks for Facial Expression Recognition
by: Kassab, Hozaifa, et al.
Published: (2024)
TabSniper: Towards Accurate Table Detection & Structure Recognition for Bank Statements
by: Trivedi, Abhishek, et al.
Published: (2024)
End-to-end information extraction in handwritten documents: Understanding Paris marriage records from 1880 to 1940
by: Constum, Thomas, et al.
Published: (2024)
Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation
by: Zayene, Mehdi, et al.
Published: (2024)
Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images
by: Lompo, Boammani Aser, et al.
Published: (2025)
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
by: Zhao, Weichao, et al.
Published: (2024)
ICTPolarReal: A Polarized Reflection and Material Dataset of Real World Objects
by: Yang, Jing, et al.
Published: (2026)
Few-shot Writer Adaptation via Multimodal In-Context Learning
by: Simon, Tom, et al.
Published: (2026)
A Benchmark of State-Space Models vs. Transformers and BiLSTM-based Models for Historical Newspaper OCR
by: Agbeti-messan, Merveilles, et al.
Published: (2026)
UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition
by: Zhang, Zhenrong, et al.
Published: (2024)
Dynamically Modulating Visual Place Recognition Sequence Length For Minimum Acceptable Performance Scenarios
by: Malone, Connor, et al.
Published: (2024)
VisualTrans: A Benchmark for Real-World Visual Transformation Reasoning
by: Ji, Yuheng, et al.
Published: (2025)
Revisiting Transformers with Insights from Image Filtering and Boosting
by: Abdullaev, Laziz U., et al.
Published: (2025)
Dual-Imbalance Continual Learning for Real-World Food Recognition
by: Zhang, Xiaoyan, et al.
Published: (2026)
Real-World Transferable Adversarial Attack on Face-Recognition Systems
by: Kaznacheev, Andrey, et al.
Published: (2025)
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning
by: Zhang, Yuanhan, et al.
Published: (2024)
Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances
by: Reddy, Arun V., et al.
Published: (2023)
ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition
by: Salehi, Mohammadreza, et al.
Published: (2024)
ISLES'24 -- A Real-World Longitudinal Multimodal Stroke Dataset
by: Riedel, Evamaria Olga, et al.
Published: (2024)
Dens3R: A Foundation Model for 3D Geometry Prediction
by: Fang, Xianze, et al.
Published: (2025)
On the Estimation of Image-matching Uncertainty in Visual Place Recognition
by: Zaffar, Mubariz, et al.
Published: (2024)
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
by: Wang, Bin, et al.
Published: (2024)
Massively Annotated Datasets for Assessment of Synthetic and Real Data in Face Recognition
by: Neto, Pedro C., et al.
Published: (2024)
WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation
by: Jiang, Tianjian, et al.
Published: (2025)
Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection
by: Wang, Chengjie, et al.
Published: (2024)
UDC-VIT: A Real-World Video Dataset for Under-Display Cameras
by: Ahn, Kyusu, et al.
Published: (2025)