| Main Authors: | Tschirschwitz, David, Rodehorst, Volker |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.27197 |
Similar Items
Label Convergence: Defining an Upper Performance Bound in Object Recognition through Contradictory Annotations
by: Tschirschwitz, David, et al.
Published: (2024)
CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry
by: Tschirschwitz, David, et al.
Published: (2025)
Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval
by: Florek, Morris, et al.
Published: (2024)
ENSTRECT: A Stage-based Approach to 2.5D Structural Damage Detection
by: Benz, Christian, et al.
Published: (2024)
VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Annotation-Free Pathological Image Classification
by: Zhong, Lanfeng, et al.
Published: (2024)
Consensus and Subjectivity of Skin Tone Annotation for ML Fairness
by: Schumann, Candice, et al.
Published: (2023)
Learning Annotation Consensus for Continuous Emotion Recognition
by: Shoer, Ibrahim, et al.
Published: (2025)
Efficient Inter-Task Attention for Multitask Transformer Models
by: Bohn, Christian, et al.
Published: (2025)
Micro-Expression-Aware Avatar Fingerprinting via Inter-Frame Feature Differencing
by: Chapariniya, Masoumeh, et al.
Published: (2026)
SF20K Competition 2025: Summary and findings
by: Ghermi, Ridouane, et al.
Published: (2026)
Collaborative Group: Composed Image Retrieval via Consensus Learning from Noisy Annotations
by: Zhang, Xu, et al.
Published: (2023)
Accelerating Globally Optimal Consensus Maximization in Geometric Vision
by: Zhang, Xinyue, et al.
Published: (2023)
Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR
by: Zhang, Yulong, et al.
Published: (2025)
Seeing Beyond Redundancy: Task Complexity's Role in Vision Token Specialization in VLLMs
by: Hannan, Darryl, et al.
Published: (2026)
OmniEarth: A Benchmark for Evaluating Vision-Language Models in Geospatial Tasks
by: Fu, Ronghao, et al.
Published: (2026)
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
by: Yang, Yue, et al.
Published: (2024)
RSRWKV: A Linear-Complexity 2D Attention Mechanism for Efficient Remote Sensing Vision Task
by: Li, Chunshan, et al.
Published: (2025)
On Convolutional Vision Transformers for Yield Prediction
by: Inderka, Alvin, et al.
Published: (2024)
Vision-Language Models for Vision Tasks: A Survey
by: Zhang, Jingyi, et al.
Published: (2023)
Annotation-Efficient Task Guidance for Medical Segment Anything
by: Ward, Tyler, et al.
Published: (2024)
Longitudinal Vestibular Schwannoma Dataset with Consensus-based Human-in-the-loop Annotations
by: Wijethilake, Navodini, et al.
Published: (2025)
A Vision-Centric Approach for Static Map Element Annotation
by: Zhang, Jiaxin, et al.
Published: (2023)
StarVLA-$α$: Reducing Complexity in Vision-Language-Action Systems
by: Ye, Jinhui, et al.
Published: (2026)
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation
by: Yu, Hong-Tao, et al.
Published: (2025)
Annotation Free Semantic Segmentation with Vision Foundation Models
by: Seifi, Soroush, et al.
Published: (2024)
Multi-Task Label Discovery via Hierarchical Task Tokens for Partially Annotated Dense Predictions
by: Zhang, Jingdong, et al.
Published: (2024)
Synthetic Lunar Terrain: A Multimodal Open Dataset for Training and Evaluating Neuromorphic Vision Algorithms
by: Märtens, Marcus, et al.
Published: (2024)
Road Rage Reasoning with Vision-language Models (VLMs): Task Definition and Evaluation Dataset
by: Weng, Yibing, et al.
Published: (2025)
MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models
by: Wang, Dianyi, et al.
Published: (2025)
MEJO: MLLM-Engaged Surgical Triplet Recognition via Inter- and Intra-Task Joint Optimization
by: Zhang, Yiyi, et al.
Published: (2025)
Leveraging Vision-Language Models as Weak Annotators in Active Learning
by: Nguyen, Phuong Ngoc, et al.
Published: (2026)
Task-Guided Multi-Annotation Triplet Learning for Remote Sensing Representations
by: Zhou, Meilun, et al.
Published: (2026)
InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions
by: Liang, Han, et al.
Published: (2023)
Debiased Prompt Tuning in Vision-Language Model without Annotations
by: Jiang, Chaoquan, et al.
Published: (2025)
Effortless Vision-Language Model Specialization in Histopathology without Annotation
by: Qiu, Jingna, et al.
Published: (2025)
The Inter-Intra Modal Measure: A Predictive Lens on Fine-Tuning Outcomes in Vision-Language Models
by: Niss, Laura, et al.
Published: (2024)
CAMAv2: A Vision-Centric Approach for Static Map Element Annotation
by: Chen, Shiyuan, et al.
Published: (2024)
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
by: Ashraf, Tajamul, et al.
Published: (2025)
ConsensusDrop: Fusing Visual and Cross-Modal Saliency for Efficient Vision Language Models
by: Parikh, Dhruv, et al.
Published: (2026)
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
by: Chu, Xiangxiang, et al.
Published: (2024)