:: Library Catalog

Bibliographic Details
Main Authors:	Oliveira, Daniel A. P., de Matos, David Martins
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition I.2; I.4; I.5; I.7
Online Access:	https://arxiv.org/abs/2507.07340

Similar Items

Transfer-learning for video classification: Video Swin Transformer on multiple domains
by: Oliveira, Daniel A. P., et al.
Published: (2022)

StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles
by: Oliveira, Daniel, et al.
Published: (2026)

Enhancing Spatial Reasoning in Vision-Language Models via Chain-of-Thought Prompting and Reinforcement Learning
by: Ji, Binbin, et al.
Published: (2025)

Trapped in texture bias? A large scale comparison of deep instance segmentation
by: Theodoridis, Johannes, et al.
Published: (2024)

Cytoarchitecture in Words: Weakly Supervised Vision-Language Modeling for Human Brain Microscopy
by: Sutton, Matthew, et al.
Published: (2026)

Multimodal Approaches for Visually-Rich Document Type Classification: A Comparative Analysis
by: Heyne, Catyana, et al.
Published: (2026)

Sign language recognition based on deep learning and low-cost handcrafted descriptors
by: Carneiro, Alvaro Leandro Cavalcante, et al.
Published: (2024)

Learning from Semantic Dictionaries: Discriminative Codebook Contrastive Learning for Unified Visual Representation and Generation
by: Estepa, Imanol G., et al.
Published: (2026)

GroundCap: A Visually Grounded Image Captioning Dataset
by: Oliveira, Daniel A. P., et al.
Published: (2025)

Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges
by: Oliveira, Daniel A. P., et al.
Published: (2024)

Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning
by: Semenov, Andrei, et al.
Published: (2024)

InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information
by: Iyengar, Anirudh Iyengar Kaniyar Narayana, et al.
Published: (2025)

StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
by: Oliveira, Daniel A. P., et al.
Published: (2025)

GAEA: A Geolocation Aware Conversational Assistant
by: Campos, Ron, et al.
Published: (2025)

Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis
by: Li, Jianing, et al.
Published: (2024)

Prompt-Driven Building Footprint Extraction in Aerial Images with Offset-Building Model
by: Li, Kai, et al.
Published: (2023)

All4One: Symbiotic Neighbour Contrastive Learning via Self-Attention and Redundancy Reduction
by: Estepa, Imanol G., et al.
Published: (2023)

MRD: Using Physically Based Differentiable Rendering to Probe Vision Models for 3D Scene Understanding
by: Beilharz, Benjamin, et al.
Published: (2025)

WaveMix: A Resource-efficient Neural Network for Image Analysis
by: Jeevan, Pranav, et al.
Published: (2022)

Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review
by: Dalal, Anurag, et al.
Published: (2024)

CCVA-FL: Cross-Client Variations Adaptive Federated Learning for Medical Imaging
by: Gupta, Sunny, et al.
Published: (2024)

Large Language Models for Simultaneous Named Entity Extraction and Spelling Correction
by: Whittaker, Edward, et al.
Published: (2024)

Vision transformers in domain adaptation and domain generalization: a study of robustness
by: Alijani, Shadi, et al.
Published: (2024)

SYNOSIS: Image synthesis pipeline for machine vision in metal surface inspection
by: Fulir, Juraj, et al.
Published: (2024)

Which Backbone to Use: A Resource-efficient Domain Specific Comparison for Computer Vision
by: Jeevan, Pranav, et al.
Published: (2024)

MANGO: Learning Disentangled Image Transformation Manifolds with Grouped Operators
by: Ancelin, Brighton, et al.
Published: (2024)

FLD+: Data-efficient Evaluation Metric for Generative Models
by: Jeevan, Pranav, et al.
Published: (2024)

WaveMixSR-V2: Enhancing Super-resolution with Higher Efficiency
by: Jeevan, Pranav, et al.
Published: (2024)

Normalizing Flow-Based Metric for Image Generation
by: Jeevan, Pranav, et al.
Published: (2024)

Devanagari Handwritten Character Recognition using Convolutional Neural Network
by: Mehta, Diksha, et al.
Published: (2025)

Canonical Space Representation for 4D Panoptic Segmentation of Articulated Objects
by: Gomes, Manuel, et al.
Published: (2025)

Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning
by: Ma, Chong, et al.
Published: (2024)

The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
by: Pandey, Anupam, et al.
Published: (2025)

Taming the Tail: Leveraging Asymmetric Loss and Pade Approximation to Overcome Medical Image Long-Tailed Class Imbalance
by: Kashyap, Pankhi, et al.
Published: (2024)

Fine-Tuning Vision-Language Models for Understanding Current Damage and Scoring Priority with Quality Guard Agent
by: Yasuno, Takato
Published: (2026)

Exploiting Causality Signals in Medical Images: A Pilot Study with Empirical Results
by: Carloni, Gianluca, et al.
Published: (2023)

LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation
by: Wei, Hualiang, et al.
Published: (2026)

Tversky Neural Networks: Psychologically Plausible Deep Learning with Differentiable Tversky Similarity
by: Doumbouya, Moussa Koulako Bala, et al.
Published: (2025)

Evaluation Metric for Quality Control and Generative Models in Histopathology Images
by: Jeevan, Pranav, et al.
Published: (2024)

Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
by: Hansen-Estruch, Philippe, et al.
Published: (2025)