| Main Authors: | Oliveira, Daniel A. P., de Matos, David Martins |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.07340 |
Similar Items
Transfer-learning for video classification: Video Swin Transformer on multiple domains
by: Oliveira, Daniel A. P., et al.
Published: (2022)
StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles
by: Oliveira, Daniel, et al.
Published: (2026)
Enhancing Spatial Reasoning in Vision-Language Models via Chain-of-Thought Prompting and Reinforcement Learning
by: Ji, Binbin, et al.
Published: (2025)
Trapped in texture bias? A large scale comparison of deep instance segmentation
by: Theodoridis, Johannes, et al.
Published: (2024)
Cytoarchitecture in Words: Weakly Supervised Vision-Language Modeling for Human Brain Microscopy
by: Sutton, Matthew, et al.
Published: (2026)
Multimodal Approaches for Visually-Rich Document Type Classification: A Comparative Analysis
by: Heyne, Catyana, et al.
Published: (2026)
Sign language recognition based on deep learning and low-cost handcrafted descriptors
by: Carneiro, Alvaro Leandro Cavalcante, et al.
Published: (2024)
Learning from Semantic Dictionaries: Discriminative Codebook Contrastive Learning for Unified Visual Representation and Generation
by: Estepa, Imanol G., et al.
Published: (2026)
GroundCap: A Visually Grounded Image Captioning Dataset
by: Oliveira, Daniel A. P., et al.
Published: (2025)
Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges
by: Oliveira, Daniel A. P., et al.
Published: (2024)
Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning
by: Semenov, Andrei, et al.
Published: (2024)
InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information
by: Iyengar, Anirudh Iyengar Kaniyar Narayana, et al.
Published: (2025)
StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
by: Oliveira, Daniel A. P., et al.
Published: (2025)
GAEA: A Geolocation Aware Conversational Assistant
by: Campos, Ron, et al.
Published: (2025)
Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis
by: Li, Jianing, et al.
Published: (2024)
Prompt-Driven Building Footprint Extraction in Aerial Images with Offset-Building Model
by: Li, Kai, et al.
Published: (2023)
All4One: Symbiotic Neighbour Contrastive Learning via Self-Attention and Redundancy Reduction
by: Estepa, Imanol G., et al.
Published: (2023)
MRD: Using Physically Based Differentiable Rendering to Probe Vision Models for 3D Scene Understanding
by: Beilharz, Benjamin, et al.
Published: (2025)
WaveMix: A Resource-efficient Neural Network for Image Analysis
by: Jeevan, Pranav, et al.
Published: (2022)
Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review
by: Dalal, Anurag, et al.
Published: (2024)
CCVA-FL: Cross-Client Variations Adaptive Federated Learning for Medical Imaging
by: Gupta, Sunny, et al.
Published: (2024)
Large Language Models for Simultaneous Named Entity Extraction and Spelling Correction
by: Whittaker, Edward, et al.
Published: (2024)
Vision transformers in domain adaptation and domain generalization: a study of robustness
by: Alijani, Shadi, et al.
Published: (2024)
SYNOSIS: Image synthesis pipeline for machine vision in metal surface inspection
by: Fulir, Juraj, et al.
Published: (2024)
Which Backbone to Use: A Resource-efficient Domain Specific Comparison for Computer Vision
by: Jeevan, Pranav, et al.
Published: (2024)
MANGO: Learning Disentangled Image Transformation Manifolds with Grouped Operators
by: Ancelin, Brighton, et al.
Published: (2024)
FLD+: Data-efficient Evaluation Metric for Generative Models
by: Jeevan, Pranav, et al.
Published: (2024)
WaveMixSR-V2: Enhancing Super-resolution with Higher Efficiency
by: Jeevan, Pranav, et al.
Published: (2024)
Normalizing Flow-Based Metric for Image Generation
by: Jeevan, Pranav, et al.
Published: (2024)
Devanagari Handwritten Character Recognition using Convolutional Neural Network
by: Mehta, Diksha, et al.
Published: (2025)
Canonical Space Representation for 4D Panoptic Segmentation of Articulated Objects
by: Gomes, Manuel, et al.
Published: (2025)
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning
by: Ma, Chong, et al.
Published: (2024)
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
by: Pandey, Anupam, et al.
Published: (2025)
Taming the Tail: Leveraging Asymmetric Loss and Pade Approximation to Overcome Medical Image Long-Tailed Class Imbalance
by: Kashyap, Pankhi, et al.
Published: (2024)
Fine-Tuning Vision-Language Models for Understanding Current Damage and Scoring Priority with Quality Guard Agent
by: Yasuno, Takato
Published: (2026)
Exploiting Causality Signals in Medical Images: A Pilot Study with Empirical Results
by: Carloni, Gianluca, et al.
Published: (2023)
LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation
by: Wei, Hualiang, et al.
Published: (2026)
Tversky Neural Networks: Psychologically Plausible Deep Learning with Differentiable Tversky Similarity
by: Doumbouya, Moussa Koulako Bala, et al.
Published: (2025)
Evaluation Metric for Quality Control and Generative Models in Histopathology Images
by: Jeevan, Pranav, et al.
Published: (2024)
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
by: Hansen-Estruch, Philippe, et al.
Published: (2025)