Library Catalog

Bibliographic Details
Main Authors:	Banks, Ryan, Rovira-Lastra, Bernat, Martinez-Gomis, Jordi, Chaurasia, Akhilanand, Li, Yunpeng
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; I.2.1, I.2.10, J.3
Online Access:	https://arxiv.org/abs/2407.07604

Similar Items

Periodontal Bone Loss Analysis via Keypoint Detection With Heuristic Post-Processing
by: Banks, Ryan, et al.
Published: (2025)

CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)

Canonical Space Representation for 4D Panoptic Segmentation of Articulated Objects
by: Gomes, Manuel, et al.
Published: (2025)

Convolutional Model Trees
by: Armstrong, William Ward, et al.
Published: (2025)

A Segmented Robot Grasping Perception Neural Network for Edge AI
by: Bröcheler, Casper, et al.
Published: (2025)

U-Net-Like Spiking Neural Networks for Single Image Dehazing
by: Li, Huibin, et al.
Published: (2025)

OpenMap: Instruction Grounding via Open-Vocabulary Visual-Language Mapping
by: Li, Danyang, et al.
Published: (2025)

Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight
by: Romero, Angel, et al.
Published: (2025)

WayFASTER: a Self-Supervised Traversability Prediction for Increased Navigation Awareness
by: Gasparino, Mateus Valverde, et al.
Published: (2024)

EmbodiedLGR: Integrating Lightweight Graph Representation and Retrieval for Semantic-Spatial Memory in Robotic Agents
by: Riva, Paolo, et al.
Published: (2026)

Taking Flight with Dialogue: Enabling Natural Language Control for PX4-based Drone Agent
by: Lim, Shoon Kit, et al.
Published: (2025)

StratXplore: Strategic Novelty-seeking and Instruction-aligned Exploration for Vision and Language Navigation
by: Gopinathan, Muraleekrishna, et al.
Published: (2024)

Deep Probabilistic Traversability with Test-time Adaptation for Uncertainty-aware Planetary Rover Navigation
by: Endo, Masafumi, et al.
Published: (2024)

CoMoCAVs: Cohesive Decision-Guided Motion Planning for Connected and Autonomous Vehicles with Multi-Policy Reinforcement Learning
by: Hu, Pan
Published: (2025)

Motion Perceiver: Real-Time Occupancy Forecasting for Embedded Systems
by: Ferenczi, Bryce, et al.
Published: (2023)

TUMLS: Trustful Fully Unsupervised Multi-Level Segmentation for Whole Slide Images of Histology
by: Rehamnia, Walid, et al.
Published: (2025)

CCVA-FL: Cross-Client Variations Adaptive Federated Learning for Medical Imaging
by: Gupta, Sunny, et al.
Published: (2024)

Taming the Tail: Leveraging Asymmetric Loss and Pade Approximation to Overcome Medical Image Long-Tailed Class Imbalance
by: Kashyap, Pankhi, et al.
Published: (2024)

Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion
by: Zhu, Yu, et al.
Published: (2025)

Learning Association via Track-Detection Matching for Multi-Object Tracking
by: Adžemović, Momir
Published: (2025)

CrystalDiT: A Diffusion Transformer for Crystal Generation
by: Yi, Xiaohan, et al.
Published: (2025)

Towards Hard and Soft Shadow Removal via Dual-Branch Separation Network and Vision Transformer
by: Liang, Jiajia
Published: (2025)

SUN Team's Contribution to ABAW 2024 Competition: Audio-visual Valence-Arousal Estimation and Expression Recognition
by: Dresvyanskiy, Denis, et al.
Published: (2024)

Learning the meanings of function words from grounded language using a visual question answering model
by: Portelance, Eva, et al.
Published: (2023)

Closed-Loop Neural Activation Control in Vision-Language-Action Models
by: Babu, Abhijith, et al.
Published: (2026)

Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning
by: Yang, Shan
Published: (2026)

Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation
by: Gopinathan, Muraleekrishna, et al.
Published: (2024)

Fuzzy Convolution Neural Networks for Tabular Data Classification
by: Kulkarni, Arun D.
Published: (2024)

Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention
by: Durrani, Hamza Ahmed, et al.
Published: (2026)

Vision-based Situational Graphs Exploiting Fiducial Markers for the Integration of Semantic Entities
by: Tourani, Ali, et al.
Published: (2023)

UAV-assisted Visual SLAM Generating Reconstructed 3D Scene Graphs in GPS-denied Environments
by: Radwan, Ahmed, et al.
Published: (2024)

PathFormer: A Transformer with 3D Grid Constraints for Digital Twin Robot-Arm Trajectory Generation
by: Alanazi, Ahmed, et al.
Published: (2025)

vS-Graphs: Tightly Coupling Visual SLAM and 3D Scene Graphs Exploiting Hierarchical Scene Understanding
by: Tourani, Ali, et al.
Published: (2025)

PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
by: Dai, Song, et al.
Published: (2025)

Universal Adversarial Attack on Aligned Multimodal LLMs
by: Rahmatullaev, Temurbek, et al.
Published: (2025)

Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
by: Tong, Jingqi, et al.
Published: (2025)

SoccerRef-Agents: Multi-Agent System for Automated Soccer Refereeing
by: Meng, Zi, et al.
Published: (2026)

AGOP as Explanation: From Feature Learning to Per-Sample Attribution in Image Classifiers
by: Katakam, Raj Kiran Gupta
Published: (2026)

Memory-Efficient Differentially Private Training with Gradient Random Projection
by: Mulrooney, Alex, et al.
Published: (2025)

EduFlow: Advancing MLLMs' Problem-Solving Proficiency through Multi-Stage, Multi-Perspective Critique
by: Zhu, Chenglin, et al.
Published: (2025)