:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Ge, Junyao, Zhang, Xu, Zheng, Yang, Guo, Kaitai, Liang, Jimin
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition Artificial Intelligence I.4.8; I.2.10
Online Access:	https://arxiv.org/abs/2408.14744
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)

Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention
by: Durrani, Hamza Ahmed, et al.
Published: (2026)

NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models
by: Lee, Kyuho, et al.
Published: (2025)

A Vision-Language Model for Focal Liver Lesion Classification
by: Jian, Song, et al.
Published: (2025)

VLM-NCD:Novel Class Discovery with Vision-Based Large Language Models
by: Su, Yuetong, et al.
Published: (2025)

Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy
by: Mehta, Vinit, et al.
Published: (2025)

OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
by: Wang, Chaoyi, et al.
Published: (2025)

DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception
by: Deng, Pei, et al.
Published: (2025)

Decoupling Vision and Language: Codebook Anchored Visual Adaptation
by: Wu, Jason, et al.
Published: (2026)

Embedding-Only Uplink for Onboard Retrieval Under Shift in Remote Sensing
by: Sim, Sangcheol
Published: (2026)

From Gaze to Insight: Bridging Human Visual Attention and Vision Language Model Explanation for Weakly-Supervised Medical Image Segmentation
by: Chen, Jingkun, et al.
Published: (2025)

EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis
by: Guo, Yijie, et al.
Published: (2025)

From eye to AI: studying rodent social behavior in the era of machine Learning
by: Chindemi, Giuseppe, et al.
Published: (2025)

MSPCaps: A Multi-Scale Patchify Capsule Network with Cross-Agreement Routing for Visual Recognition
by: Hu, Yudong, et al.
Published: (2025)

SPMamba-YOLO: An Underwater Object Detection Network Based on Multi-Scale Feature Enhancement and Global Context Modeling
by: Liao, Guanghao, et al.
Published: (2026)

Semi supervised GAN for smart microscopy, fast and data efficient cell cycle classification
by: Manick, Rajeev, et al.
Published: (2026)

Joint Learning of Depth, Pose, and Local Radiance Field for Large Scale Monocular 3D Reconstruction
by: Syed, Shahram Najam, et al.
Published: (2025)

StemVLA:An Open-Source Vision-Language-Action Model with Future 3D Spatial Geometry Knowledge and 4D Historical Representation
by: Xiao, Jiasong, et al.
Published: (2026)

From Dead Pixels to Editable Slides: Infographic Reconstruction into Native Google Slides via Vision-Language Region Understanding
by: Gonzalez, Leonardo
Published: (2026)

T-Rex: Task-Adaptive Spatial Representation Extraction for Robotic Manipulation with Vision-Language Models
by: Chen, Yiteng, et al.
Published: (2025)

SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model
by: Li, Xinqing, et al.
Published: (2025)

VLM-VPI: A Vision-Language Reasoning Framework for Improving Automated Vehicle-Pedestrian Interactions
by: Pu, Qingwen, et al.
Published: (2026)

Prompt Sensitivity in Vision-Language Grounding: How Small Changes in Wording Affect Object Detection
by: Deka, Dawar Jyoti, et al.
Published: (2026)

Neighborhood Feature Pooling for Remote Sensing Image Classification
by: Nia, Fahimeh Orvati, et al.
Published: (2025)

A Guide to Structureless Visual Localization
by: Panek, Vojtech, et al.
Published: (2025)

StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles
by: Oliveira, Daniel, et al.
Published: (2026)

Motion-Guided Semantic Alignment with Negative Prompts for Zero-Shot Video Action Recognition
by: Wang, Yiming, et al.
Published: (2026)

High-Frequency Semantics and Geometric Priors for End-to-End Detection Transformers in Challenging UAV Imagery
by: Peng, Hongxing, et al.
Published: (2025)

Combining Absolute and Semi-Generalized Relative Poses for Visual Localization
by: Panek, Vojtech, et al.
Published: (2024)

Privacy-Preserving Structureless Visual Localization via Image Obfuscation
by: Panek, Vojtech, et al.
Published: (2026)

Neuromorphic Monocular Depth Estimation with Uncertainty Modeling
by: Bergkvist, Viktor, et al.
Published: (2026)

Exploring Surround-View Fisheye Camera 3D Object Detection
by: Li, Changcai, et al.
Published: (2025)

FeedbackSTS-Det: Sparse Frames-Based Spatio-Temporal Semantic Feedback Network for Moving Infrared Small Target Detection
by: Huang, Yian, et al.
Published: (2026)

OmniAcc: Personalized Accessibility Assistant Using Generative AI
by: Karki, Siddhant, et al.
Published: (2025)

Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation
by: Hou, Zhangcheng, et al.
Published: (2026)

DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction
by: Hou, Zhiyi, et al.
Published: (2025)

Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception
by: Meng, Siyuan, et al.
Published: (2026)

Perceptual Flow Network for Visually Grounded Reasoning
by: Li, Yangfu, et al.
Published: (2026)

Beyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Images
by: Chen, Yuangong, et al.
Published: (2026)

CoMatcher: Multi-View Collaborative Feature Matching
by: Zhang, Jintao, et al.
Published: (2025)