| Main Authors: | Ge, Junyao, Zhang, Xu, Zheng, Yang, Guo, Kaitai, Liang, Jimin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.14744 |
Similar Items
CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)
Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention
by: Durrani, Hamza Ahmed, et al.
Published: (2026)
NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models
by: Lee, Kyuho, et al.
Published: (2025)
A Vision-Language Model for Focal Liver Lesion Classification
by: Jian, Song, et al.
Published: (2025)
VLM-NCD:Novel Class Discovery with Vision-Based Large Language Models
by: Su, Yuetong, et al.
Published: (2025)
Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy
by: Mehta, Vinit, et al.
Published: (2025)
OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
by: Wang, Chaoyi, et al.
Published: (2025)
DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception
by: Deng, Pei, et al.
Published: (2025)
Decoupling Vision and Language: Codebook Anchored Visual Adaptation
by: Wu, Jason, et al.
Published: (2026)
Embedding-Only Uplink for Onboard Retrieval Under Shift in Remote Sensing
by: Sim, Sangcheol
Published: (2026)
From Gaze to Insight: Bridging Human Visual Attention and Vision Language Model Explanation for Weakly-Supervised Medical Image Segmentation
by: Chen, Jingkun, et al.
Published: (2025)
EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis
by: Guo, Yijie, et al.
Published: (2025)
From eye to AI: studying rodent social behavior in the era of machine Learning
by: Chindemi, Giuseppe, et al.
Published: (2025)
MSPCaps: A Multi-Scale Patchify Capsule Network with Cross-Agreement Routing for Visual Recognition
by: Hu, Yudong, et al.
Published: (2025)
SPMamba-YOLO: An Underwater Object Detection Network Based on Multi-Scale Feature Enhancement and Global Context Modeling
by: Liao, Guanghao, et al.
Published: (2026)
Semi supervised GAN for smart microscopy, fast and data efficient cell cycle classification
by: Manick, Rajeev, et al.
Published: (2026)
Joint Learning of Depth, Pose, and Local Radiance Field for Large Scale Monocular 3D Reconstruction
by: Syed, Shahram Najam, et al.
Published: (2025)
StemVLA:An Open-Source Vision-Language-Action Model with Future 3D Spatial Geometry Knowledge and 4D Historical Representation
by: Xiao, Jiasong, et al.
Published: (2026)
From Dead Pixels to Editable Slides: Infographic Reconstruction into Native Google Slides via Vision-Language Region Understanding
by: Gonzalez, Leonardo
Published: (2026)
T-Rex: Task-Adaptive Spatial Representation Extraction for Robotic Manipulation with Vision-Language Models
by: Chen, Yiteng, et al.
Published: (2025)
SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model
by: Li, Xinqing, et al.
Published: (2025)
VLM-VPI: A Vision-Language Reasoning Framework for Improving Automated Vehicle-Pedestrian Interactions
by: Pu, Qingwen, et al.
Published: (2026)
Prompt Sensitivity in Vision-Language Grounding: How Small Changes in Wording Affect Object Detection
by: Deka, Dawar Jyoti, et al.
Published: (2026)
Neighborhood Feature Pooling for Remote Sensing Image Classification
by: Nia, Fahimeh Orvati, et al.
Published: (2025)
A Guide to Structureless Visual Localization
by: Panek, Vojtech, et al.
Published: (2025)
StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles
by: Oliveira, Daniel, et al.
Published: (2026)
Motion-Guided Semantic Alignment with Negative Prompts for Zero-Shot Video Action Recognition
by: Wang, Yiming, et al.
Published: (2026)
High-Frequency Semantics and Geometric Priors for End-to-End Detection Transformers in Challenging UAV Imagery
by: Peng, Hongxing, et al.
Published: (2025)
Combining Absolute and Semi-Generalized Relative Poses for Visual Localization
by: Panek, Vojtech, et al.
Published: (2024)
Privacy-Preserving Structureless Visual Localization via Image Obfuscation
by: Panek, Vojtech, et al.
Published: (2026)
Neuromorphic Monocular Depth Estimation with Uncertainty Modeling
by: Bergkvist, Viktor, et al.
Published: (2026)
Exploring Surround-View Fisheye Camera 3D Object Detection
by: Li, Changcai, et al.
Published: (2025)
FeedbackSTS-Det: Sparse Frames-Based Spatio-Temporal Semantic Feedback Network for Moving Infrared Small Target Detection
by: Huang, Yian, et al.
Published: (2026)
OmniAcc: Personalized Accessibility Assistant Using Generative AI
by: Karki, Siddhant, et al.
Published: (2025)
Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation
by: Hou, Zhangcheng, et al.
Published: (2026)
DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction
by: Hou, Zhiyi, et al.
Published: (2025)
Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception
by: Meng, Siyuan, et al.
Published: (2026)
Perceptual Flow Network for Visually Grounded Reasoning
by: Li, Yangfu, et al.
Published: (2026)
Beyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Images
by: Chen, Yuangong, et al.
Published: (2026)
CoMatcher: Multi-View Collaborative Feature Matching
by: Zhang, Jintao, et al.
Published: (2025)