Library Catalog

Bibliographic Details
Main Authors:	Zhang, Zhiyuan; Li, Xiaofan; Xu, Zhihao; Peng, Wenjie; Zhou, Zijian; Shi, Miaojing; Huang, Shuangping
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.00379

Similar Items

SEG-SAM: Semantic-Guided SAM for Unified Medical Image Segmentation
by: Huang, Shuangping, et al.
Published: (2024)

VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation
by: Zhou, Zijian, et al.
Published: (2023)

OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
by: Zhou, Zijian, et al.
Published: (2024)

Enhancing Space-time Video Super-resolution via Spatial-temporal Feature Interaction
by: Yue, Zijie, et al.
Published: (2022)

Enhancing Generalized Few-Shot Semantic Segmentation via Effective Knowledge Transfer
by: Chen, Xinyue, et al.
Published: (2024)

Text Promptable Surgical Instrument Segmentation with Vision-Language Models
by: Zhou, Zijian, et al.
Published: (2023)

Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning
by: Zhou, Zijian, et al.
Published: (2024)

OrbitNVS: Harnessing Video Diffusion Priors for Novel View Synthesis
by: Liang, Jinglin, et al.
Published: (2026)

Spatial Retrieval Augmented Autonomous Driving
by: Jia, Xiaosong, et al.
Published: (2025)

Globally Correlation-Aware Hard Negative Generation
by: Peng, Wenjie, et al.
Published: (2024)

STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
by: Liu, Zichen, et al.
Published: (2025)

Multitask Learning in Minimally Invasive Surgical Vision: A Review
by: Alabi, Oluwatosin, et al.
Published: (2024)

Language Prompt for Autonomous Driving
by: Wu, Dongming, et al.
Published: (2023)

KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding
by: Xu, Zhihao, et al.
Published: (2024)

Memory-guided Network with Uncertainty-based Feature Augmentation for Few-shot Semantic Segmentation
by: Chen, Xinyue, et al.
Published: (2024)

Weakly-Supervised Referring Video Object Segmentation through Text Supervision
by: Shi, Miaojing, et al.
Published: (2026)

DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning
by: Liu, Zhe, et al.
Published: (2025)

FAAR: Efficient Frequency-Aware Multi-Task Fine-Tuning via Automatic Rank Selection
by: Fontana, Maxime, et al.
Published: (2026)

Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
by: Du, Zhipeng, et al.
Published: (2023)

Unifying Language-Action Understanding and Generation for Autonomous Driving
by: Wang, Xinyang, et al.
Published: (2026)

CholecInstanceSeg: A Tool Instance Segmentation Dataset for Laparoscopic Surgery
by: Alabi, Oluwatosin, et al.
Published: (2024)

The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey
by: Tu, Sifan, et al.
Published: (2025)

When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review
by: Fontana, Maxime, et al.
Published: (2023)

NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
by: Tian, Kexin, et al.
Published: (2025)

Information Coordination as a Bridge: A Neuro-Symbolic Architecture for Reliable Autonomous Driving Scene Understanding
by: Liu, Shuo, et al.
Published: (2026)

Exploring Diversity-based Active Learning for 3D Object Detection in Autonomous Driving
by: Lin, Jinpeng, et al.
Published: (2022)

Optimizing Dense Visual Predictions Through Multi-Task Coherence and Prioritization
by: Fontana, Maxime, et al.
Published: (2024)

LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
by: Yuan, Linfeng, et al.
Published: (2023)

Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving
by: Zhou, Hao, et al.
Published: (2024)

DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving
by: Shi, Chen, et al.
Published: (2025)

SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving
by: Li, Peizheng, et al.
Published: (2025)

U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration
by: Li, Xiaofan, et al.
Published: (2025)

PreGSU-A Generalized Traffic Scene Understanding Model for Autonomous Driving based on Pre-trained Graph Attention Network
by: Wang, Yuning, et al.
Published: (2024)

Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning
by: Wu, Aodi, et al.
Published: (2025)

BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents
by: Zhang, Yumeng, et al.
Published: (2024)

LiDAR Prompted Spatio-Temporal Multi-View Stereo for Autonomous Driving
by: Sun, Qihao, et al.
Published: (2026)

Spatial-aware Vision Language Model for Autonomous Driving
by: Wei, Weijie, et al.
Published: (2025)

VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation
by: Tan, Junwen, et al.
Published: (2026)

DriveWAM: Video Generative Priors Enable Scalable World-Action Modeling for Autonomous Driving
by: Shi, Chen, et al.
Published: (2026)

UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving
by: Li, Yongkang, et al.
Published: (2026)