| Main Authors: | Zhang, Zhiyuan, Li, Xiaofan, Xu, Zhihao, Peng, Wenjie, Zhou, Zijian, Shi, Miaojing, Huang, Shuangping |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.00379 |
Similar Items
SEG-SAM: Semantic-Guided SAM for Unified Medical Image Segmentation
by: Huang, Shuangping, et al.
Published: (2024)
VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation
by: Zhou, Zijian, et al.
Published: (2023)
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
by: Zhou, Zijian, et al.
Published: (2024)
Enhancing Space-time Video Super-resolution via Spatial-temporal Feature Interaction
by: Yue, Zijie, et al.
Published: (2022)
Enhancing Generalized Few-Shot Semantic Segmentation via Effective Knowledge Transfer
by: Chen, Xinyue, et al.
Published: (2024)
Text Promptable Surgical Instrument Segmentation with Vision-Language Models
by: Zhou, Zijian, et al.
Published: (2023)
Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning
by: Zhou, Zijian, et al.
Published: (2024)
OrbitNVS: Harnessing Video Diffusion Priors for Novel View Synthesis
by: Liang, Jinglin, et al.
Published: (2026)
Spatial Retrieval Augmented Autonomous Driving
by: Jia, Xiaosong, et al.
Published: (2025)
Globally Correlation-Aware Hard Negative Generation
by: Peng, Wenjie, et al.
Published: (2024)
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
by: Liu, Zichen, et al.
Published: (2025)
Multitask Learning in Minimally Invasive Surgical Vision: A Review
by: Alabi, Oluwatosin, et al.
Published: (2024)
Language Prompt for Autonomous Driving
by: Wu, Dongming, et al.
Published: (2023)
KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding
by: Xu, Zhihao, et al.
Published: (2024)
Memory-guided Network with Uncertainty-based Feature Augmentation for Few-shot Semantic Segmentation
by: Chen, Xinyue, et al.
Published: (2024)
Weakly-Supervised Referring Video Object Segmentation through Text Supervision
by: Shi, Miaojing, et al.
Published: (2026)
DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning
by: Liu, Zhe, et al.
Published: (2025)
FAAR: Efficient Frequency-Aware Multi-Task Fine-Tuning via Automatic Rank Selection
by: Fontana, Maxime, et al.
Published: (2026)
Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
by: Du, Zhipeng, et al.
Published: (2023)
Unifying Language-Action Understanding and Generation for Autonomous Driving
by: Wang, Xinyang, et al.
Published: (2026)
CholecInstanceSeg: A Tool Instance Segmentation Dataset for Laparoscopic Surgery
by: Alabi, Oluwatosin, et al.
Published: (2024)
The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey
by: Tu, Sifan, et al.
Published: (2025)
When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review
by: Fontana, Maxime, et al.
Published: (2023)
NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
by: Tian, Kexin, et al.
Published: (2025)
Information Coordination as a Bridge: A Neuro-Symbolic Architecture for Reliable Autonomous Driving Scene Understanding
by: Liu, Shuo, et al.
Published: (2026)
Exploring Diversity-based Active Learning for 3D Object Detection in Autonomous Driving
by: Lin, Jinpeng, et al.
Published: (2022)
Optimizing Dense Visual Predictions Through Multi-Task Coherence and Prioritization
by: Fontana, Maxime, et al.
Published: (2024)
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
by: Yuan, Linfeng, et al.
Published: (2023)
Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving
by: Zhou, Hao, et al.
Published: (2024)
DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving
by: Shi, Chen, et al.
Published: (2025)
SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving
by: Li, Peizheng, et al.
Published: (2025)
U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration
by: Li, Xiaofan, et al.
Published: (2025)
PreGSU-A Generalized Traffic Scene Understanding Model for Autonomous Driving based on Pre-trained Graph Attention Network
by: Wang, Yuning, et al.
Published: (2024)
Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning
by: Wu, Aodi, et al.
Published: (2025)
BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents
by: Zhang, Yumeng, et al.
Published: (2024)
LiDAR Prompted Spatio-Temporal Multi-View Stereo for Autonomous Driving
by: Sun, Qihao, et al.
Published: (2026)
Spatial-aware Vision Language Model for Autonomous Driving
by: Wei, Weijie, et al.
Published: (2025)
VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation
by: Tan, Junwen, et al.
Published: (2026)
DriveWAM: Video Generative Priors Enable Scalable World-Action Modeling for Autonomous Driving
by: Shi, Chen, et al.
Published: (2026)
UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving
by: Li, Yongkang, et al.
Published: (2026)