:: Library Catalog

Bibliographic Details
Main Authors:	Wu, Junde, Zhu, Jiayuan, Xu, Min, Jin, Yueming
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.05703

Similar Items

One-Prompt to Segment All Medical Images
by: Wu, Junde, et al.
Published: (2023)

Medical SAM 2: Segment medical images as video via Segment Anything Model 2
by: Zhu, Jiayuan, et al.
Published: (2024)

MGI: Multimodal Contrastive pre-training of Genomic and Medical Imaging
by: Zhou, Jiaying, et al.
Published: (2024)

MedUHIP: Towards Human-In-the-Loop Medical Segmentation
by: Zhu, Jiayuan, et al.
Published: (2024)

Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation
by: Wu, Junde, et al.
Published: (2023)

Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation
by: Wu, Junde, et al.
Published: (2024)

MedVAR: Towards Scalable and Efficient Medical Image Generation via Next-scale Autoregressive Prediction
by: He, Zhicheng, et al.
Published: (2026)

Towards Collective Intelligence: Uncertainty-aware SAM Adaptation for Ambiguous Medical Image Segmentation
by: Jiang, Mingzhou, et al.
Published: (2024)

MedOpenClaw and MedFlowBench: Auditing Medical Agents in Full-Study Workflows
by: Shen, Weixiang, et al.
Published: (2026)

Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
by: Liu, Haofeng, et al.
Published: (2024)

SPA: Efficient User-Preference Alignment against Uncertainty in Medical Image Segmentation
by: Zhu, Jiayuan, et al.
Published: (2024)

Scalable Object Detection in the Car Interior With Vision Foundation Models
by: Schmidt, Sebastian, et al.
Published: (2025)

From Failure to Feedback: Group Revision Unlocks Hard Cases in Object-Level Grounding
by: Liu, Yuyuan, et al.
Published: (2026)

in-Car Biometrics (iCarB) Datasets for Driver Recognition: Face, Fingerprint, and Voice
by: Hahn, Vedrana Krivokuca, et al.
Published: (2024)

Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
by: Chen, Jiali, et al.
Published: (2024)

Structure Matters: Revisiting Boundary Refinement in Video Object Segmentation
by: Qin, Guanyi, et al.
Published: (2025)

ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking
by: Liu, Haofeng, et al.
Published: (2025)

DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction
by: Du, Xiaobiao, et al.
Published: (2024)

Car-GS: Addressing Reflective and Transparent Surface Challenges in 3D Car Reconstruction
by: Li, Congcong, et al.
Published: (2025)

MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
by: Pan, Jiazhen, et al.
Published: (2025)

Visual WetlandBirds Dataset: Bird Species Identification and Behavior Recognition in Videos
by: Rodriguez-Juan, Javier, et al.
Published: (2025)

3DMedAgent: Unified Perception-to-Understanding for 3D Medical Analysis
by: Wang, Ziyue, et al.
Published: (2026)

3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views
by: Du, Xiaobiao, et al.
Published: (2024)

An Effective End-to-End Solution for Multimodal Action Recognition
by: Wang, Songping, et al.
Published: (2025)

Range and Bird's Eye View Fused Cross-Modal Visual Place Recognition
by: Peng, Jianyi, et al.
Published: (2025)

Infrared Adversarial Car Stickers
by: Zhu, Xiaopei, et al.
Published: (2024)

From Articulated Kinematics to Routed Visual Control for Action-Conditioned Surgical Video Generation
by: Li, Bohan, et al.
Published: (2026)

DiffusionAgent: Navigating Expert Models for Agentic Image Generation
by: Qin, Jie, et al.
Published: (2024)

ToolTipNet: A Segmentation-Driven Deep Learning Baseline for Surgical Instrument Tip Detection
by: Wu, Zijian, et al.
Published: (2025)

Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
by: Kwon, Hyeongjun, et al.
Published: (2024)

Generalized Deep Multi-view Clustering via Causal Learning with Partially Aligned Cross-view Correspondence
by: Yang, Xihong, et al.
Published: (2025)

You Only Look at Once for Real-time and Generic Multi-Task
by: Wang, Jiayuan, et al.
Published: (2023)

GeRM: A Generative Rendering Model From Physically Realistic to Photorealistic
by: Lu, Jiayuan, et al.
Published: (2026)

DTL: Disentangled Transfer Learning for Visual Recognition
by: Fu, Minghao, et al.
Published: (2023)

Exploiting Polarized Material Cues for Robust Car Detection
by: Dong, Wen, et al.
Published: (2024)

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation
by: Zhu, Junyu, et al.
Published: (2023)

TCFormer: Visual Recognition via Token Clustering Transformer
by: Zeng, Wang, et al.
Published: (2024)

AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting
by: Liu, Yuyuan, et al.
Published: (2025)

T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition
by: Yeh, Chen, et al.
Published: (2024)

Generalized Recognition of Basic Surgical Actions Enables Skill Assessment and Vision-Language-Model-based Surgical Planning
by: Xu, Mengya, et al.
Published: (2026)