:: Library Catalog

Bibliographic Details
Main Authors:	Bo, Zeyi; Sun, Wuxi; Jin, Ye
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2408.16195

Similar Items

PixDLM: A Dual-Path Multimodal Language Model for UAV Reasoning Segmentation
by: Ke, Shuyan, et al.
Published: (2026)

Contrastive learning-based video quality assessment-jointed video vision transformer for video recognition
by: Sun, Jian, et al.
Published: (2026)

BridgeNet: Comprehensive and Effective Feature Interactions via Bridge Feature for Multi-task Dense Predictions
by: Zhang, Jingdong, et al.
Published: (2023)

ColonMapper: topological mapping and localization for colonoscopy
by: Morlana, Javier, et al.
Published: (2023)

Multi-step manipulation task and motion planning guided by video demonstration
by: Zorina, Kateryna, et al.
Published: (2025)

Learning reusable concepts across different egocentric video understanding tasks
by: Peirone, Simone Alberto, et al.
Published: (2025)

Efficient RGB-D Scene Understanding via Multi-task Adaptive Learning and Cross-dimensional Feature Guidance
by: Sun, Guodong, et al.
Published: (2026)

Bi-modal textual prompt learning for vision-language models in remote sensing
by: Kashyap, Pankhi, et al.
Published: (2026)

Multi-model learning by sequential reading of untrimmed videos for action recognition
by: Kamiya, Kodai, et al.
Published: (2024)

Synthetic data shuffling accelerates the convergence of federated learning under data heterogeneity
by: Li, Bo, et al.
Published: (2023)

OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation
by: Kim, Kwanyoung, et al.
Published: (2024)

MECFormer: Multi-task Whole Slide Image Classification with Expert Consultation Network
by: Bui, Doanh C., et al.
Published: (2024)

UniPINN: A Unified PINN Framework for Multi-task Learning of Diverse Navier-Stokes Equations
by: Sun, Dengdi, et al.
Published: (2026)

Multi-modal video data-pipelines for machine learning with minimal human supervision
by: Pîrvu, Mihai-Cristian, et al.
Published: (2025)

PrePrompt: Predictive prompting for class incremental learning
by: Huang, Libo, et al.
Published: (2025)

Giving each task what it needs -- leveraging structured sparsity for tailored multi-task learning
by: Upadhyay, Richa, et al.
Published: (2024)

EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation
by: Dong, Zhe, et al.
Published: (2025)

MT-Depth: Multi-task Instance feature analysis for the Depth Completion
by: Nizamani, Abdul Haseeb, et al.
Published: (2025)

Noise-aware few-shot learning through bi-directional multi-view prompt alignment
by: Niu, Lu, et al.
Published: (2026)

MVQA-68K: A Multi-dimensional and Causally-annotated Dataset with Quality Interpretability for Video Assessment
by: Pu, Yanyun, et al.
Published: (2025)

Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration
by: Wang, Pei, et al.
Published: (2024)

MMVIAD: Multi-view Multi-task Video Understanding for Industrial Anomaly Detection
by: Zhao, Xiran, et al.
Published: (2026)

GPT4Point: A Unified Framework for Point-Language Understanding and Generation
by: Qi, Zhangyang, et al.
Published: (2023)

LiDARFormer: A Unified Transformer-based Multi-task Network for LiDAR Perception
by: Zhou, Zixiang, et al.
Published: (2023)

iOSPointMapper: RealTime Pedestrian and Accessibility Mapping with Mobile AI
by: Naidu, Himanshu, et al.
Published: (2025)

Deep video representation learning: a survey
by: Ravanbakhsh, Elham, et al.
Published: (2024)

MultiTaskVIF: Segmentation-oriented visible and infrared image fusion via multi-task learning
by: Zhao, Zixian, et al.
Published: (2025)

InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping
by: Xu, Zhenhua, et al.
Published: (2023)

IC-Mapper: Instance-Centric Spatio-Temporal Modeling for Online Vectorized Map Construction
by: Zhu, Jiangtong, et al.
Published: (2025)

On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?
by: Zanella, Maxime, et al.
Published: (2024)

MLVU: Benchmarking Multi-task Long Video Understanding
by: Zhou, Junjie, et al.
Published: (2024)

MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
by: Tang, Haoran, et al.
Published: (2024)

LifelongPR: Lifelong point cloud place recognition based on sample replay and prompt learning
by: Zou, Xianghong, et al.
Published: (2025)

Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2
by: Zhou, Ziqi, et al.
Published: (2025)

Deep learning for action spotting in association football videos
by: Giancola, Silvio, et al.
Published: (2024)

A multi-center analysis of deep learning methods for video polyp detection and segmentation
by: Ghatwary, Noha, et al.
Published: (2026)

Body Segmentation Using Multi-task Learning
by: Jug, Julijan, et al.
Published: (2022)

A Multi-task Adversarial Attack Against Face Authentication
by: Wang, Hanrui, et al.
Published: (2024)

Forgetting of task-specific knowledge in model merging-based continual learning
by: Hess, Timm, et al.
Published: (2025)

CMI-MTL: Cross-Mamba interaction based multi-task learning for medical visual question answering
by: Jin, Qiangguo, et al.
Published: (2025)