:: Library Catalog

Bibliographic details
Main authors:	Malik, Hashmat Shadab; Huzaifa, Muhammad; Naseer, Muzammal; Khan, Salman; Khan, Fahad Shahbaz
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online access:	https://arxiv.org/abs/2403.04701

Similar items

Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology
by: Malik, Hashmat Shadab, et al.
Published: (2025)

Towards Evaluating the Robustness of Visual State Space Models
by: Malik, Hashmat Shadab, et al.
Published: (2024)

Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
by: Malik, Hashmat Shadab, et al.
Published: (2025)

On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models
by: Malik, Hashmat Shadab, et al.
Published: (2024)

Enhancing Novel Object Detection via Cooperative Foundational Models
by: Bharadwaj, Rohit, et al.
Published: (2023)

Language Guided Domain Generalized Medical Image Segmentation
by: Kunhimon, Shahina, et al.
Published: (2024)

Cross-Modal Self-Training: Aligning Images and Pointclouds to Learn Classification without Labels
by: Dharmasiri, Amaya, et al.
Published: (2024)

VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs
by: Bharadwaj, Rohit, et al.
Published: (2024)

Composed Video Retrieval via Enriched Context and Discriminative Embeddings
by: Thawakar, Omkar, et al.
Published: (2024)

How Good is my Histopathology Vision-Language Foundation Model? A Holistic Benchmark
by: Majzoub, Roba Al, et al.
Published: (2025)

UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities
by: Khattak, Muhammad Uzair, et al.
Published: (2024)

VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
by: Mahmood, Ahmad, et al.
Published: (2024)

Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning
by: Watawana, Hasindri, et al.
Published: (2024)

CDChat: A Large Multimodal Model for Remote Sensing Change Description
by: Noman, Mubashir, et al.
Published: (2024)

Composed Object Retrieval: Object-level Retrieval via Composed Expressions
by: Wang, Tong, et al.
Published: (2025)

Learnable Weight Initialization for Volumetric Medical Image Segmentation
by: Kunhimon, Shahina, et al.
Published: (2023)

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
by: Noman, Mubashir, et al.
Published: (2024)

BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning
by: Hanif, Asif, et al.
Published: (2024)

How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs
by: Khattak, Muhammad Uzair, et al.
Published: (2024)

Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
by: Wasim, Syed Talal, et al.
Published: (2023)

MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation
by: Gani, Hanan, et al.
Published: (2024)

Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
by: Hassan, Jameel, et al.
Published: (2023)

Multi-Granularity Language-Guided Training for Multi-Object Tracking
by: Li, Yuhao, et al.
Published: (2024)

AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment
by: Nawaz, Umair, et al.
Published: (2024)

Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
by: Maaz, Muhammad, et al.
Published: (2023)

Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model
by: Chen, Shiming, et al.
Published: (2025)

Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models
by: Maaz, Muhammad, et al.
Published: (2025)

Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
by: Thawakar, Omkar, et al.
Published: (2025)

Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
by: Chen, Shiming, et al.
Published: (2024)

Vocabulary-free Fine-grained Visual Recognition via Enriched Contextually Grounded Vision-Language Model
by: Demidov, Dmitry, et al.
Published: (2025)

Efficient Video Object Segmentation via Modulated Cross-Attention Memory
by: Shaker, Abdelrahman, et al.
Published: (2024)

ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection
by: Noman, Mubashir, et al.
Published: (2024)

Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking
by: Ishaq, Ayesha, et al.
Published: (2024)

PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning
by: Hussein, Noor, et al.
Published: (2024)

GroupMamba: Efficient Group-Based Visual State Space Model
by: Shaker, Abdelrahman, et al.
Published: (2024)

Mobile-VideoGPT: Fast and Accurate Model for Mobile Video Understanding
by: Shaker, Abdelrahman, et al.
Published: (2025)

Underwater Object Detection Enhancement via Channel Stabilization
by: Ali, Muhammad, et al.
Published: (2024)

Multi-modal Generation via Cross-Modal In-Context Learning
by: Kumar, Amandeep, et al.
Published: (2024)

LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts
by: Gani, Hanan, et al.
Published: (2023)

CONDA: Condensed Deep Association Learning for Co-Salient Object Detection
by: Li, Long, et al.
Published: (2024)