:: Library Catalog

Bibliographic Details
Main Authors:	Gani, Hanan; Saadi, Nada; Hussein, Noor; Nandakumar, Karthik
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2402.08070

Similar Items

Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models
by: Imam, Raza, et al.
Published: (2024)

PEMMA: Parameter-Efficient Multi-Modal Adaptation for Medical Image Segmentation
by: Saadi, Nada, et al.
Published: (2024)

PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning
by: Hussein, Noor, et al.
Published: (2024)

Efficient Parameter Adaptation for Multi-Modal Medical Image Segmentation and Prognosis
by: Saeed, Numan, et al.
Published: (2025)

Intra-finger Variability of Diffusion-based Latent Fingerprint Generation
by: Hussein, Noor, et al.
Published: (2026)

MOLM: Mixture of LoRA Markers
by: Fares, Samar, et al.
Published: (2025)

First-Place Solution to NeurIPS 2024 Invisible Watermark Removal Challenge
by: Shamshad, Fahad, et al.
Published: (2025)

Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
by: Hassan, Jameel, et al.
Published: (2023)

Probing the Efficacy of Federated Parameter-Efficient Fine-Tuning of Vision Transformers for Medical Image Classification
by: Alkhunaizi, Naif, et al.
Published: (2024)

SPQR: A Standardized Benchmark for Modern Safety Alignment Methods in Text-to-Image Diffusion Models
by: Alam, Mohammed Talha, et al.
Published: (2025)

MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation
by: Gani, Hanan, et al.
Published: (2024)

Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
by: Malik, Hashmat Shadab, et al.
Published: (2025)

Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression
by: Saadi, Ibtissam, et al.
Published: (2024)

SPDMark: Selective Parameter Displacement for Robust Video Watermarking
by: Fares, Samar, et al.
Published: (2025)

VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs
by: Bharadwaj, Rohit, et al.
Published: (2024)

LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts
by: Gani, Hanan, et al.
Published: (2023)

Self-Supervised Vision Transformers Are Efficient Segmentation Learners for Imperfect Labels
by: Lee, Seungho, et al.
Published: (2024)

Calibration-Aware Prompt Learning for Medical Vision-Language Models
by: Basu, Abhishek, et al.
Published: (2025)

Vision Transformers are Circulant Attention Learners
by: Han, Dongchen, et al.
Published: (2025)

SimLVSeg: Simplifying Left Ventricular Segmentation in 2D+Time Echocardiograms with Self- and Weakly-Supervised Learning
by: Maani, Fadillah, et al.
Published: (2023)

RAVEN: Erasing Invisible Watermarks via Novel View Synthesis
by: Shamshad, Fahad, et al.
Published: (2026)

STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models
by: Srivatsan, Koushik, et al.
Published: (2024)

Towards Evaluating the Robustness of Visual State Space Models
by: Malik, Hashmat Shadab, et al.
Published: (2024)

Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors
by: Shamshad, Fahad, et al.
Published: (2024)

RWKV-CLIP: A Robust Vision-Language Representation Learner
by: Gu, Tiancheng, et al.
Published: (2024)

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
by: Munasinghe, Shehan, et al.
Published: (2024)

AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment
by: Nawaz, Umair, et al.
Published: (2024)

MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
by: Fares, Samar, et al.
Published: (2024)

Multi-Tailed Vision Transformer for Efficient Inference
by: Wang, Yunke, et al.
Published: (2022)

DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models
by: Islam, Khawar, et al.
Published: (2024)

FaceAnonyMixer: Cancelable Faces via Identity Consistent Latent Space Mixing
by: Alam, Mohammed Talha, et al.
Published: (2025)

Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners
by: Park, Keon-Hee, et al.
Published: (2024)

Noise is an Efficient Learner for Zero-Shot Vision-Language Models
by: Imam, Raza, et al.
Published: (2025)

A Framework for Double-Blind Federated Adaptation of Foundation Models
by: Tastan, Nurbek, et al.
Published: (2025)

VideoMolmo: Spatio-Temporal Grounding Meets Pointing
by: Ahmad, Ghazi Shazan, et al.
Published: (2025)

PE-CLIP: A Parameter-Efficient Fine-Tuning of Vision Language Models for Dynamic Facial Expression Recognition
by: Saadi, Ibtissam, et al.
Published: (2025)

Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning
by: Zhong, Hanwen, et al.
Published: (2025)

FullLoRA: Efficiently Boosting the Robustness of Pretrained Vision Transformers
by: Yuan, Zheng, et al.
Published: (2024)

Continual Few-shot Adaptation for Synthetic Fingerprint Detection
by: Benjamin, Joseph Geo, et al.
Published: (2026)

Multi-modal Attribute Prompting for Vision-Language Models
by: Liu, Xin, et al.
Published: (2024)