Library Catalog

Bibliographic Details
Main Authors:	Ramachandran, Rahul, Kulkarni, Tejal, Sharma, Charchit, Vijaykeerthy, Deepak, Balasubramanian, Vineeth N
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2409.04041

Similar Items

Foundation Model Priors Enhance Object Focus in Feature Space for Source-Free Object Detection
by: VCR, Sairam, et al.
Published: (2025)

C2FDrone: Coarse-to-Fine Drone-to-Drone Detection using Vision Transformer Networks
by: Rebbapragada, Sairam VC, et al.
Published: (2024)

Understanding Task Transfer in Vision-Language Models
by: Sachdeva, Bhuvan, et al.
Published: (2025)

Source-Free Domain Adaptation by Optimizing Batch-Wise Cosine Similarity
by: Pathak, Harsharaj, et al.
Published: (2026)

$\oslash$ Source Models Leak What They Shouldn't $\nrightarrow$: Unlearning Zero-Shot Transfer in Domain Adaptation Through Adversarial Optimization
by: Devalapally, Arnav, et al.
Published: (2026)

Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks
by: Garg, Tanmay, et al.
Published: (2024)

Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance
by: Santra, Sanchayan, et al.
Published: (2025)

LogicCBMs: Logic-Enhanced Concept-Based Learning
by: Vemuri, Deepika SN, et al.
Published: (2025)

Unifying Scientific Communication: Fine-Grained Correspondence Across Scientific Media
by: M, Megha Mariam K., et al.
Published: (2026)

Open-Set Object Detection By Aligning Known Class Representations
by: Sarkar, Hiran, et al.
Published: (2024)

Can Better Text Semantics in Prompt Tuning Improve VLM Generalization?
by: Kuchibhotla, Hari Chandana, et al.
Published: (2024)

Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization
by: Kancheti, Sai Srinivas, et al.
Published: (2026)

Evaluation of Cultural Competence of Vision-Language Models
by: Yadav, Srishti, et al.
Published: (2025)

iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception
by: Mehrotra, Sarthak, et al.
Published: (2025)

Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models
by: Agrawal, Susmit, et al.
Published: (2025)

GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos
by: Kumar, Deepak, et al.
Published: (2026)

Chain-of-Thought Degrades Visual Spatial Reasoning Capabilities of Multimodal LLMs
by: Kancheti, Sai Srinivas, et al.
Published: (2026)

Can Reasons Help Improve Pedestrian Intent Estimation? A Cross-Modal Approach
by: Khindkar, Vaishnavi, et al.
Published: (2024)

Mitigate One, Skew Another? Tackling Intersectional Biases in Text-to-Image Models
by: Shukla, Pushkar, et al.
Published: (2025)

How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
by: Ramachandran, Rahul, et al.
Published: (2025)

POET: Prompt Offset Tuning for Continual Human Action Adaptation
by: Garg, Prachi, et al.
Published: (2025)

Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs
by: Kuchibhotla, Hari Chandana, et al.
Published: (2025)

Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs
by: Sinha, Rohit, et al.
Published: (2026)

Fiducial Focus Augmentation for Facial Landmark Detection
by: Kar, Purbayan, et al.
Published: (2024)

MicroVision: An Open Dataset and Benchmark Models for Detecting Vulnerable Road Users and Micromobility Vehicles
by: Rasch, Alexander, et al.
Published: (2026)

Swift Sampling: Selecting Temporal Surprises via Taylor Series
by: Kim, Dahye, et al.
Published: (2026)

Interpreting Neurons in Deep Vision Networks with Language Models
by: Bai, Nicholas, et al.
Published: (2024)

VideoSAVi: Self-Aligned Video Language Models without Human Supervision
by: Kulkarni, Yogesh, et al.
Published: (2024)

BiasConnect: Investigating Bias Interactions in Text-to-Image Models
by: Shukla, Pushkar, et al.
Published: (2025)

CRoPS: A Training-Free Hallucination Mitigation Framework for Vision-Language Models
by: Anand, Neeraj, et al.
Published: (2026)

Artifact Removal and Image Restoration in AFM: A Structured Mask-Guided Directional Inpainting Approach
by: Zhang, Juntao, et al.
Published: (2026)

Vision Transformers and Convolutional Neural Networks for Land Use Scene Classification
by: Kulkarni, Arun D.
Published: (2026)

Human-Aligned Generative Perception: Bridging Psychophysics and Generative Models
by: Titikhsha, Antara, et al.
Published: (2025)

Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy
by: Handa, Palak, et al.
Published: (2024)

OuroMamba: A Data-Free Quantization Framework for Vision Mamba
by: Ramachandran, Akshat, et al.
Published: (2025)

Evaluation of Human Visual Privacy Protection: A Three-Dimensional Framework and Benchmark Dataset
by: Abdulaziz, Sara, et al.
Published: (2025)

The Urban Vision Hackathon Dataset and Models: Towards Image Annotations and Accurate Vision Models for Indian Traffic
by: Sharma, Akash, et al.
Published: (2025)

Grounding Descriptions in Images informs Zero-Shot Visual Recognition
by: Halbe, Shaunak, et al.
Published: (2024)

DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark
by: Li, Haodong, et al.
Published: (2024)

On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets
by: Liao, Ning, et al.
Published: (2023)