| Main Authors: | Ramachandran, Rahul, Kulkarni, Tejal, Sharma, Charchit, Vijaykeerthy, Deepak, Balasubramanian, Vineeth N |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.04041 |
Similar Items
Foundation Model Priors Enhance Object Focus in Feature Space for Source-Free Object Detection
by: VCR, Sairam, et al.
Published: (2025)
C2FDrone: Coarse-to-Fine Drone-to-Drone Detection using Vision Transformer Networks
by: Rebbapragada, Sairam VC, et al.
Published: (2024)
Understanding Task Transfer in Vision-Language Models
by: Sachdeva, Bhuvan, et al.
Published: (2025)
Source-Free Domain Adaptation by Optimizing Batch-Wise Cosine Similarity
by: Pathak, Harsharaj, et al.
Published: (2026)
$\oslash$ Source Models Leak What They Shouldn't $\nrightarrow$: Unlearning Zero-Shot Transfer in Domain Adaptation Through Adversarial Optimization
by: Devalapally, Arnav, et al.
Published: (2026)
Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks
by: Garg, Tanmay, et al.
Published: (2024)
Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance
by: Santra, Sanchayan, et al.
Published: (2025)
LogicCBMs: Logic-Enhanced Concept-Based Learning
by: Vemuri, Deepika SN, et al.
Published: (2025)
Unifying Scientific Communication: Fine-Grained Correspondence Across Scientific Media
by: M, Megha Mariam K., et al.
Published: (2026)
Open-Set Object Detection By Aligning Known Class Representations
by: Sarkar, Hiran, et al.
Published: (2024)
Can Better Text Semantics in Prompt Tuning Improve VLM Generalization?
by: Kuchibhotla, Hari Chandana, et al.
Published: (2024)
Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization
by: Kancheti, Sai Srinivas, et al.
Published: (2026)
Evaluation of Cultural Competence of Vision-Language Models
by: Yadav, Srishti, et al.
Published: (2025)
iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception
by: Mehrotra, Sarthak, et al.
Published: (2025)
Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models
by: Agrawal, Susmit, et al.
Published: (2025)
GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos
by: Kumar, Deepak, et al.
Published: (2026)
Chain-of-Thought Degrades Visual Spatial Reasoning Capabilities of Multimodal LLMs
by: Kancheti, Sai Srinivas, et al.
Published: (2026)
Can Reasons Help Improve Pedestrian Intent Estimation? A Cross-Modal Approach
by: Khindkar, Vaishnavi, et al.
Published: (2024)
Mitigate One, Skew Another? Tackling Intersectional Biases in Text-to-Image Models
by: Shukla, Pushkar, et al.
Published: (2025)
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
by: Ramachandran, Rahul, et al.
Published: (2025)
POET: Prompt Offset Tuning for Continual Human Action Adaptation
by: Garg, Prachi, et al.
Published: (2025)
Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs
by: Kuchibhotla, Hari Chandana, et al.
Published: (2025)
Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs
by: Sinha, Rohit, et al.
Published: (2026)
Fiducial Focus Augmentation for Facial Landmark Detection
by: Kar, Purbayan, et al.
Published: (2024)
MicroVision: An Open Dataset and Benchmark Models for Detecting Vulnerable Road Users and Micromobility Vehicles
by: Rasch, Alexander, et al.
Published: (2026)
Swift Sampling: Selecting Temporal Surprises via Taylor Series
by: Kim, Dahye, et al.
Published: (2026)
Interpreting Neurons in Deep Vision Networks with Language Models
by: Bai, Nicholas, et al.
Published: (2024)
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
by: Kulkarni, Yogesh, et al.
Published: (2024)
BiasConnect: Investigating Bias Interactions in Text-to-Image Models
by: Shukla, Pushkar, et al.
Published: (2025)
CRoPS: A Training-Free Hallucination Mitigation Framework for Vision-Language Models
by: Anand, Neeraj, et al.
Published: (2026)
Artifact Removal and Image Restoration in AFM: A Structured Mask-Guided Directional Inpainting Approach
by: Zhang, Juntao, et al.
Published: (2026)
Vision Transformers and Convolutional Neural Networks for Land Use Scene Classification
by: Kulkarni, Arun D.
Published: (2026)
Human-Aligned Generative Perception: Bridging Psychophysics and Generative Models
by: Titikhsha, Antara, et al.
Published: (2025)
Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy
by: Handa, Palak, et al.
Published: (2024)
OuroMamba: A Data-Free Quantization Framework for Vision Mamba
by: Ramachandran, Akshat, et al.
Published: (2025)
Evaluation of Human Visual Privacy Protection: A Three-Dimensional Framework and Benchmark Dataset
by: Abdulaziz, Sara, et al.
Published: (2025)
The Urban Vision Hackathon Dataset and Models: Towards Image Annotations and Accurate Vision Models for Indian Traffic
by: Sharma, Akash, et al.
Published: (2025)
Grounding Descriptions in Images informs Zero-Shot Visual Recognition
by: Halbe, Shaunak, et al.
Published: (2024)
DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark
by: Li, Haodong, et al.
Published: (2024)
On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets
by: Liao, Ning, et al.
Published: (2023)