| Main Authors: | Addepalli, Sravanti; Asokan, Ashish Ramayee; Sharma, Lakshay; Babu, R. Venkatesh |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2310.08255 |
Similar Items
ProFeAT: Projected Feature Adversarial Training for Self-Supervised Learning of Robust Representations
by: Addepalli, Sravanti, et al.
Published: (2024)
DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets
by: Rangwani, Harsh, et al.
Published: (2024)
Do Vision Language Models Need to Process Image Tokens?
by: Ghosh, Sambit, et al.
Published: (2026)
Harnessing Diffusion-Generated Synthetic Images for Fair Image Classification
by: Basu, Abhipsa, et al.
Published: (2025)
Landsat-Bench: Datasets and Benchmarks for Landsat Foundation Models
by: Corley, Isaac, et al.
Published: (2025)
Compass Control: Multi Object Orientation Control for Text-to-Image Generation
by: Parihar, Rishubh, et al.
Published: (2025)
Subimage Overlap Prediction: Task-Aligned Self-Supervised Pretraining For Semantic Segmentation In Remote Sensing Imagery
by: Sharma, Lakshay, et al.
Published: (2026)
F4-ITS: Fine-grained Feature Fusion for Food Image-Text Search
by: Asokan, Raghul
Published: (2025)
MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World
by: Dhiman, Ankit, et al.
Published: (2025)
SpectraIrisPAD: Leveraging Vision Foundation Models for Spectrally Conditioned Multispectral Iris Presentation Attack Detection
by: Ramachandra, Raghavendra, et al.
Published: (2025)
GeoDiv: Framework For Measuring Geographical Diversity In Text-To-Image Models
by: Basu, Abhipsa, et al.
Published: (2026)
Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments
by: Nagaonkar, Sankalp, et al.
Published: (2025)
PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control
by: Parihar, Rishubh, et al.
Published: (2024)
Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding
by: Sharma, Shivam, et al.
Published: (2026)
Glo-VLMs: Leveraging Vision-Language Models for Fine-Grained Diseased Glomerulus Classification
by: Guo, Zhenhao, et al.
Published: (2025)
Where Do Vision-Language Models Fail? World Scale Analysis for Image Geolocalization
by: Bharadwaj, Siddhant, et al.
Published: (2026)
Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for now
by: Thozhiyoor, Varun Varma, et al.
Published: (2025)
SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation
by: Agrawal, Vaibhav, et al.
Published: (2026)
From Classification to Cross-Modal Understanding: Leveraging Vision-Language Models for Fine-Grained Renal Pathology
by: Guo, Zhenhao, et al.
Published: (2025)
Where Do Images Come From? Analyzing Captions to Geographically Profile Datasets
by: Basu, Abhipsa, et al.
Published: (2026)
Large Language Models Facilitate Vision Reflection in Image Classification
by: An, Guoyuan, et al.
Published: (2025)
Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging
by: Kumar, Amar, et al.
Published: (2025)
Text2Place: Affordance-aware Text Guided Human Placement
by: Parihar, Rishubh, et al.
Published: (2024)
Improved EATFormer: A Vision Transformer for Medical Image Classification
by: Shisu, Yulong, et al.
Published: (2024)
CLIP-HandID: Vision-Language Model for Hand-Based Person Identification
by: Baisa, Nathanael L., et al.
Published: (2025)
A Novel Vision Transformer with Residual in Self-attention for Biomedical Image Classification
by: Sharma, Arun K., et al.
Published: (2023)
Mirage: Unveiling Hidden Artifacts in Synthetic Images with Large Vision-Language Models
by: Sharma, Pranav, et al.
Published: (2025)
Espresso: Robust Concept Filtering in Text-to-Image Models
by: Das, Anudeep, et al.
Published: (2024)
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
by: Peng, Wenshuo, et al.
Published: (2024)
Vision-Language Semantic Aggregation Leveraging Foundation Model for Generalizable Medical Image Segmentation
by: Yu, Wenjun, et al.
Published: (2025)
Training-free Conditional Image Embedding Framework Leveraging Large Vision Language Models
by: Kawarada, Masayuki, et al.
Published: (2025)
Leveraging SAM for Single-Source Domain Generalization in Medical Image Segmentation
by: Wang, Hanhui, et al.
Published: (2024)
Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections
by: Dhiman, Ankit, et al.
Published: (2024)
Image Synthesis Using Spintronic Deep Convolutional Generative Adversarial Network
by: Gupta, Saumya, et al.
Published: (2026)
Leveraging Vision-Language Embeddings for Zero-Shot Learning in Histopathology Images
by: Rahaman, Md Mamunur, et al.
Published: (2025)
Frequency-Aware Vision-Language Multimodality Generalization Network for Remote Sensing Image Classification
by: Zhang, Junjie, et al.
Published: (2025)
Leveraging Depth and Language for Open-Vocabulary Domain-Generalized Semantic Segmentation
by: Chen, Siyu, et al.
Published: (2025)
Rule-Based Reinforcement Learning for Document Image Classification with Vision Language Models
by: Jungo, Michael, et al.
Published: (2025)
ChromaDistill: Colorizing Monochrome Radiance Fields with Knowledge Distillation
by: Dhiman, Ankit, et al.
Published: (2023)
In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene Classification
by: Dimitrovski, Ivica, et al.
Published: (2023)