:: Library Catalog

Bibliographic Details
Main Authors:	Singh, Shivam; Chen, Yiming; Chatterjee, Agneet; Raj, Amit; Hays, James; Yang, Yezhou; Baral, Chitta
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.18083

Similar Items

On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
by: Chatterjee, Agneet, et al.
Published: (2024)

AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models
by: Malaviya, Vatsal, et al.
Published: (2025)

REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
by: Chatterjee, Agneet, et al.
Published: (2024)

Stable Cinemetrics : Structured Taxonomy and Evaluation for Professional Video Generation
by: Chatterjee, Agneet, et al.
Published: (2025)

TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark
by: Fallah, Forouzan, et al.
Published: (2025)

RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
by: Pathiraja, Bimsara, et al.
Published: (2025)

Investigating VLM Hallucination from a Cognitive Psychology Perspective: A First Step Toward Interpretation with Intriguing Observations
by: Liu, Xiangrui, et al.
Published: (2025)

Dual Caption Preference Optimization for Diffusion Models
by: Saeidi, Amir, et al.
Published: (2025)

ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions
by: Sampat, Shailaja Keyur, et al.
Published: (2024)

ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
by: Patel, Maitreya, et al.
Published: (2023)

$λ$-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
by: Patel, Maitreya, et al.
Published: (2024)

Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts?
by: Sampat, Shailaja Keyur, et al.
Published: (2024)

Getting it Right: Improving Spatial Consistency in Text-to-Image Models
by: Chatterjee, Agneet, et al.
Published: (2024)

EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment
by: Kusumba, Abhiram, et al.
Published: (2025)

Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images
by: Luo, Yiran, et al.
Published: (2024)

Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
by: Saxon, Michael, et al.
Published: (2024)

TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
by: Patel, Maitreya, et al.
Published: (2024)

Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs
by: Vani, Sameep, et al.
Published: (2025)

VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
by: Sampat, Shailaja Keyur, et al.
Published: (2024)

VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
by: Yilmaz, Nilay, et al.
Published: (2025)

R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
by: Kim, Changhoon, et al.
Published: (2024)

TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
by: Feinglass, Joshua, et al.
Published: (2024)

GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning
by: Siingh, Shikhhar, et al.
Published: (2025)

Personalized Residuals for Concept-Driven Text-to-Image Generation
by: Ham, Cusuh, et al.
Published: (2024)

Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
by: Cheng, Sheng, et al.
Published: (2024)

MMTABREAL: Real-World Benchmark for Multimodal Table Understanding
by: Titiya, Prasham, et al.
Published: (2025)

The Perceptual Observatory Characterizing Robustness and Grounding in MLLMs
by: Anvekar, Tejas, et al.
Published: (2025)

eSkiTB: A Synthetic Event-based Dataset for Tracking Skiers
by: Vinod, Krishna, et al.
Published: (2026)

`Eyes of a Hawk and Ears of a Fox': Part Prototype Network for Generalized Zero-Shot Learning
by: Feinglass, Joshua, et al.
Published: (2024)

Steering Rectified Flow Models in the Vector Field for Controlled Image Generation
by: Patel, Maitreya, et al.
Published: (2024)

Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions
by: Parmar, Mihir, et al.
Published: (2022)

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
by: Ding, Ganggui, et al.
Published: (2024)

VibeToken: Scaling 1D Image Tokenizers and Autoregressive Models for Dynamic Resolution Generations
by: Patel, Maitreya, et al.
Published: (2026)

HistoSPACE: Histology-Inspired Spatial Transcriptome Prediction And Characterization Engine
by: Kumar, Shivam, et al.
Published: (2024)

Clink! Chop! Thud! -- Learning Object Sounds from Real-World Interactions
by: Yang, Mengyu, et al.
Published: (2025)

Hi-Light: A Path to high-fidelity, high-resolution video relighting with a Novel Evaluation Paradigm
by: Liu, Xiangrui, et al.
Published: (2026)

WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
by: Kim, Changhoon, et al.
Published: (2023)

Hyperspectral Image Analysis in Single-Modal and Multimodal setting using Deep Learning Techniques
by: Pande, Shivam
Published: (2024)

ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty
by: Wu, Xindi, et al.
Published: (2024)

VOCAL: Visual Odometry via ContrAstive Learning
by: Huang, Chi-Yao, et al.
Published: (2025)