| Main Authors: | Yilmaz, Nilay, Patel, Maitreya, Luo, Yiran Lawrence, Gokhale, Tejas, Baral, Chitta, Jayasuriya, Suren, Yang, Yezhou |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.00043 |
Similar Items
ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
by: Patel, Maitreya, et al.
Published: (2023)
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
by: Patel, Maitreya, et al.
Published: (2024)
$λ$-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
by: Patel, Maitreya, et al.
Published: (2024)
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
by: Chatterjee, Agneet, et al.
Published: (2024)
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
by: Chatterjee, Agneet, et al.
Published: (2024)
TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark
by: Fallah, Forouzan, et al.
Published: (2025)
Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images
by: Luo, Yiran, et al.
Published: (2024)
AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models
by: Malaviya, Vatsal, et al.
Published: (2025)
Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts?
by: Sampat, Shailaja Keyur, et al.
Published: (2024)
Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs
by: Vani, Sameep, et al.
Published: (2025)
RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
by: Pathiraja, Bimsara, et al.
Published: (2025)
Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
by: Cheng, Sheng, et al.
Published: (2024)
The Perceptual Observatory: Characterizing Robustness and Grounding in MLLMs
by: Anvekar, Tejas, et al.
Published: (2025)
Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
by: Saxon, Michael, et al.
Published: (2024)
MentalBlackboard: Evaluating Spatial Visualization via Mathematical Transformations
by: Yilmaz, Nilay, et al.
Published: (2026)
EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment
by: Kusumba, Abhiram, et al.
Published: (2025)
Investigating VLM Hallucination from a Cognitive Psychology Perspective: A First Step Toward Interpretation with Intriguing Observations
by: Liu, Xiangrui, et al.
Published: (2025)
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
by: Sampat, Shailaja Keyur, et al.
Published: (2024)
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions
by: Sampat, Shailaja Keyur, et al.
Published: (2024)
GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning
by: Siingh, Shikhhar, et al.
Published: (2025)
Dual Caption Preference Optimization for Diffusion Models
by: Saeidi, Amir, et al.
Published: (2025)
Rethinking Information Synthesis in Multimodal Question Answering: A Multi-Agent Perspective
by: Rajput, Krishna Singh, et al.
Published: (2025)
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
by: Chatterjee, Agneet, et al.
Published: (2024)
ViTaB-A: Evaluating Multimodal Large Language Models on Visual Table Attribution
by: Alqurnawi, Yahia, et al.
Published: (2026)
Steering Rectified Flow Models in the Vector Field for Controlled Image Generation
by: Patel, Maitreya, et al.
Published: (2024)
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
by: Kim, Changhoon, et al.
Published: (2023)
Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions
by: Parmar, Mihir, et al.
Published: (2022)
VibeToken: Scaling 1D Image Tokenizers and Autoregressive Models for Dynamic Resolution Generations
by: Patel, Maitreya, et al.
Published: (2026)
LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models
by: Parmar, Mihir, et al.
Published: (2024)
TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
by: Feinglass, Joshua, et al.
Published: (2024)
Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models
by: Patel, Nisarg, et al.
Published: (2024)
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs
by: Yeh, Chun-Hsiao, et al.
Published: (2025)
Chimera: Compositional Image Generation using Part-based Concepting
by: Singh, Shivam, et al.
Published: (2025)
Stable Cinemetrics: Structured Taxonomy and Evaluation for Professional Video Generation
by: Chatterjee, Agneet, et al.
Published: (2025)
SH-SAS: An Implicit Neural Representation for Complex Spherical-Harmonic Scattering Fields for 3D Synthetic Aperture Sonar
by: Vengurlekar, Omkar Shailendra, et al.
Published: (2025)
PathFinder: Attention-Driven Dynamic Non-Line-of-Sight Tracking with a Mobile Robot
by: Kannapiran, Shenbagaraj, et al.
Published: (2024)
The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness
by: Varshney, Neeraj, et al.
Published: (2023)
From Text to Pixel: Advancing Long-Context Understanding in MLLMs
by: Lu, Yujie, et al.
Published: (2024)
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
by: Saeidi, Amir, et al.
Published: (2024)
Interleaved Latent Visual Reasoning with Selective Perceptual Modeling
by: Dong, Shuai, et al.
Published: (2025)