| Main Authors: | Chatterjee, Agneet, Stan, Gabriela Ben Melech, Aflalo, Estelle, Paul, Sayak, Ghosh, Dhruba, Gokhale, Tejas, Schmidt, Ludwig, Hajishirzi, Hannaneh, Lal, Vasudev, Baral, Chitta, Yang, Yezhou |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.01197 |
Similar Items
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
by: Chatterjee, Agneet, et al.
Published: (2024)
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
by: Chatterjee, Agneet, et al.
Published: (2024)
Learning from Reasoning Failures via Synthetic Data Generation
by: Stan, Gabriela Ben Melech, et al.
Published: (2025)
FastRM: An efficient and automatic explainability framework for multimodal generative models
by: Stan, Gabriela Ben-Melech, et al.
Published: (2024)
FiVL: A Framework for Improved Vision-Language Alignment through the Lens of Training, Evaluation and Explainability
by: Aflalo, Estelle, et al.
Published: (2024)
ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
by: Patel, Maitreya, et al.
Published: (2023)
AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models
by: Malaviya, Vatsal, et al.
Published: (2025)
Investigating VLM Hallucination from a Cognitive Psychology Perspective: A First Step Toward Interpretation with Intriguing Observations
by: Liu, Xiangrui, et al.
Published: (2025)
TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark
by: Fallah, Forouzan, et al.
Published: (2025)
Dual Caption Preference Optimization for Diffusion Models
by: Saeidi, Amir, et al.
Published: (2025)
Chimera: Compositional Image Generation using Part-based Concepting
by: Singh, Shivam, et al.
Published: (2025)
Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images
by: Luo, Yiran, et al.
Published: (2024)
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
by: Stan, Gabriela Ben Melech, et al.
Published: (2024)
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
by: Patel, Maitreya, et al.
Published: (2024)
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
by: Yilmaz, Nilay, et al.
Published: (2025)
A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment
by: Rohekar, Raanan Y., et al.
Published: (2024)
Stable Cinemetrics : Structured Taxonomy and Evaluation for Professional Video Generation
by: Chatterjee, Agneet, et al.
Published: (2025)
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions
by: Sampat, Shailaja Keyur, et al.
Published: (2024)
Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation
by: Varshney, Neeraj, et al.
Published: (2024)
$λ$-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
by: Patel, Maitreya, et al.
Published: (2024)
Debias your Large Multi-Modal Model at Test-Time via Non-Contrastive Visual Attribute Steering
by: Ratzlaff, Neale, et al.
Published: (2024)
Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts?
by: Sampat, Shailaja Keyur, et al.
Published: (2024)
Understanding the Fine-Grained Knowledge Capabilities of Vision-Language Models
by: Ghosh, Dhruba, et al.
Published: (2026)
RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
by: Pathiraja, Bimsara, et al.
Published: (2025)
L-MAGIC: Language Model Assisted Generation of Images with Coherence
by: Cai, Zhipeng, et al.
Published: (2024)
Rethinking Information Synthesis in Multimodal Question Answering A Multi-Agent Perspective
by: Rajput, Krishna Singh, et al.
Published: (2025)
Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs
by: Vani, Sameep, et al.
Published: (2025)
EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment
by: Kusumba, Abhiram, et al.
Published: (2025)
APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
by: Zhao, Bowen, et al.
Published: (2024)
Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
by: Saxon, Michael, et al.
Published: (2024)
The Perceptual Observatory Characterizing Robustness and Grounding in MLLMs
by: Anvekar, Tejas, et al.
Published: (2025)
ViTaB-A: Evaluating Multimodal Large Language Models on Visual Table Attribution
by: Alqurnawi, Yahia, et al.
Published: (2026)
Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling
by: Saha, Sourajit, et al.
Published: (2024)
Data or Language Supervision: What Makes CLIP Better than DINO?
by: Liu, Yiming, et al.
Published: (2025)
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
by: Sampat, Shailaja Keyur, et al.
Published: (2024)
The EarlyBird Gets the WORM: Heuristically Accelerating EarlyBird Convergence
by: Vasudev, Adithya
Published: (2024)
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models
by: Madasu, Avinash, et al.
Published: (2023)
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
by: Merrill, William, et al.
Published: (2025)
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
by: Kim, Joongwon, et al.
Published: (2024)
ScienceMeter: Tracking Scientific Knowledge Updates in Language Models
by: Wang, Yike, et al.
Published: (2025)