| Main Authors: | Sampat, Shailaja Keyur, Yang, Yezhou, Baral, Chitta |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.13662 |
Similar Items
Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts?
by: Sampat, Shailaja Keyur, et al.
Published: (2024)
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
by: Sampat, Shailaja Keyur, et al.
Published: (2024)
ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
by: Patel, Maitreya, et al.
Published: (2023)
AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models
by: Malaviya, Vatsal, et al.
Published: (2025)
$λ$-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
by: Patel, Maitreya, et al.
Published: (2024)
Chimera: Compositional Image Generation using Part-based Concepting
by: Singh, Shivam, et al.
Published: (2025)
RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
by: Pathiraja, Bimsara, et al.
Published: (2025)
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
by: Chatterjee, Agneet, et al.
Published: (2024)
EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment
by: Kusumba, Abhiram, et al.
Published: (2025)
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
by: Chatterjee, Agneet, et al.
Published: (2024)
Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
by: Saxon, Michael, et al.
Published: (2024)
Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images
by: Luo, Yiran, et al.
Published: (2024)
Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs
by: Vani, Sameep, et al.
Published: (2025)
Investigating VLM Hallucination from a Cognitive Psychology Perspective: A First Step Toward Interpretation with Intriguing Observations
by: Liu, Xiangrui, et al.
Published: (2025)
TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark
by: Fallah, Forouzan, et al.
Published: (2025)
Dual Caption Preference Optimization for Diffusion Models
by: Saeidi, Amir, et al.
Published: (2025)
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
by: Zhou, Jiaming, et al.
Published: (2024)
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
by: Patel, Maitreya, et al.
Published: (2024)
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
by: Yilmaz, Nilay, et al.
Published: (2025)
Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition
by: Cao, Congqi, et al.
Published: (2024)
R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
by: Kim, Changhoon, et al.
Published: (2024)
Stable Cinemetrics : Structured Taxonomy and Evaluation for Professional Video Generation
by: Chatterjee, Agneet, et al.
Published: (2025)
TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
by: Feinglass, Joshua, et al.
Published: (2024)
GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning
by: Siingh, Shikhhar, et al.
Published: (2025)
Zero-shot Compositional Action Recognition with Neural Logic Constraints
by: Ye, Gefan, et al.
Published: (2025)
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
by: Chatterjee, Agneet, et al.
Published: (2024)
World Action Models are Zero-shot Policies
by: Ye, Seonghyeon, et al.
Published: (2026)
Interpretable Zero-shot Learning with Infinite Class Concepts
by: Ye, Zihan, et al.
Published: (2025)
Zero-shot Action Localization via the Confidence of Large Vision-Language Models
by: Aklilu, Josiah, et al.
Published: (2024)
Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition
by: Zhu, Anqi, et al.
Published: (2024)
A Comprehensive Review of Few-shot Action Recognition
by: Wanyan, Yuyang, et al.
Published: (2024)
Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition
by: Cao, Congqi, et al.
Published: (2025)
Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP
by: Yu, Yating, et al.
Published: (2024)
Enhancing Spatio-Temporal Zero-shot Action Recognition with Language-driven Description Attributes
by: Kim, Yehna, et al.
Published: (2025)
Attributed Synthetic Data Generation for Zero-shot Domain-specific Image Classification
by: Wang, Shijian, et al.
Published: (2025)
Training-free Zero-shot Composed Image Retrieval with Local Concept Reranking
by: Sun, Shitong, et al.
Published: (2023)
VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation
by: Bansal, Hritik, et al.
Published: (2025)
ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition
by: Kundu, Sanjoy, et al.
Published: (2024)
Appearance-free Action Recognition: Zero-shot Generalization in Humans and a Two-Pathway Model
by: Kumar, Prerana, et al.
Published: (2026)
Conceptrol: Concept Control of Zero-shot Personalized Image Generation
by: He, Qiyuan, et al.
Published: (2025)