Library Catalog

Bibliographic Details
Main Authors:	Sampat, Shailaja Keyur, Yang, Yezhou, Baral, Chitta
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2410.13662

Similar Items

Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts?
by: Sampat, Shailaja Keyur, et al.
Published: (2024)

VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
by: Sampat, Shailaja Keyur, et al.
Published: (2024)

ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
by: Patel, Maitreya, et al.
Published: (2023)

AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models
by: Malaviya, Vatsal, et al.
Published: (2025)

λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
by: Patel, Maitreya, et al.
Published: (2024)

Chimera: Compositional Image Generation using Part-based Concepting
by: Singh, Shivam, et al.
Published: (2025)

RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
by: Pathiraja, Bimsara, et al.
Published: (2025)

On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
by: Chatterjee, Agneet, et al.
Published: (2024)

EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment
by: Kusumba, Abhiram, et al.
Published: (2025)

REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
by: Chatterjee, Agneet, et al.
Published: (2024)

Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
by: Saxon, Michael, et al.
Published: (2024)

Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images
by: Luo, Yiran, et al.
Published: (2024)

Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs
by: Vani, Sameep, et al.
Published: (2025)

Investigating VLM Hallucination from a Cognitive Psychology Perspective: A First Step Toward Interpretation with Intriguing Observations
by: Liu, Xiangrui, et al.
Published: (2025)

TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark
by: Fallah, Forouzan, et al.
Published: (2025)

Dual Caption Preference Optimization for Diffusion Models
by: Saeidi, Amir, et al.
Published: (2025)

ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
by: Zhou, Jiaming, et al.
Published: (2024)

TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
by: Patel, Maitreya, et al.
Published: (2024)

VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
by: Yilmaz, Nilay, et al.
Published: (2025)

Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition
by: Cao, Congqi, et al.
Published: (2024)

R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
by: Kim, Changhoon, et al.
Published: (2024)

Stable Cinemetrics : Structured Taxonomy and Evaluation for Professional Video Generation
by: Chatterjee, Agneet, et al.
Published: (2025)

TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
by: Feinglass, Joshua, et al.
Published: (2024)

GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning
by: Siingh, Shikhhar, et al.
Published: (2025)

Zero-shot Compositional Action Recognition with Neural Logic Constraints
by: Ye, Gefan, et al.
Published: (2025)

Getting it Right: Improving Spatial Consistency in Text-to-Image Models
by: Chatterjee, Agneet, et al.
Published: (2024)

World Action Models are Zero-shot Policies
by: Ye, Seonghyeon, et al.
Published: (2026)

Interpretable Zero-shot Learning with Infinite Class Concepts
by: Ye, Zihan, et al.
Published: (2025)

Zero-shot Action Localization via the Confidence of Large Vision-Language Models
by: Aklilu, Josiah, et al.
Published: (2024)

Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition
by: Zhu, Anqi, et al.
Published: (2024)

A Comprehensive Review of Few-shot Action Recognition
by: Wanyan, Yuyang, et al.
Published: (2024)

Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition
by: Cao, Congqi, et al.
Published: (2025)

Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP
by: Yu, Yating, et al.
Published: (2024)

Enhancing Spatio-Temporal Zero-shot Action Recognition with Language-driven Description Attributes
by: Kim, Yehna, et al.
Published: (2025)

Attributed Synthetic Data Generation for Zero-shot Domain-specific Image Classification
by: Wang, Shijian, et al.
Published: (2025)

Training-free Zero-shot Composed Image Retrieval with Local Concept Reranking
by: Sun, Shitong, et al.
Published: (2023)

VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation
by: Bansal, Hritik, et al.
Published: (2025)

ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition
by: Kundu, Sanjoy, et al.
Published: (2024)

Appearance-free Action Recognition: Zero-shot Generalization in Humans and a Two-Pathway Model
by: Kumar, Prerana, et al.
Published: (2026)

Conceptrol: Concept Control of Zero-shot Personalized Image Generation
by: He, Qiyuan, et al.
Published: (2025)