:: Library Catalog

Bibliographic Details
Main Authors:	Vani, Sameep; Jena, Shreyas; Patel, Maitreya; Baral, Chitta; Aditya, Somak; Yang, Yezhou
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.03955

Similar Items

$λ$-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
by: Patel, Maitreya, et al.
Published: (2024)

ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
by: Patel, Maitreya, et al.
Published: (2023)

Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts?
by: Sampat, Shailaja Keyur, et al.
Published: (2024)

AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models
by: Malaviya, Vatsal, et al.
Published: (2025)

RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
by: Pathiraja, Bimsara, et al.
Published: (2025)

TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
by: Patel, Maitreya, et al.
Published: (2024)

EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment
by: Kusumba, Abhiram, et al.
Published: (2025)

TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark
by: Fallah, Forouzan, et al.
Published: (2025)

VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
by: Yilmaz, Nilay, et al.
Published: (2025)

ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions
by: Sampat, Shailaja Keyur, et al.
Published: (2024)

Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model
by: Cheng, Sheng, et al.
Published: (2024)

On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
by: Chatterjee, Agneet, et al.
Published: (2024)

Dual Caption Preference Optimization for Diffusion Models
by: Saeidi, Amir, et al.
Published: (2025)

Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images
by: Luo, Yiran, et al.
Published: (2024)

REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
by: Chatterjee, Agneet, et al.
Published: (2024)

WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
by: Kim, Changhoon, et al.
Published: (2023)

Steering Rectified Flow Models in the Vector Field for Controlled Image Generation
by: Patel, Maitreya, et al.
Published: (2024)

VibeToken: Scaling 1D Image Tokenizers and Autoregressive Models for Dynamic Resolution Generations
by: Patel, Maitreya, et al.
Published: (2026)

Stable Cinemetrics : Structured Taxonomy and Evaluation for Professional Video Generation
by: Chatterjee, Agneet, et al.
Published: (2025)

Investigating VLM Hallucination from a Cognitive Psychology Perspective: A First Step Toward Interpretation with Intriguing Observations
by: Liu, Xiangrui, et al.
Published: (2025)

Chimera: Compositional Image Generation using Part-based Concepting
by: Singh, Shivam, et al.
Published: (2025)

MentalBlackboard: Evaluating Spatial Visualization via Mathematical Transformations
by: Yilmaz, Nilay, et al.
Published: (2026)

VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
by: Sampat, Shailaja Keyur, et al.
Published: (2024)

MMTABREAL: Real-World Benchmark for Multimodal Table Understanding
by: Titiya, Prasham, et al.
Published: (2025)

Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
by: Saxon, Michael, et al.
Published: (2024)

GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning
by: Siingh, Shikhhar, et al.
Published: (2025)

Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders
by: Rasekh, Ali, et al.
Published: (2025)

Getting it Right: Improving Spatial Consistency in Text-to-Image Models
by: Chatterjee, Agneet, et al.
Published: (2024)

The Perceptual Observatory Characterizing Robustness and Grounding in MLLMs
by: Anvekar, Tejas, et al.
Published: (2025)

SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras
by: Pahadia, Himanshu, et al.
Published: (2023)

Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding
by: Chatterjee, Dibyadip, et al.
Published: (2025)

Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions
by: Parmar, Mihir, et al.
Published: (2022)

CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using Score-Based Diffusion Models
by: Suin, Maitreya, et al.
Published: (2024)

SEPose: A Synthetic Event-based Human Pose Estimation Dataset for Pedestrian Monitoring
by: Chanda, Kaustav, et al.
Published: (2025)

AD$^2$: Analysis and Detection of Adversarial Threats in Visual Perception for End-to-End Autonomous Driving Systems
by: Sahu, Ishan, et al.
Published: (2026)

SEE-DPO: Self Entropy Enhanced Direct Preference Optimization
by: Shekhar, Shivanshu, et al.
Published: (2024)

TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
by: Feinglass, Joshua, et al.
Published: (2024)

SmartSight: Mitigating Hallucination in Video-LLMs Without Compromising Video Understanding via Temporal Attention Collapse
by: Sun, Yiming, et al.
Published: (2025)

IFSENet : Harnessing Sparse Iterations for Interactive Few-shot Segmentation Excellence
by: Chandgothia, Shreyas, et al.
Published: (2024)

AutoFormBench: Benchmark Dataset for Automating Form Understanding
by: Baral, Gaurab, et al.
Published: (2026)