:: Library Catalog

Bibliographic Details
Main Author:	Narang, Arya
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.11705

Similar Items

Calorie Burn Estimation in Community Parks Through DLICP: A Mathematical Modelling Approach
by: Sebastian, Abhishek, et al.
Published: (2024)

QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain
by: Sun, Wenfang, et al.
Published: (2024)

ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation
by: Yanuka, Moran, et al.
Published: (2024)

Customizing Text-to-Image Models with a Single Image Pair
by: Jones, Maxwell, et al.
Published: (2024)

Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning
by: Role, François, et al.
Published: (2025)

GeoFormer: A Vision and Sequence Transformer-based Approach for Greenhouse Gas Monitoring
by: Khirwar, Madhav, et al.
Published: (2024)

Quilt-1M: One Million Image-Text Pairs for Histopathology
by: Ikezogwo, Wisdom Oluchi, et al.
Published: (2023)

Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
by: Franchi, Gianni, et al.
Published: (2024)

QUTE: Quantifying Uncertainty in TinyML with Early-exit-assisted ensembles for model-monitoring
by: Ghanathe, Nikhil P, et al.
Published: (2024)

PMPGuard: Catching Pseudo-Matched Pairs in Remote Sensing Image-Text Retrieval
by: Ouyang, Pengxiang, et al.
Published: (2025)

Vision Learners Meet Web Image-Text Pairs
by: Zhao, Bingchen, et al.
Published: (2023)

Naïve PAINE: Lightweight Text-to-Image Generation Improvement with Prompt Evaluation
by: Kim, Joong Ho, et al.
Published: (2026)

CLIPTime: Time-Aware Multimodal Representation Learning from Images and Text
by: Rani, Anju, et al.
Published: (2025)

The Narrow Gate: Localized Image-Text Communication in Native Multimodal Models
by: Serra, Alessandro Pietro, et al.
Published: (2024)

A Survey on Self-supervised Contrastive Learning for Multimodal Text-Image Analysis
by: Khan, Asifullah, et al.
Published: (2025)

RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment
by: Wang, Chao, et al.
Published: (2025)

MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs
by: Mao, Jiawei, et al.
Published: (2025)

Learning an Image Editing Model without Image Editing Pairs
by: Kumari, Nupur, et al.
Published: (2025)

Advanced Multimodal Deep Learning Architecture for Image-Text Matching
by: Wang, Jinyin, et al.
Published: (2024)

MultiFusionNet: Multilayer Multimodal Fusion of Deep Neural Networks for Chest X-Ray Image Classification
by: Agarwal, Saurabh, et al.
Published: (2024)

ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models
by: Villegas, Danae Sánchez, et al.
Published: (2025)

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
by: Arazi, Alan, et al.
Published: (2026)

Efficient All-Pairs Correlation Volume Sampling for Optical Flow Estimation
by: Briedis, Karlis Martins, et al.
Published: (2025)

FairJudge: Abstention-Aware Multimodal Judges for Fairness and Alignment Evaluation in Text-to-Image Models
by: Sahili, Zahraa Al, et al.
Published: (2025)

Balancing the Scales: Enhancing Fairness in Facial Expression Recognition with Latent Alignment
by: Rizvi, Syed Sameen Ahmad, et al.
Published: (2024)

Demand Estimation with Text and Image Data
by: Compiani, Giovanni, et al.
Published: (2025)

Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion
by: Haviv, Adi, et al.
Published: (2024)

Text-Guided Image Clustering
by: Stephan, Andreas, et al.
Published: (2024)

Hyperbolic Image-Text Representations
by: Desai, Karan, et al.
Published: (2023)

Learning Hyperspectral Images with Curated Text Prompts for Efficient Multimodal Alignment
by: Chatterjee, Abhiroop, et al.
Published: (2025)

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
by: Yang, Ling, et al.
Published: (2024)

DiffBlender: Composable and Versatile Multimodal Text-to-Image Diffusion Models
by: Kim, Sungnyun, et al.
Published: (2023)

Hybrid Convolution and Vision Transformer NAS Search Space for TinyML Image Classification
by: Djajapermana, Mikhael, et al.
Published: (2025)

Gems: Group Emotion Profiling Through Multimodal Situational Understanding
by: Kataria, Anubhav, et al.
Published: (2025)

Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs
by: Smart, Brandon, et al.
Published: (2024)

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
by: Chen, Zhaorun, et al.
Published: (2024)

Text-Guided Alternative Image Clustering
by: Stephan, Andreas, et al.
Published: (2024)

Improved Probabilistic Image-Text Representations
by: Chun, Sanghyuk
Published: (2023)

Information Theoretic Text-to-Image Alignment
by: Wang, Chao, et al.
Published: (2024)

Privacy-Preserving in Connected and Autonomous Vehicles Through Vision to Text Transformation
by: Rezaei, Abdolazim, et al.
Published: (2025)