| Main Author: | Narang, Arya |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.11705 |
Similar Items
Calorie Burn Estimation in Community Parks Through DLICP: A Mathematical Modelling Approach
by: Sebastian, Abhishek, et al.
Published: (2024)
QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain
by: Sun, Wenfang, et al.
Published: (2024)
ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation
by: Yanuka, Moran, et al.
Published: (2024)
Customizing Text-to-Image Models with a Single Image Pair
by: Jones, Maxwell, et al.
Published: (2024)
Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning
by: Role, François, et al.
Published: (2025)
GeoFormer: A Vision and Sequence Transformer-based Approach for Greenhouse Gas Monitoring
by: Khirwar, Madhav, et al.
Published: (2024)
Quilt-1M: One Million Image-Text Pairs for Histopathology
by: Ikezogwo, Wisdom Oluchi, et al.
Published: (2023)
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
by: Franchi, Gianni, et al.
Published: (2024)
QUTE: Quantifying Uncertainty in TinyML with Early-exit-assisted ensembles for model-monitoring
by: Ghanathe, Nikhil P, et al.
Published: (2024)
PMPGuard: Catching Pseudo-Matched Pairs in Remote Sensing Image-Text Retrieval
by: Ouyang, Pengxiang, et al.
Published: (2025)
Vision Learners Meet Web Image-Text Pairs
by: Zhao, Bingchen, et al.
Published: (2023)
Naïve PAINE: Lightweight Text-to-Image Generation Improvement with Prompt Evaluation
by: Kim, Joong Ho, et al.
Published: (2026)
CLIPTime: Time-Aware Multimodal Representation Learning from Images and Text
by: Rani, Anju, et al.
Published: (2025)
The Narrow Gate: Localized Image-Text Communication in Native Multimodal Models
by: Serra, Alessandro Pietro, et al.
Published: (2024)
A Survey on Self-supervised Contrastive Learning for Multimodal Text-Image Analysis
by: Khan, Asifullah, et al.
Published: (2025)
RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment
by: Wang, Chao, et al.
Published: (2025)
MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs
by: Mao, Jiawei, et al.
Published: (2025)
Learning an Image Editing Model without Image Editing Pairs
by: Kumari, Nupur, et al.
Published: (2025)
Advanced Multimodal Deep Learning Architecture for Image-Text Matching
by: Wang, Jinyin, et al.
Published: (2024)
MultiFusionNet: Multilayer Multimodal Fusion of Deep Neural Networks for Chest X-Ray Image Classification
by: Agarwal, Saurabh, et al.
Published: (2024)
ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models
by: Villegas, Danae Sánchez, et al.
Published: (2025)
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
by: Arazi, Alan, et al.
Published: (2026)
Efficient All-Pairs Correlation Volume Sampling for Optical Flow Estimation
by: Briedis, Karlis Martins, et al.
Published: (2025)
FairJudge: Abstention-Aware Multimodal Judges for Fairness and Alignment Evaluation in Text-to-Image Models
by: Sahili, Zahraa Al, et al.
Published: (2025)
Balancing the Scales: Enhancing Fairness in Facial Expression Recognition with Latent Alignment
by: Rizvi, Syed Sameen Ahmad, et al.
Published: (2024)
Demand Estimation with Text and Image Data
by: Compiani, Giovanni, et al.
Published: (2025)
Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion
by: Haviv, Adi, et al.
Published: (2024)
Text-Guided Image Clustering
by: Stephan, Andreas, et al.
Published: (2024)
Hyperbolic Image-Text Representations
by: Desai, Karan, et al.
Published: (2023)
Learning Hyperspectral Images with Curated Text Prompts for Efficient Multimodal Alignment
by: Chatterjee, Abhiroop, et al.
Published: (2025)
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
by: Yang, Ling, et al.
Published: (2024)
DiffBlender: Composable and Versatile Multimodal Text-to-Image Diffusion Models
by: Kim, Sungnyun, et al.
Published: (2023)
Hybrid Convolution and Vision Transformer NAS Search Space for TinyML Image Classification
by: Djajapermana, Mikhael, et al.
Published: (2025)
Gems: Group Emotion Profiling Through Multimodal Situational Understanding
by: Kataria, Anubhav, et al.
Published: (2025)
Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs
by: Smart, Brandon, et al.
Published: (2024)
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
by: Chen, Zhaorun, et al.
Published: (2024)
Text-Guided Alternative Image Clustering
by: Stephan, Andreas, et al.
Published: (2024)
Improved Probabilistic Image-Text Representations
by: Chun, Sanghyuk
Published: (2023)
Information Theoretic Text-to-Image Alignment
by: Wang, Chao, et al.
Published: (2024)
Privacy-Preserving in Connected and Autonomous Vehicles Through Vision to Text Transformation
by: Rezaei, Abdolazim, et al.
Published: (2025)