:: Library Catalog

Bibliographic Details
Main Authors:	Luo, Yucong; Cheng, Mingyue; Ouyang, Jie; Tao, Xiaoyu; Liu, Qi
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition, Artificial Intelligence
Online Access:	https://arxiv.org/abs/2412.18185

Similar Items

Enhance Multimodal Consistency and Coherence for Text-Image Plan Generation
by: Lu, Xiaoxin, et al.
Published: (2025)

Towards Deconfounded Image-Text Matching with Causal Inference
by: Li, Wenhui, et al.
Published: (2024)

Enhancing Text-to-Image Diffusion Transformer via Split-Text Conditioning
by: Zhang, Yu, et al.
Published: (2025)

Enhancing Multimodal Understanding with CLIP-Based Image-to-Text Transformation
by: Che, Chang, et al.
Published: (2024)

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
by: Jiang, Dongzhi, et al.
Published: (2024)

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
by: Yu, Ya-Qi, et al.
Published: (2024)

TSVC: Tripartite Learning with Semantic Variation Consistency for Robust Image-Text Retrieval
by: Lyu, Shuai, et al.
Published: (2025)

Uncovering the Text Embedding in Text-to-Image Diffusion Models
by: Yu, Hu, et al.
Published: (2024)

A Dual-way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking
by: Song, Shezheng, et al.
Published: (2023)

15M Multimodal Facial Image-Text Dataset
by: Dai, Dawei, et al.
Published: (2024)

Clustering-based Image-Text Graph Matching for Domain Generalization
by: Park, Nokyung, et al.
Published: (2023)

MULTI: Multimodal Understanding Leaderboard with Text and Images
by: Zhu, Zichen, et al.
Published: (2024)

TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis
by: Rahman, Kazi Mahathir, et al.
Published: (2025)

Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation
by: Lakhanpal, Sanyam, et al.
Published: (2024)

MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
by: Berman, William, et al.
Published: (2024)

Policy Optimized Text-to-Image Pipeline Design
by: Gadot, Uri, et al.
Published: (2025)

Dynamic Prompt Optimizing for Text-to-Image Generation
by: Mo, Wenyi, et al.
Published: (2024)

Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion
by: Lv, Zheqi, et al.
Published: (2025)

MASTER: Multimodal Segmentation with Text Prompts
by: Liu, Fuyang, et al.
Published: (2025)

ComCLIP: Training-Free Compositional Image and Text Matching
by: Jiang, Kenan, et al.
Published: (2022)

SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation
by: Zhou, Sashuai, et al.
Published: (2026)

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
by: Liu, Tao, et al.
Published: (2025)

ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning
by: Chen, Weifeng, et al.
Published: (2024)

Edge Approximation Text Detector
by: Yang, Chuang, et al.
Published: (2025)

Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs
by: Xian, Jia Jun Cheng, et al.
Published: (2025)

Training-Free Consistent Text-to-Image Generation
by: Tewel, Yoad, et al.
Published: (2024)

Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis
by: Yang, Xinrui, et al.
Published: (2024)

Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation
by: Dai, Dawei, et al.
Published: (2025)

Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability
by: Zhu, Zhiyu, et al.
Published: (2025)

Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models
by: Peng, Xingkai, et al.
Published: (2025)

IE-Bench: Advancing the Measurement of Text-Driven Image Editing for Human Perception Alignment
by: Sun, Shangkun, et al.
Published: (2025)

Text-Guided Layer Fusion Mitigates Hallucination in Multimodal LLMs
by: Lin, Chenchen, et al.
Published: (2026)

FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image Models
by: Luo, Hanjun, et al.
Published: (2024)

ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?
by: Zhang, Leixin, et al.
Published: (2024)

A Comprehensive Survey on Concept Erasure in Text-to-Image Diffusion Models
by: Kim, Changhoon, et al.
Published: (2025)

HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation
by: Wen, Yuxin, et al.
Published: (2025)

Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
by: Zhao, Yu, et al.
Published: (2024)

Text-only Synthesis for Image Captioning
by: Zhou, Qing, et al.
Published: (2024)

Efficient Text-driven Motion Generation via Latent Consistency Training
by: Hu, Mengxian, et al.
Published: (2024)

StorySync: Training-Free Subject Consistency in Text-to-Image Generation via Region Harmonization
by: Gaur, Gopalji, et al.
Published: (2025)