:: Library Catalog

Bibliographic Details
Main Authors:	Hahn, Meera; Zeng, Wenjun; Kannen, Nithish; Galt, Rich; Badola, Kartikeya; Kim, Been; Wang, Zi
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2412.06771

Similar Items

Beyond Aesthetics: Cultural Competence in Text-to-Image Models
by: Kannen, Nithish, et al.
Published: (2024)

Interpreting and Controlling Model Behavior via Constitutions for Atomic Concept Edits
by: Kalibhat, Neha, et al.
Published: (2026)

Talk2Image: A Multi-Agent System for Multi-Turn Image Generation and Editing
by: Ma, Shichao, et al.
Published: (2025)

Alchemist: Turning Public Text-to-Image Data into Generative Gold
by: Startsev, Valerii, et al.
Published: (2025)

Learning Complex Non-Rigid Image Edits from Multimodal Conditioning
by: Warner, Nikolai, et al.
Published: (2024)

Instilling Multi-round Thinking to Text-guided Image Generation
by: Zeng, Lidong, et al.
Published: (2024)

MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning
by: Wang, Yueqian, et al.
Published: (2025)

UCMNet: Uncertainty-Aware Context Memory Network for Under-Display Camera Image Restoration
by: Kim, Daehyun, et al.
Published: (2026)

Generation Navigator: A State-Aware Agentic Framework for Image Generation
by: Liu, Jinming, et al.
Published: (2026)

Image Generators are Generalist Vision Learners
by: Gabeur, Valentin, et al.
Published: (2026)

ConceptGuard: Proactive Safety in Text-and-Image-to-Video Generation through Multimodal Risk Detection
by: Ma, Ruize, et al.
Published: (2025)

MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
by: Li, Mingcheng, et al.
Published: (2025)

TextBoost: Boosting Text Encoder for Personalized Text-to-Image Generation
by: Park, NaHyeon, et al.
Published: (2024)

Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
by: Zhao, Shihao, et al.
Published: (2024)

Flow of Truth: Proactive Temporal Forensics for Image-to-Video Generation
by: Chen, Yuzhuo, et al.
Published: (2026)

Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization
by: Chen, Yiyang, et al.
Published: (2022)

Escaping Plato's Cave: JAM for Aligning Independently Trained Vision and Language Models
by: Yoon, Lauren Hyoseo, et al.
Published: (2025)

M3: High-fidelity Text-to-Image Generation via Multi-Modal, Multi-Agent and Multi-Round Visual Reasoning
by: Yang, Bangji, et al.
Published: (2026)

MEVG: Multi-event Video Generation with Text-to-Video Models
by: Oh, Gyeongrok, et al.
Published: (2023)

Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval
by: Ma, Zehong, et al.
Published: (2025)

Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval
by: Li, Wenjun, et al.
Published: (2024)

NSFW-Classifier Guided Prompt Sanitization for Safe Text-to-Image Generation
by: Xie, Yu, et al.
Published: (2025)

OSPO: Object-Centric Self-Improving Preference Optimization for Text-to-Image Generation
by: Oh, Yoonjin, et al.
Published: (2025)

Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards
by: Lyu, Qiang, et al.
Published: (2025)

Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
by: Franchi, Gianni, et al.
Published: (2024)

Generative Recall, Dense Reranking: Learning Multi-View Semantic IDs for Efficient Text-to-Video Retrieval
by: Zhao, Zecheng, et al.
Published: (2026)

GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
by: Huang, Kaiyi, et al.
Published: (2024)

Conditional Text-to-Image Generation with Reference Guidance
by: Kim, Taewook, et al.
Published: (2024)

RefineVAD: Semantic-Guided Feature Recalibration for Weakly Supervised Video Anomaly Detection
by: Lee, Junhee, et al.
Published: (2025)

Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation
by: Wang, Yunnan, et al.
Published: (2024)

FlipConcept: Tuning-Free Multi-Concept Personalization for Text-to-Image Generation
by: Woo, Young Beom, et al.
Published: (2025)

DeepAgent: A Dual Stream Multi Agent Fusion for Robust Multimodal Deepfake Detection
by: Zaman, Sayeem Been, et al.
Published: (2025)

Reusing Computation in Text-to-Image Diffusion for Efficient Generation of Image Sets
by: Decatur, Dale, et al.
Published: (2025)

Learning Multi-dimensional Human Preference for Text-to-Image Generation
by: Zhang, Sixian, et al.
Published: (2024)

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
by: Zhou, Dewei, et al.
Published: (2024)

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
by: Zeng, Guanning, et al.
Published: (2025)

MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation
by: Oshima, Yuta, et al.
Published: (2025)

Compress to Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents
by: Song, Yurun, et al.
Published: (2026)

GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning
by: Jiang, Kaixun, et al.
Published: (2026)

Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
by: Lee, Seung Hyun, et al.
Published: (2024)