| Main Authors: | Hahn, Meera, Zeng, Wenjun, Kannen, Nithish, Galt, Rich, Badola, Kartikeya, Kim, Been, Wang, Zi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.06771 |
Similar Items
Beyond Aesthetics: Cultural Competence in Text-to-Image Models
by: Kannen, Nithish, et al.
Published: (2024)
Interpreting and Controlling Model Behavior via Constitutions for Atomic Concept Edits
by: Kalibhat, Neha, et al.
Published: (2026)
Talk2Image: A Multi-Agent System for Multi-Turn Image Generation and Editing
by: Ma, Shichao, et al.
Published: (2025)
Alchemist: Turning Public Text-to-Image Data into Generative Gold
by: Startsev, Valerii, et al.
Published: (2025)
Learning Complex Non-Rigid Image Edits from Multimodal Conditioning
by: Warner, Nikolai, et al.
Published: (2024)
Instilling Multi-round Thinking to Text-guided Image Generation
by: Zeng, Lidong, et al.
Published: (2024)
MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning
by: Wang, Yueqian, et al.
Published: (2025)
UCMNet: Uncertainty-Aware Context Memory Network for Under-Display Camera Image Restoration
by: Kim, Daehyun, et al.
Published: (2026)
Generation Navigator: A State-Aware Agentic Framework for Image Generation
by: Liu, Jinming, et al.
Published: (2026)
Image Generators are Generalist Vision Learners
by: Gabeur, Valentin, et al.
Published: (2026)
ConceptGuard: Proactive Safety in Text-and-Image-to-Video Generation through Multimodal Risk Detection
by: Ma, Ruize, et al.
Published: (2025)
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
by: Li, Mingcheng, et al.
Published: (2025)
TextBoost: Boosting Text Encoder for Personalized Text-to-Image Generation
by: Park, NaHyeon, et al.
Published: (2024)
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
by: Zhao, Shihao, et al.
Published: (2024)
Flow of Truth: Proactive Temporal Forensics for Image-to-Video Generation
by: Chen, Yuzhuo, et al.
Published: (2026)
Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization
by: Chen, Yiyang, et al.
Published: (2022)
Escaping Plato's Cave: JAM for Aligning Independently Trained Vision and Language Models
by: Yoon, Lauren Hyoseo, et al.
Published: (2025)
M3: High-fidelity Text-to-Image Generation via Multi-Modal, Multi-Agent and Multi-Round Visual Reasoning
by: Yang, Bangji, et al.
Published: (2026)
MEVG: Multi-event Video Generation with Text-to-Video Models
by: Oh, Gyeongrok, et al.
Published: (2023)
Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval
by: Ma, Zehong, et al.
Published: (2025)
Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval
by: Li, Wenjun, et al.
Published: (2024)
NSFW-Classifier Guided Prompt Sanitization for Safe Text-to-Image Generation
by: Xie, Yu, et al.
Published: (2025)
OSPO: Object-Centric Self-Improving Preference Optimization for Text-to-Image Generation
by: Oh, Yoonjin, et al.
Published: (2025)
Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards
by: Lyu, Qiang, et al.
Published: (2025)
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
by: Franchi, Gianni, et al.
Published: (2024)
Generative Recall, Dense Reranking: Learning Multi-View Semantic IDs for Efficient Text-to-Video Retrieval
by: Zhao, Zecheng, et al.
Published: (2026)
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
by: Huang, Kaiyi, et al.
Published: (2024)
Conditional Text-to-Image Generation with Reference Guidance
by: Kim, Taewook, et al.
Published: (2024)
RefineVAD: Semantic-Guided Feature Recalibration for Weakly Supervised Video Anomaly Detection
by: Lee, Junhee, et al.
Published: (2025)
Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation
by: Wang, Yunnan, et al.
Published: (2024)
FlipConcept: Tuning-Free Multi-Concept Personalization for Text-to-Image Generation
by: Woo, Young Beom, et al.
Published: (2025)
DeepAgent: A Dual Stream Multi Agent Fusion for Robust Multimodal Deepfake Detection
by: Zaman, Sayeem Been, et al.
Published: (2025)
Reusing Computation in Text-to-Image Diffusion for Efficient Generation of Image Sets
by: Decatur, Dale, et al.
Published: (2025)
Learning Multi-dimensional Human Preference for Text-to-Image Generation
by: Zhang, Sixian, et al.
Published: (2024)
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
by: Zhou, Dewei, et al.
Published: (2024)
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
by: Zeng, Guanning, et al.
Published: (2025)
MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation
by: Oshima, Yuta, et al.
Published: (2025)
Compress to Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents
by: Song, Yurun, et al.
Published: (2026)
GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning
by: Jiang, Kaixun, et al.
Published: (2026)
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
by: Lee, Seung Hyun, et al.
Published: (2024)