:: Library Catalog

Bibliographic Details
Main Authors:	Son, Moo Hyun; Oh, Jintaek; Mun, Sun Bin; Roh, Jaechul; Choi, Sehyun
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.04201

Similar Items

Image Clustering Conditioned on Text Criteria
by: Kwon, Sehyun, et al.
Published: (2023)

Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment
by: Park, Sangha, et al.
Published: (2025)

FameBias: Embedding Manipulation Bias Attack in Text-to-Image Models
by: Roh, Jaechul, et al.
Published: (2024)

Latent Expression Generation for Referring Image Segmentation and Grounding
by: Yu, Seonghoon, et al.
Published: (2025)

WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs
by: Yang, Deshun, et al.
Published: (2024)

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
by: Choi, Jeongsoo, et al.
Published: (2025)

WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark
by: Lin, Wang, et al.
Published: (2026)

Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering
by: Lim, Youngsun, et al.
Published: (2024)

Granular Concept Circuits: Toward a Fine-Grained Circuit Discovery for Concept Representations
by: Kwon, Dahee, et al.
Published: (2025)

Grounding Text-to-Image Diffusion Models for Controlled High-Quality Image Generation
by: Süleyman, Ahmad, et al.
Published: (2025)

Identifiable Token Correspondence for World Models
by: Kim, Youngin, et al.
Published: (2026)

IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation
by: Wu, Yinwei, et al.
Published: (2024)

DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images
by: Sun, Haoran, et al.
Published: (2025)

Maestro: Self-Improving Text-to-Image Generation via Agent Orchestration
by: Wan, Xingchen, et al.
Published: (2025)

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
by: Niu, Yuwei, et al.
Published: (2025)

Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
by: Lee, Mingyu, et al.
Published: (2024)

OpenSDI: Spotting Diffusion-Generated Images in the Open World
by: Wang, Yabin, et al.
Published: (2025)

UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing
by: Wang, Dianyi, et al.
Published: (2026)

Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation
by: Kim, Kangyeol, et al.
Published: (2024)

Geometrical Properties of Text Token Embeddings for Strong Semantic Binding in Text-to-Image Generation
by: Seo, Hoigi, et al.
Published: (2025)

HARIVO: Harnessing Text-to-Image Models for Video Generation
by: Kwon, Mingi, et al.
Published: (2024)

Click-Gaussian: Interactive Segmentation to Any 3D Gaussians
by: Choi, Seokhun, et al.
Published: (2024)

GMAT: Grounded Multi-Agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification
by: Quang, Ngoc Bui Lam, et al.
Published: (2025)

VLM's Eye Examination: Instruct and Inspect Visual Competency of Vision Language Models
by: Hyeon-Woo, Nam, et al.
Published: (2024)

Dynamic Orchestration of Multi-Agent System for Real-World Multi-Image Agricultural VQA
by: Ke, Yan, et al.
Published: (2025)

OmniGround: A Comprehensive Spatio-Temporal Grounding Benchmark for Real-World Complex Scenarios
by: Gao, Hong, et al.
Published: (2025)

MambaOutRS: A Hybrid CNN-Fourier Architecture for Remote Sensing Image Classification
by: Cheon, Minjong, et al.
Published: (2025)

Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement
by: Jeong, Suchae, et al.
Published: (2025)

Semantic Guidance Tuning for Text-To-Image Diffusion Models
by: Kang, Hyun, et al.
Published: (2023)

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
by: Gou, Boyu, et al.
Published: (2024)

FAGER: Factually Grounded Evaluation and Refinement of Text-to-Image Models
by: Lim, Youngsun, et al.
Published: (2026)

MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning
by: Yang, Pu, et al.
Published: (2025)

Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
by: Zhao, Yu, et al.
Published: (2024)

PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data
by: Yang, ChangHee, et al.
Published: (2025)

Multimodal Contrastive Pretraining of CBCT and IOS for Enhanced Tooth Segmentation
by: Son, Moo Hyun, et al.
Published: (2025)

Region-Grounded Report Generation for 3D Medical Imaging: A Fine-Grained Dataset and Graph-Enhanced Framework
by: Nguyen, Cong Huy, et al.
Published: (2026)

DragText: Rethinking Text Embedding in Point-based Image Editing
by: Choi, Gayoon, et al.
Published: (2024)

Multi-Agent Image Restoration
by: Jiang, Xu, et al.
Published: (2025)

Knowledge-based learning in Text-RAG and Image-RAG
by: Shim, Alexander, et al.
Published: (2026)

WorldGen: From Text to Traversable and Interactive 3D Worlds
by: Wang, Dilin, et al.
Published: (2025)