| Main Authors: | Son, Moo Hyun, Oh, Jintaek, Mun, Sun Bin, Roh, Jaechul, Choi, Sehyun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.04201 |
Similar Items
Image Clustering Conditioned on Text Criteria
by: Kwon, Sehyun, et al.
Published: (2023)
Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment
by: Park, Sangha, et al.
Published: (2025)
FameBias: Embedding Manipulation Bias Attack in Text-to-Image Models
by: Roh, Jaechul, et al.
Published: (2024)
Latent Expression Generation for Referring Image Segmentation and Grounding
by: Yu, Seonghoon, et al.
Published: (2025)
WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs
by: Yang, Deshun, et al.
Published: (2024)
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
by: Choi, Jeongsoo, et al.
Published: (2025)
WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark
by: Lin, Wang, et al.
Published: (2026)
Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering
by: Lim, Youngsun, et al.
Published: (2024)
Granular Concept Circuits: Toward a Fine-Grained Circuit Discovery for Concept Representations
by: Kwon, Dahee, et al.
Published: (2025)
Grounding Text-to-Image Diffusion Models for Controlled High-Quality Image Generation
by: Süleyman, Ahmad, et al.
Published: (2025)
Identifiable Token Correspondence for World Models
by: Kim, Youngin, et al.
Published: (2026)
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation
by: Wu, Yinwei, et al.
Published: (2024)
DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images
by: Sun, Haoran, et al.
Published: (2025)
Maestro: Self-Improving Text-to-Image Generation via Agent Orchestration
by: Wan, Xingchen, et al.
Published: (2025)
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
by: Niu, Yuwei, et al.
Published: (2025)
Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
by: Lee, Mingyu, et al.
Published: (2024)
OpenSDI: Spotting Diffusion-Generated Images in the Open World
by: Wang, Yabin, et al.
Published: (2025)
UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing
by: Wang, Dianyi, et al.
Published: (2026)
Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation
by: Kim, Kangyeol, et al.
Published: (2024)
Geometrical Properties of Text Token Embeddings for Strong Semantic Binding in Text-to-Image Generation
by: Seo, Hoigi, et al.
Published: (2025)
HARIVO: Harnessing Text-to-Image Models for Video Generation
by: Kwon, Mingi, et al.
Published: (2024)
Click-Gaussian: Interactive Segmentation to Any 3D Gaussians
by: Choi, Seokhun, et al.
Published: (2024)
GMAT: Grounded Multi-Agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification
by: Quang, Ngoc Bui Lam, et al.
Published: (2025)
VLM's Eye Examination: Instruct and Inspect Visual Competency of Vision Language Models
by: Hyeon-Woo, Nam, et al.
Published: (2024)
Dynamic Orchestration of Multi-Agent System for Real-World Multi-Image Agricultural VQA
by: Ke, Yan, et al.
Published: (2025)
OmniGround: A Comprehensive Spatio-Temporal Grounding Benchmark for Real-World Complex Scenarios
by: Gao, Hong, et al.
Published: (2025)
MambaOutRS: A Hybrid CNN-Fourier Architecture for Remote Sensing Image Classification
by: Cheon, Minjong, et al.
Published: (2025)
Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement
by: Jeong, Suchae, et al.
Published: (2025)
Semantic Guidance Tuning for Text-To-Image Diffusion Models
by: Kang, Hyun, et al.
Published: (2023)
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
by: Gou, Boyu, et al.
Published: (2024)
FAGER: Factually Grounded Evaluation and Refinement of Text-to-Image Models
by: Lim, Youngsun, et al.
Published: (2026)
MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning
by: Yang, Pu, et al.
Published: (2025)
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
by: Zhao, Yu, et al.
Published: (2024)
PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data
by: Yang, ChangHee, et al.
Published: (2025)
Multimodal Contrastive Pretraining of CBCT and IOS for Enhanced Tooth Segmentation
by: Son, Moo Hyun, et al.
Published: (2025)
Region-Grounded Report Generation for 3D Medical Imaging: A Fine-Grained Dataset and Graph-Enhanced Framework
by: Nguyen, Cong Huy, et al.
Published: (2026)
DragText: Rethinking Text Embedding in Point-based Image Editing
by: Choi, Gayoon, et al.
Published: (2024)
Multi-Agent Image Restoration
by: Jiang, Xu, et al.
Published: (2025)
Knowledge-based learning in Text-RAG and Image-RAG
by: Shim, Alexander, et al.
Published: (2026)
WorldGen: From Text to Traversable and Interactive 3D Worlds
by: Wang, Dilin, et al.
Published: (2025)