| Main Authors: | Xu, Yijia, Wang, Zihao, Cui, Jinshi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.03448 |
Similar Items
Direction-Aware Diagonal Autoregressive Image Generation
by: Xu, Yijia, et al.
Published: (2025)
Image-Feature Weak-to-Strong Consistency: An Enhanced Paradigm for Semi-Supervised Learning
by: Wu, Zhiyu, et al.
Published: (2024)
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance
by: Deng, Yufan, et al.
Published: (2025)
MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement
by: Deng, Yufan, et al.
Published: (2025)
DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation
by: Hu, Zhenyu, et al.
Published: (2026)
IdGlow: Dynamic Identity Modulation for Multi-Subject Generation
by: Cai, Honghao, et al.
Published: (2026)
HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
by: Narasimhaswamy, Supreeth, et al.
Published: (2024)
Cached Multi-Lora Composition for Multi-Concept Image Generation
by: Zou, Xiandong, et al.
Published: (2025)
FlipConcept: Tuning-Free Multi-Concept Personalization for Text-to-Image Generation
by: Woo, Young Beom, et al.
Published: (2025)
GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs
by: Basioti, Kalliopi, et al.
Published: (2025)
Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation
by: Kim, Jihyo, et al.
Published: (2024)
LocRef-Diffusion:Tuning-Free Layout and Appearance-Guided Generation
by: Deng, Fan, et al.
Published: (2024)
VAGS: Velocity Adaptive Guidance Scale for Image Editing and Generation
by: Luo, Yan, et al.
Published: (2026)
FocusDPO: Dynamic Preference Optimization for Multi-Subject Personalized Image Generation via Adaptive Focus
by: Jin, Qiaoqiao, et al.
Published: (2025)
HMGIE: Hierarchical and Multi-Grained Inconsistency Evaluation for Vision-Language Data Cleansing
by: Zhu, Zihao, et al.
Published: (2024)
Single Image Iterative Subject-driven Generation and Editing
by: Shpitzer, Yair, et al.
Published: (2025)
Kaleido: Open-Sourced Multi-Subject Reference Video Generation Model
by: Zhang, Zhenxing, et al.
Published: (2025)
Hierarchical, Interpretable, Label-Free Concept Bottleneck Model
by: Xie, Haodong, et al.
Published: (2026)
Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models
by: Jang, Sangwon, et al.
Published: (2024)
Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation
by: Xie, Dian, et al.
Published: (2026)
Generative Semantic Coding for Ultra-Low Bitrate Visual Communication and Analysis
by: Chen, Weiming, et al.
Published: (2025)
3D-Agent:Tri-Modal Multi-Agent Collaboration for Scalable 3D Object Annotation
by: Zhang, Jusheng, et al.
Published: (2026)
Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
by: Wang, Yuran, et al.
Published: (2025)
BrainDreamer: Reasoning-Coherent and Controllable Image Generation from EEG Brain Signals via Language Guidance
by: Wang, Ling, et al.
Published: (2024)
DescriptorMedSAM: Language-Image Fusion with Multi-Aspect Text Guidance for Medical Image Segmentation
by: Zhang, Wenjie, et al.
Published: (2025)
GeoGuide: Hierarchical Geometric Guidance for Open-Vocabulary 3D Semantic Segmentation
by: Tao, Xujing, et al.
Published: (2026)
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
by: Dahary, Omer, et al.
Published: (2024)
Conceptrol: Concept Control of Zero-shot Personalized Image Generation
by: He, Qiyuan, et al.
Published: (2025)
High-fidelity Person-centric Subject-to-Image Synthesis
by: Wang, Yibin, et al.
Published: (2023)
Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control
by: Azam, Basim, et al.
Published: (2025)
Flux Already Knows -- Activating Subject-Driven Image Generation without Training
by: Kang, Hao, et al.
Published: (2025)
DIRECT: Video Mashup Creation via Hierarchical Multi-Agent Planning and Intent-Guided Editing
by: Li, Ke, et al.
Published: (2026)
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
by: Yeh, Chun-Hsiao, et al.
Published: (2024)
Cross-modality Guidance-aided Multi-modal Learning with Dual Attention for MRI Brain Tumor Grading
by: Xu, Dunyuan, et al.
Published: (2024)
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
by: Chen, Zisheng, et al.
Published: (2025)
RSGen: Enhancing Layout-Driven Remote Sensing Image Generation with Diverse Edge Guidance
by: Hou, Xianbao, et al.
Published: (2026)
Initialization is Half the Battle: Generating Diverse Images from a Guidance Potential Posterior
by: Li, Xiang, et al.
Published: (2026)
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
by: Wang, Kuan-Chieh, et al.
Published: (2024)
BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation
by: Zhu, Zihao, et al.
Published: (2026)
Holistic Evaluation for Interleaved Text-and-Image Generation
by: Liu, Minqian, et al.
Published: (2024)