| Main Authors: | Liang, Zhiyang, Wan, Ziyu, Liu, Hongyu, Chen, Dong, Shen, Qiu, Zhu, Hao, Chen, Dongdong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.26866 |
Similar Items
Detail++: Training-Free Detail Enhancer for Text-to-Image Diffusion Models
by: Chen, Lifeng, et al.
Published: (2025)
Instant Preference Alignment for Text-to-Image Diffusion Models
by: Li, Yang, et al.
Published: (2025)
ICAS: Detecting Training Data from Autoregressive Image Generative Models
by: Yu, Hongyao, et al.
Published: (2025)
DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling
by: Li, Boheng, et al.
Published: (2025)
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
by: Qiu, Longtian, et al.
Published: (2024)
ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models
by: Zhou, Qin, et al.
Published: (2025)
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
by: Zhou, Shijie, et al.
Published: (2025)
CIDER: A Causal Cure for Brand-Obsessed Text-to-Image Models
by: Shen, Fangjian, et al.
Published: (2025)
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models
by: Chen, Dong, et al.
Published: (2026)
Spatially Covariant Image Registration with Text Prompts
by: Chen, Xiang, et al.
Published: (2023)
Re-Thinking the Automatic Evaluation of Image-Text Alignment in Text-to-Image Models
by: Zhang, Huixuan, et al.
Published: (2025)
GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation
by: Wu, Ruihai, et al.
Published: (2025)
MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
by: Zhou, Donghao, et al.
Published: (2024)
Uncovering the Text Embedding in Text-to-Image Diffusion Models
by: Yu, Hu, et al.
Published: (2024)
Holistic Evaluation for Interleaved Text-and-Image Generation
by: Liu, Minqian, et al.
Published: (2024)
Image Captions are Natural Prompts for Text-to-Image Models
by: Lei, Shiye, et al.
Published: (2023)
RAISE: Requirement-Adaptive Evolutionary Refinement for Training-Free Text-to-Image Alignment
by: Jiang, Liyao, et al.
Published: (2026)
DetailMaster: Can Your Text-to-Image Model Handle Long Prompts?
by: Jiao, Qirui, et al.
Published: (2025)
Efficient Text-driven Motion Generation via Latent Consistency Training
by: Hu, Mengxian, et al.
Published: (2024)
SeLIP: Similarity Enhanced Contrastive Language Image Pretraining for Multi-modal Head MRI
by: Liu, Zhiyang, et al.
Published: (2025)
Unified Text-Image Generation with Weakness-Targeted Post-Training
by: Chen, Jiahui, et al.
Published: (2026)
LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
by: Qiu, Lingteng, et al.
Published: (2025)
PromptLA: Towards Integrity Verification of Black-box Text-to-Image Diffusion Models
by: Zhang, Zhuomeng, et al.
Published: (2024)
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
by: Zhan, Yufei, et al.
Published: (2023)
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
by: Zhao, Yu, et al.
Published: (2024)
SteerDiff: Steering towards Safe Text-to-Image Diffusion Models
by: Zhang, Hongxiang, et al.
Published: (2024)
HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation
by: Chen, Cong, et al.
Published: (2025)
ICDAR 2025 Competition on End-to-End Document Image Machine Translation Towards Complex Layouts
by: Zhang, Yaping, et al.
Published: (2026)
FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image Models
by: Luo, Hanjun, et al.
Published: (2024)
Agentic Retoucher for Text-To-Image Generation
by: Shen, Shaocheng, et al.
Published: (2026)
STARFlow: Spatial Temporal Feature Re-embedding with Attentive Learning for Real-world Scene Flow
by: Lu, Zhiyang, et al.
Published: (2024)
Training-Free Text-Guided Image Editing with Visual Autoregressive Model
by: Wang, Yufei, et al.
Published: (2025)
MINOS: A Multimodal Evaluation Model for Bidirectional Generation Between Image and Text
by: Zhang, Junzhe, et al.
Published: (2025)
DiffRIS: Enhancing Referring Remote Sensing Image Segmentation with Pre-trained Text-to-Image Diffusion Models
by: Dong, Zhe, et al.
Published: (2025)
A Text-Image Fusion Method with Data Augmentation Capabilities for Referring Medical Image Segmentation
by: Chai, Shurong, et al.
Published: (2025)
Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis
by: Chen, Muxi, et al.
Published: (2024)
Video-to-Text Pedestrian Monitoring (VTPM): Leveraging Computer Vision and Large Language Models for Privacy-Preserve Pedestrian Activity Monitoring at Intersections
by: Abdelrahman, Ahmed S., et al.
Published: (2024)
When Pretty Isn't Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators
by: Adamkiewicz, Krzysztof, et al.
Published: (2026)
Lightweight Multimodal Adaptation of Vision Language Models for Species Recognition and Habitat Context Interpretation in Drone Thermal Imagery
by: Chen, Hao, et al.
Published: (2026)
Regeneration Based Training-free Attribution of Fake Images Generated by Text-to-Image Generative Models
by: Li, Meiling, et al.
Published: (2024)