:: Library Catalog

Bibliographic Details
Main Authors:	Liang, Zhiyang; Wan, Ziyu; Liu, Hongyu; Chen, Dong; Shen, Qiu; Zhu, Hao; Chen, Dongdong
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.26866

Similar Items

Detail++: Training-Free Detail Enhancer for Text-to-Image Diffusion Models
by: Chen, Lifeng, et al.
Published: (2025)

Instant Preference Alignment for Text-to-Image Diffusion Models
by: Li, Yang, et al.
Published: (2025)

ICAS: Detecting Training Data from Autoregressive Image Generative Models
by: Yu, Hongyao, et al.
Published: (2025)

DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling
by: Li, Boheng, et al.
Published: (2025)

Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
by: Qiu, Longtian, et al.
Published: (2024)

ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models
by: Zhou, Qin, et al.
Published: (2025)

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
by: Zhou, Shijie, et al.
Published: (2025)

CIDER: A Causal Cure for Brand-Obsessed Text-to-Image Models
by: Shen, Fangjian, et al.
Published: (2025)

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models
by: Chen, Dong, et al.
Published: (2026)

Spatially Covariant Image Registration with Text Prompts
by: Chen, Xiang, et al.
Published: (2023)

Re-Thinking the Automatic Evaluation of Image-Text Alignment in Text-to-Image Models
by: Zhang, Huixuan, et al.
Published: (2025)

GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation
by: Wu, Ruihai, et al.
Published: (2025)

MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
by: Zhou, Donghao, et al.
Published: (2024)

Uncovering the Text Embedding in Text-to-Image Diffusion Models
by: Yu, Hu, et al.
Published: (2024)

Holistic Evaluation for Interleaved Text-and-Image Generation
by: Liu, Minqian, et al.
Published: (2024)

Image Captions are Natural Prompts for Text-to-Image Models
by: Lei, Shiye, et al.
Published: (2023)

RAISE: Requirement-Adaptive Evolutionary Refinement for Training-Free Text-to-Image Alignment
by: Jiang, Liyao, et al.
Published: (2026)

DetailMaster: Can Your Text-to-Image Model Handle Long Prompts?
by: Jiao, Qirui, et al.
Published: (2025)

Efficient Text-driven Motion Generation via Latent Consistency Training
by: Hu, Mengxian, et al.
Published: (2024)

SeLIP: Similarity Enhanced Contrastive Language Image Pretraining for Multi-modal Head MRI
by: Liu, Zhiyang, et al.
Published: (2025)

Unified Text-Image Generation with Weakness-Targeted Post-Training
by: Chen, Jiahui, et al.
Published: (2026)

LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
by: Qiu, Lingteng, et al.
Published: (2025)

PromptLA: Towards Integrity Verification of Black-box Text-to-Image Diffusion Models
by: Zhang, Zhuomeng, et al.
Published: (2024)

Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
by: Zhan, Yufei, et al.
Published: (2023)

Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
by: Zhao, Yu, et al.
Published: (2024)

SteerDiff: Steering towards Safe Text-to-Image Diffusion Models
by: Zhang, Hongxiang, et al.
Published: (2024)

HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation
by: Chen, Cong, et al.
Published: (2025)

ICDAR 2025 Competition on End-to-End Document Image Machine Translation Towards Complex Layouts
by: Zhang, Yaping, et al.
Published: (2026)

FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image Models
by: Luo, Hanjun, et al.
Published: (2024)

Agentic Retoucher for Text-To-Image Generation
by: Shen, Shaocheng, et al.
Published: (2026)

STARFlow: Spatial Temporal Feature Re-embedding with Attentive Learning for Real-world Scene Flow
by: Lu, Zhiyang, et al.
Published: (2024)

Training-Free Text-Guided Image Editing with Visual Autoregressive Model
by: Wang, Yufei, et al.
Published: (2025)

MINOS: A Multimodal Evaluation Model for Bidirectional Generation Between Image and Text
by: Zhang, Junzhe, et al.
Published: (2025)

DiffRIS: Enhancing Referring Remote Sensing Image Segmentation with Pre-trained Text-to-Image Diffusion Models
by: Dong, Zhe, et al.
Published: (2025)

A Text-Image Fusion Method with Data Augmentation Capabilities for Referring Medical Image Segmentation
by: Chai, Shurong, et al.
Published: (2025)

Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis
by: Chen, Muxi, et al.
Published: (2024)

Video-to-Text Pedestrian Monitoring (VTPM): Leveraging Computer Vision and Large Language Models for Privacy-Preserve Pedestrian Activity Monitoring at Intersections
by: Abdelrahman, Ahmed S., et al.
Published: (2024)

When Pretty Isn't Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators
by: Adamkiewicz, Krzysztof, et al.
Published: (2026)

Lightweight Multimodal Adaptation of Vision Language Models for Species Recognition and Habitat Context Interpretation in Drone Thermal Imagery
by: Chen, Hao, et al.
Published: (2026)

Regeneration Based Training-free Attribution of Fake Images Generated by Text-to-Image Generative Models
by: Li, Meiling, et al.
Published: (2024)