Bibliographic Details
Main Authors:	Rahman, Zillur, Sheng, Alex, Meo, Cristian
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.01509

Similar Items

PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation
by: Wu, Shang, et al.
Published: (2026)

Video-T1: Test-Time Scaling for Video Generation
by: Liu, Fangfu, et al.
Published: (2025)

VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
by: Paul, Dhiman, et al.
Published: (2024)

Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement
by: Jeong, Suchae, et al.
Published: (2025)

Multi-Scale Temporal Difference Transformer for Video-Text Retrieval
by: Wang, Ni, et al.
Published: (2024)

Dynamic Prompt Optimizing for Text-to-Image Generation
by: Mo, Wenyi, et al.
Published: (2024)

Scaling Image and Video Generation via Test-Time Evolutionary Search
by: He, Haoran, et al.
Published: (2025)

MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank Pipeline
by: Han, Donghoon, et al.
Published: (2024)

TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis
by: Rahman, Kazi Mahathir, et al.
Published: (2025)

GenPilot: A Multi-Agent System for Test-Time Prompt Optimization in Image Generation
by: Ye, Wen, et al.
Published: (2025)

TimeRefine: Temporal Grounding with Time Refining Video LLM
by: Wang, Xizi, et al.
Published: (2024)

Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation
by: Kim, Subin, et al.
Published: (2025)

Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement
by: Lee, Daeun, et al.
Published: (2024)

Minority-Focused Text-to-Image Generation via Prompt Optimization
by: Um, Soobin, et al.
Published: (2024)

Long-Text-to-Image Generation via Compositional Prompt Decomposition
by: Huang, Jen-Yuan, et al.
Published: (2026)

PrismVAU: Prompt-Refined Inference System for Multimodal Video Anomaly Understanding
by: Erregue, Iñaki, et al.
Published: (2026)

PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
by: Xue, Qiyao, et al.
Published: (2024)

Scale Up Composed Image Retrieval Learning via Modification Text Generation
by: Zhou, Yinan, et al.
Published: (2025)

Knowledge-Refined Dual Context-Aware Network for Partially Relevant Video Retrieval
by: Yang, Junkai, et al.
Published: (2026)

VidVec: Unlocking Video MLLM Embeddings for Video-Text Retrieval
by: Tzachor, Issar, et al.
Published: (2026)

Progress by Pieces: Test-Time Scaling for Autoregressive Image Generation
by: Park, Joonhyung, et al.
Published: (2025)

GAIS: Frame-Level Gated Audio-Visual Integration with Semantic Variance-Scaled Perturbation for Text-Video Retrieval
by: Yang, Bowen, et al.
Published: (2025)

VC4VG: Optimizing Video Captions for Text-to-Video Generation
by: Du, Yang, et al.
Published: (2025)

Ambiguity-Restrained Text-Video Representation Learning for Partially Relevant Video Retrieval
by: Cho, CH, et al.
Published: (2025)

Highly Efficient Test-Time Scaling for T2I Diffusion Models with Text Embedding Perturbation
by: Xu, Hang, et al.
Published: (2025)

Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation
by: Yang, Xiaomeng, et al.
Published: (2025)

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
by: Menapace, Willi, et al.
Published: (2024)

Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval
by: Zeng, Gangyan, et al.
Published: (2024)

Video-As-Prompt: Unified Semantic Control for Video Generation
by: Bian, Yuxuan, et al.
Published: (2025)

Singular Value Scaling: Efficient Generative Model Compression via Pruned Weights Refinement
by: Kim, Hyeonjin, et al.
Published: (2024)

Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
by: Wang, Ziyang, et al.
Published: (2025)

Progressive Image Restoration via Text-Conditioned Video Generation
by: Kang, Peng, et al.
Published: (2025)

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion
by: Yesiltepe, Hidir, et al.
Published: (2026)

VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction
by: Ji, Longbin, et al.
Published: (2026)

DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation
by: Nithisopa, Naphat, et al.
Published: (2025)

Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis
by: Yang, Xinrui, et al.
Published: (2024)

RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning
by: Wu, Mingrui, et al.
Published: (2025)

TTOM: Test-Time Optimization and Memorization for Compositional Video Generation
by: Qu, Leigang, et al.
Published: (2025)

RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling
by: Gao, Bingjie, et al.
Published: (2025)

Text-to-Image Diffusion Models Cannot Count, and Prompt Refinement Cannot Help
by: Guo, Xuyang, et al.
Published: (2025)