| Main Authors: | Rahman, Zillur, Sheng, Alex, Meo, Cristian |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.01509 |
Similar Items
PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation
by: Wu, Shang, et al.
Published: (2026)
Video-T1: Test-Time Scaling for Video Generation
by: Liu, Fangfu, et al.
Published: (2025)
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
by: Paul, Dhiman, et al.
Published: (2024)
Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement
by: Jeong, Suchae, et al.
Published: (2025)
Multi-Scale Temporal Difference Transformer for Video-Text Retrieval
by: Wang, Ni, et al.
Published: (2024)
Dynamic Prompt Optimizing for Text-to-Image Generation
by: Mo, Wenyi, et al.
Published: (2024)
Scaling Image and Video Generation via Test-Time Evolutionary Search
by: He, Haoran, et al.
Published: (2025)
MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank Pipeline
by: Han, Donghoon, et al.
Published: (2024)
TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis
by: Rahman, Kazi Mahathir, et al.
Published: (2025)
GenPilot: A Multi-Agent System for Test-Time Prompt Optimization in Image Generation
by: Ye, Wen, et al.
Published: (2025)
TimeRefine: Temporal Grounding with Time Refining Video LLM
by: Wang, Xizi, et al.
Published: (2024)
Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation
by: Kim, Subin, et al.
Published: (2025)
Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement
by: Lee, Daeun, et al.
Published: (2024)
Minority-Focused Text-to-Image Generation via Prompt Optimization
by: Um, Soobin, et al.
Published: (2024)
Long-Text-to-Image Generation via Compositional Prompt Decomposition
by: Huang, Jen-Yuan, et al.
Published: (2026)
PrismVAU: Prompt-Refined Inference System for Multimodal Video Anomaly Understanding
by: Erregue, Iñaki, et al.
Published: (2026)
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
by: Xue, Qiyao, et al.
Published: (2024)
Scale Up Composed Image Retrieval Learning via Modification Text Generation
by: Zhou, Yinan, et al.
Published: (2025)
Knowledge-Refined Dual Context-Aware Network for Partially Relevant Video Retrieval
by: Yang, Junkai, et al.
Published: (2026)
VidVec: Unlocking Video MLLM Embeddings for Video-Text Retrieval
by: Tzachor, Issar, et al.
Published: (2026)
Progress by Pieces: Test-Time Scaling for Autoregressive Image Generation
by: Park, Joonhyung, et al.
Published: (2025)
GAIS: Frame-Level Gated Audio-Visual Integration with Semantic Variance-Scaled Perturbation for Text-Video Retrieval
by: Yang, Bowen, et al.
Published: (2025)
VC4VG: Optimizing Video Captions for Text-to-Video Generation
by: Du, Yang, et al.
Published: (2025)
Ambiguity-Restrained Text-Video Representation Learning for Partially Relevant Video Retrieval
by: Cho, CH, et al.
Published: (2025)
Highly Efficient Test-Time Scaling for T2I Diffusion Models with Text Embedding Perturbation
by: Xu, Hang, et al.
Published: (2025)
Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation
by: Yang, Xiaomeng, et al.
Published: (2025)
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
by: Menapace, Willi, et al.
Published: (2024)
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval
by: Zeng, Gangyan, et al.
Published: (2024)
Video-As-Prompt: Unified Semantic Control for Video Generation
by: Bian, Yuxuan, et al.
Published: (2025)
Singular Value Scaling: Efficient Generative Model Compression via Pruned Weights Refinement
by: Kim, Hyeonjin, et al.
Published: (2024)
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
by: Wang, Ziyang, et al.
Published: (2025)
Progressive Image Restoration via Text-Conditioned Video Generation
by: Kang, Peng, et al.
Published: (2025)
VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion
by: Yesiltepe, Hidir, et al.
Published: (2026)
VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction
by: Ji, Longbin, et al.
Published: (2026)
DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation
by: Nithisopa, Naphat, et al.
Published: (2025)
Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis
by: Yang, Xinrui, et al.
Published: (2024)
RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning
by: Wu, Mingrui, et al.
Published: (2025)
TTOM: Test-Time Optimization and Memorization for Compositional Video Generation
by: Qu, Leigang, et al.
Published: (2025)
RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling
by: Gao, Bingjie, et al.
Published: (2025)
Text-to-Image Diffusion Models Cannot Count, and Prompt Refinement Cannot Help
by: Guo, Xuyang, et al.
Published: (2025)