Library Catalog

Bibliographic Details
Main Authors:	Li, Yili; Yu, Jing; Gai, Keke; Liu, Bang; Xiong, Gang; Wu, Qi
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2408.11432

Similar Items

IIU: Independent Inference Units for Knowledge-based Visual Question Answering
by: Li, Yili, et al.
Published: (2024)

Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval
by: Tang, Yuanmin, et al.
Published: (2024)

Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
by: Tang, Yuanmin, et al.
Published: (2025)

T2VParser: Adaptive Decomposition Tokens for Partial Alignment in Text to Video Retrieval
by: Li, Yili, et al.
Published: (2025)

ALIEN: Analytic Latent Watermarking for Controllable Generation
by: Lei, Liangqi, et al.
Published: (2026)

Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning
by: Qu, Xiangyan, et al.
Published: (2024)

Adversarial Video Promotion Against Text-to-Video Retrieval
by: Tian, Qiwei, et al.
Published: (2025)

RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
by: Cao, Meng, et al.
Published: (2024)

TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
by: Shen, Leqi, et al.
Published: (2024)

Denoise-then-Retrieve: Text-Conditioned Video Denoising for Video Moment Retrieval
by: Liu, Weijia, et al.
Published: (2025)

EA-VTR: Event-Aware Video-Text Retrieval
by: Ma, Zongyang, et al.
Published: (2024)

Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval
by: Lan, Bangxiang, et al.
Published: (2025)

Co-speech Gesture Video Generation via Motion-Based Graph Retrieval
by: Song, Yafei, et al.
Published: (2025)

The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
by: Gao, Bingjie, et al.
Published: (2025)

EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
by: Tian, Linrui, et al.
Published: (2024)

Text-Animator: Controllable Visual Text Video Generation
by: Liu, Lin, et al.
Published: (2024)

CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
by: Wang, Zhao, et al.
Published: (2024)

SAVE: Speech-Aware Video Representation Learning for Video-Text Retrieval
by: Zhao, Ruixiang, et al.
Published: (2026)

A Survey of Interactive Generative Video
by: Yu, Jiwen, et al.
Published: (2025)

UniVideo: Unified Understanding, Generation, and Editing for Videos
by: Wei, Cong, et al.
Published: (2025)

Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
by: Tian, Kaibin, et al.
Published: (2024)

Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text Benchmark
by: Yang, Shuyu, et al.
Published: (2025)

EMO2: End-Effector Guided Audio-Driven Avatar Video Generation
by: Tian, Linrui, et al.
Published: (2025)

TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm
by: Zhang, Bingqing, et al.
Published: (2024)

Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT
by: Liu, Dongyang, et al.
Published: (2025)

TVPR: Text-to-Video Person Retrieval and a New Benchmark
by: Zhang, Xu, et al.
Published: (2023)

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
by: Wang, Jiamian, et al.
Published: (2024)

OmniRetriever: Any-to-Any Audio-Video-Text Retrieval via Fusion-as-Teacher Distillation
by: Liu, Yunze, et al.
Published: (2026)

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model
by: Huang, Haoyang, et al.
Published: (2025)

Prompt-aware of Frame Sampling for Efficient Text-Video Retrieval
by: Zhang, Deyu, et al.
Published: (2025)

Baton: Explicit Semantic Blueprints for Joint Video-Audio Generation
by: Tu, Shuyuan, et al.
Published: (2026)

VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion
by: Yang, Lehan, et al.
Published: (2025)

LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation
by: Wang, Jiarui, et al.
Published: (2025)

T-SVG: Text-Driven Stereoscopic Video Generation
by: Jin, Qiao, et al.
Published: (2024)

Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection
by: Yang, Zhiwei, et al.
Published: (2024)

Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval
by: Liu, Haowei, et al.
Published: (2024)

FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation
by: Jing, Liqiang, et al.
Published: (2025)

Repeating Words for Video-Language Retrieval with Coarse-to-Fine Objectives
by: Zhao, Haoyu, et al.
Published: (2025)

MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval
by: Jin, Xiaojie, et al.
Published: (2023)

VidText: Towards Comprehensive Evaluation for Video Text Understanding
by: Yang, Zhoufaran, et al.
Published: (2025)