| Main Authors: | Li, Yili, Yu, Jing, Gai, Keke, Liu, Bang, Xiong, Gang, Wu, Qi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.11432 |
Similar Items
IIU: Independent Inference Units for Knowledge-based Visual Question Answering
by: Li, Yili, et al.
Published: (2024)
Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval
by: Tang, Yuanmin, et al.
Published: (2024)
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
by: Tang, Yuanmin, et al.
Published: (2025)
T2VParser: Adaptive Decomposition Tokens for Partial Alignment in Text to Video Retrieval
by: Li, Yili, et al.
Published: (2025)
ALIEN: Analytic Latent Watermarking for Controllable Generation
by: Lei, Liangqi, et al.
Published: (2026)
Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning
by: Qu, Xiangyan, et al.
Published: (2024)
Adversarial Video Promotion Against Text-to-Video Retrieval
by: Tian, Qiwei, et al.
Published: (2025)
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
by: Cao, Meng, et al.
Published: (2024)
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
by: Shen, Leqi, et al.
Published: (2024)
Denoise-then-Retrieve: Text-Conditioned Video Denoising for Video Moment Retrieval
by: Liu, Weijia, et al.
Published: (2025)
EA-VTR: Event-Aware Video-Text Retrieval
by: Ma, Zongyang, et al.
Published: (2024)
Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval
by: Lan, Bangxiang, et al.
Published: (2025)
Co-speech Gesture Video Generation via Motion-Based Graph Retrieval
by: Song, Yafei, et al.
Published: (2025)
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
by: Gao, Bingjie, et al.
Published: (2025)
EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
by: Tian, Linrui, et al.
Published: (2024)
Text-Animator: Controllable Visual Text Video Generation
by: Liu, Lin, et al.
Published: (2024)
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
by: Wang, Zhao, et al.
Published: (2024)
SAVE: Speech-Aware Video Representation Learning for Video-Text Retrieval
by: Zhao, Ruixiang, et al.
Published: (2026)
A Survey of Interactive Generative Video
by: Yu, Jiwen, et al.
Published: (2025)
UniVideo: Unified Understanding, Generation, and Editing for Videos
by: Wei, Cong, et al.
Published: (2025)
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
by: Tian, Kaibin, et al.
Published: (2024)
Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text Benchmark
by: Yang, Shuyu, et al.
Published: (2025)
EMO2: End-Effector Guided Audio-Driven Avatar Video Generation
by: Tian, Linrui, et al.
Published: (2025)
TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm
by: Zhang, Bingqing, et al.
Published: (2024)
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT
by: Liu, Dongyang, et al.
Published: (2025)
TVPR: Text-to-Video Person Retrieval and a New Benchmark
by: Zhang, Xu, et al.
Published: (2023)
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
by: Wang, Jiamian, et al.
Published: (2024)
OmniRetriever: Any-to-Any Audio-Video-Text Retrieval via Fusion-as-Teacher Distillation
by: Liu, Yunze, et al.
Published: (2026)
Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model
by: Huang, Haoyang, et al.
Published: (2025)
Prompt-aware of Frame Sampling for Efficient Text-Video Retrieval
by: Zhang, Deyu, et al.
Published: (2025)
Baton: Explicit Semantic Blueprints for Joint Video-Audio Generation
by: Tu, Shuyuan, et al.
Published: (2026)
VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion
by: Yang, Lehan, et al.
Published: (2025)
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation
by: Wang, Jiarui, et al.
Published: (2025)
T-SVG: Text-Driven Stereoscopic Video Generation
by: Jin, Qiao, et al.
Published: (2024)
Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection
by: Yang, Zhiwei, et al.
Published: (2024)
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval
by: Liu, Haowei, et al.
Published: (2024)
FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation
by: Jing, Liqiang, et al.
Published: (2025)
Repeating Words for Video-Language Retrieval with Coarse-to-Fine Objectives
by: Zhao, Haoyu, et al.
Published: (2025)
MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval
by: Jin, Xiaojie, et al.
Published: (2023)
VidText: Towards Comprehensive Evaluation for Video Text Understanding
by: Yang, Zhoufaran, et al.
Published: (2025)