| Main Authors: | Li, Gen; Liu, Peiyu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.01513 |
Similar Items
SceneRAG: Scene-level Retrieval-Augmented Generation for Video Understanding
by: Zeng, Nianbo, et al.
Published: (2025)
DreamRunner: Fine-Grained Compositional Story-to-Video Generation with Retrieval-Augmented Motion Adaptation
by: Wang, Zun, et al.
Published: (2024)
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
by: Luo, Yongdong, et al.
Published: (2024)
VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos
by: Ren, Xubin, et al.
Published: (2025)
F³Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
by: Liu, Zhaoyu, et al.
Published: (2025)
Mesh RAG: Retrieval Augmentation for Autoregressive Mesh Generation
by: Sun, Xiatao, et al.
Published: (2025)
FastInit: Fast Noise Initialization for Temporally Consistent Video Generation
by: Bai, Chengyu, et al.
Published: (2025)
Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios
by: Yan, Peizheng, et al.
Published: (2026)
Towards Fine-Grained Human Motion Video Captioning
by: Song, Guorui, et al.
Published: (2025)
VideoRAG: Retrieval-Augmented Generation over Video Corpus
by: Jeong, Soyeong, et al.
Published: (2025)
RAG-HAR: Retrieval Augmented Generation-based Human Activity Recognition
by: Sivaroopan, Nirhoshan, et al.
Published: (2025)
FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge
by: Shen, Xuan, et al.
Published: (2025)
MV-RAG: Retrieval Augmented Multiview Diffusion
by: Dayani, Yosef, et al.
Published: (2025)
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding
by: Chen, Houlun, et al.
Published: (2024)
Fast Autoregressive Video Generation with Diagonal Decoding
by: Ye, Yang, et al.
Published: (2025)
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding
by: Wang, Shuai, et al.
Published: (2025)
Towards Fine-Grained Video Question Answering
by: Dai, Wei, et al.
Published: (2025)
FunQA: Towards Surprising Video Comprehension
by: Xie, Binzhu, et al.
Published: (2023)
mKG-RAG: Leveraging Multimodal Knowledge Graphs in Retrieval-Augmented Generation for Knowledge-intensive VQA
by: Yuan, Xu, et al.
Published: (2025)
VLADriver-RAG: Retrieval-Augmented Vision-Language-Action Models for Autonomous Driving
by: Zhao, Rui, et al.
Published: (2026)
Transition Matching Distillation for Fast Video Generation
by: Nie, Weili, et al.
Published: (2026)
mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation
by: Hu, Chan-Wei, et al.
Published: (2025)
Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation
by: Sanguigni, Fulvio, et al.
Published: (2025)
FastTrackTr: Towards Fast Multi-Object Tracking with Transformers
by: Liao, Pan, et al.
Published: (2024)
FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval
by: Li, Zixu, et al.
Published: (2025)
Accurate and Fast Compressed Video Captioning
by: Shen, Yaojie, et al.
Published: (2023)
When RAG Hurts: Diagnosing and Mitigating Attention Distraction in Retrieval-Augmented LVLMs
by: Zhao, Beidi, et al.
Published: (2026)
Finer-Personalization Rank: Fine-Grained Retrieval Examines Identity Preservation for Personalized Generation
by: Kilrain, Connor, et al.
Published: (2025)
NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM
by: Wang, Zihan, et al.
Published: (2025)
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
by: Tanaka, Ryota, et al.
Published: (2025)
MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation
by: Hsiao, Chi-Hsiang, et al.
Published: (2025)
On Equivariance and Fast Sampling in Video Diffusion Models Trained with Warped Noise
by: Liu, Chao, et al.
Published: (2025)
VideoQA in the Era of LLMs: An Empirical Study
by: Xiao, Junbin, et al.
Published: (2024)
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
by: Hong, Yining, et al.
Published: (2024)
Fine-Grained Knowledge Structuring and Retrieval for Visual Question Answering
by: Zhang, Zhengxuan, et al.
Published: (2025)
AugmenTory: A Fast and Flexible Polygon Augmentation Library
by: Ghahremani, Tanaz, et al.
Published: (2024)
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
by: Xu, Haidong, et al.
Published: (2025)
Fast Occupancy Network
by: Lu, Mingjie, et al.
Published: (2024)
PixelSmile: Toward Fine-Grained Facial Expression Editing
by: Hua, Jiabin, et al.
Published: (2026)
YTCommentQA: Video Question Answerability in Instructional Videos
by: Yang, Saelyne, et al.
Published: (2024)