:: Library Catalog

Bibliographic Details
Main Authors:	Samel, Karan; Sontakke, Nitish; Essa, Irfan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.17352

Similar Items

Exploring Efficient Foundational Multi-modal Models for Video Summarization
by: Samel, Karan, et al.
Published: (2024)

On the Efficacy of Text-Based Input Modalities for Action Anticipation
by: Beedu, Apoorva, et al.
Published: (2024)

HierSum: A Global and Local Attention Mechanism for Video Summarization
by: Beedu, Apoorva, et al.
Published: (2025)

SLAIM: Robust Dense Neural SLAM for Online Tracking and Mapping
by: Cartillier, Vincent, et al.
Published: (2024)

3D Semantic MapNet: Building Maps for Multi-Object Re-Identification in 3D
by: Cartillier, Vincent, et al.
Published: (2024)

Efficient Pre-training for Localized Instruction Generation of Videos
by: Batra, Anil, et al.
Published: (2023)

Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
by: Nagasinghe, Kumaranage Ravindu Yasas, et al.
Published: (2024)

Open-Event Procedure Planning in Instructional Videos
by: Wu, Yilu, et al.
Published: (2024)

MoCHA: Denoising Caption Supervision for Motion-Text Retrieval
by: Warner, Nikolai, et al.
Published: (2026)

Learning Procedural-aware Video Representations through State-Grounded Hierarchy Unfolding
by: Zhao, Jinghan, et al.
Published: (2025)

ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos
by: Seminara, Luigi, et al.
Published: (2026)

UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models
by: Chen, Lan, et al.
Published: (2025)

CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers
by: Marmon, Andrew, et al.
Published: (2024)

Predicting Implicit Arguments in Procedural Video Instructions
by: Batra, Anil, et al.
Published: (2025)

RECIPE: Procedural Planning via Grounding in Instructional Video
by: Seminara, Luigi, et al.
Published: (2026)

PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
by: Wang, Hanlin, et al.
Published: (2023)

Mamba Fusion: Learning Actions Through Questioning
by: Dong, Zhikang, et al.
Published: (2024)

SG-MIM: Structured Knowledge Guided Efficient Pre-training for Dense Prediction
by: Son, Sumin, et al.
Published: (2024)

Masked Temporal Interpolation Diffusion for Procedure Planning in Instructional Videos
by: Zhou, Yufan, et al.
Published: (2025)

VELVET-Med: Vision and Efficient Language Pre-training for Volumetric Imaging Tasks in Medicine
by: Zhang, Ziyang, et al.
Published: (2025)

Towards Data-Efficient Video Pre-training with Frozen Image Foundation Models
by: Orlova, Svetlana, et al.
Published: (2026)

Understanding Multimodal Procedural Knowledge by Sequencing Multimodal Instructional Manuals
by: Wu, Te-Lin, et al.
Published: (2021)

Contrastive Language Video Time Pre-training
by: Liu, Hengyue, et al.
Published: (2024)

EndoMamba: An Efficient Foundation Model for Endoscopic Videos via Hierarchical Pre-training
by: Tian, Qingyao, et al.
Published: (2025)

Learning Complex Non-Rigid Image Edits from Multimodal Conditioning
by: Warner, Nikolai, et al.
Published: (2024)

Leveraging Pre-trained CNNs for Efficient Feature Extraction in Rice Leaf Disease Classification
by: Sobuj, Md. Shohanur Islam, et al.
Published: (2024)

AugLift: Depth-Aware Input Reparameterization Improves Domain Generalization in 2D-to-3D Pose Lifting
by: Warner, Nikolai, et al.
Published: (2025)

LAP: A Language-Aware Planning Model For Procedure Planning In Instructional Videos
by: Shi, Lei, et al.
Published: (2026)

ActionDiffusion: An Action-aware Diffusion Model for Procedure Planning in Instructional Videos
by: Shi, Lei, et al.
Published: (2024)

Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
by: Fernando, Basura, et al.
Published: (2025)

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
by: Dong, Yuhao, et al.
Published: (2026)

Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
by: Tang, Longxiang, et al.
Published: (2024)

Generic Knowledge Boosted Pre-training For Remote Sensing Images
by: Huang, Ziyue, et al.
Published: (2024)

SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks
by: Dong, Xingning, et al.
Published: (2024)

Less is More: Label-Guided Summarization of Procedural and Instructional Videos
by: Rajpal, Shreya, et al.
Published: (2026)

PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
by: Yuan, Kun, et al.
Published: (2024)

Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation
by: Chen, Jingxi, et al.
Published: (2024)

ProcObject-10K: Benchmarking Object-Centric Procedural Understanding in Instructional Videos
by: Guo, Wenliang, et al.
Published: (2025)

Temporal-Consistent Video Restoration with Pre-trained Diffusion Models
by: Wang, Hengkang, et al.
Published: (2025)

Large-scale Pre-training for Grounded Video Caption Generation
by: Kazakos, Evangelos, et al.
Published: (2025)