:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Yang; Guo, Sheng; Zheng, Bo; Wang, Limin
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.21197

Similar Items

Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
by: Yang, Min, et al.
Published: (2023)

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
by: Wang, Xiaofeng, et al.
Published: (2024)

ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
by: Zhou, Jiaming, et al.
Published: (2024)

PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
by: Wang, Hanlin, et al.
Published: (2023)

MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
by: Yang, Min, et al.
Published: (2025)

Action100M: A Large-scale Video Action Dataset
by: Chen, Delong, et al.
Published: (2026)

Condensing Action Segmentation Datasets via Generative Network Inversion
by: Ding, Guodong, et al.
Published: (2025)

Video Dataset Condensation with Diffusion Models
by: Li, Zhe, et al.
Published: (2025)

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
by: Wang, Yi, et al.
Published: (2023)

VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition
by: Yadav, Tanush, et al.
Published: (2026)

SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
by: Wang, Jiahao, et al.
Published: (2025)

OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding
by: Zheng, Minghang, et al.
Published: (2026)

WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG
by: Li, Zhen, et al.
Published: (2026)

SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
by: Wu, Tao, et al.
Published: (2024)

UENR-600K: A Large-Scale Physically Grounded Dataset for Nighttime Video Deraining
by: Yang, Pei, et al.
Published: (2026)

Multisize Dataset Condensation
by: He, Yang, et al.
Published: (2024)

MM-SEAL: A Large-scale Video Dataset of Multi-person Multi-grained Spatio-temporally Action Localization
by: Chen, Shimin, et al.
Published: (2022)

UW-VOS: A Large-Scale Dataset for Underwater Video Object Segmentation
by: Zhao, Hongshen, et al.
Published: (2026)

SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations
by: Wang, Yunnan, et al.
Published: (2026)

TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation
by: Chen, Jiaben, et al.
Published: (2025)

VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
by: Wang, Wenhao, et al.
Published: (2025)

Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks
by: Yang, Min, et al.
Published: (2024)

ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving
by: Guo, Xianda, et al.
Published: (2025)

AnyHand: A Large-Scale Synthetic Dataset for RGB(-D) Hand Pose Estimation
by: Si, Chen, et al.
Published: (2026)

Sparse Global Matching for Video Frame Interpolation with Large Motion
by: Liu, Chunxu, et al.
Published: (2024)

MotionScape: A Large-Scale Real-World Highly Dynamic UAV Video Dataset for World Models
by: Guo, Zile, et al.
Published: (2026)

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
by: Wang, Yi, et al.
Published: (2024)

StageInteractor: Query-based Object Detector with Cross-stage Interaction
by: Teng, Yao, et al.
Published: (2023)

VADB: A Large-Scale Video Aesthetic Database with Professional and Multi-Dimensional Annotations
by: Qiao, Qianqian, et al.
Published: (2025)

MSVCOD: A Large-Scale Multi-Scene Dataset for Video Camouflage Object Detection
by: Gao, Shuyong, et al.
Published: (2025)

TripVVT: A Large-Scale Triplet Dataset and a Coarse-Mask Baseline for In-the-Wild Video Virtual Try-On
by: Shao, Dingbao, et al.
Published: (2026)

Progressive Video Condensation with MLLM Agent for Long-form Video Understanding
by: Yin, Yufei, et al.
Published: (2026)

Knowledge Condensation and Reasoning for Knowledge-based VQA
by: Hao, Dongze, et al.
Published: (2024)

HabitAction: A Video Dataset for Human Habitual Behavior Recognition
by: Li, Hongwu, et al.
Published: (2024)

DriveWAM: Video Generative Priors Enable Scalable World-Action Modeling for Autonomous Driving
by: Shi, Chen, et al.
Published: (2026)

BACON: Bayesian Optimal Condensation Framework for Dataset Distillation
by: Zhou, Zheng, et al.
Published: (2024)

MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
by: Ju, Xuan, et al.
Published: (2024)

DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation
by: Di, Donglin, et al.
Published: (2024)

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
by: Li, Hui, et al.
Published: (2024)

OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing
by: He, Haoyang, et al.
Published: (2025)