| Main Authors: | Chen, Yang, Guo, Sheng, Zheng, Bo, Wang, Limin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.21197 |
Similar Items
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
by: Yang, Min, et al.
Published: (2023)
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
by: Wang, Xiaofeng, et al.
Published: (2024)
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
by: Zhou, Jiaming, et al.
Published: (2024)
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
by: Wang, Hanlin, et al.
Published: (2023)
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
by: Yang, Min, et al.
Published: (2025)
Action100M: A Large-scale Video Action Dataset
by: Chen, Delong, et al.
Published: (2026)
Condensing Action Segmentation Datasets via Generative Network Inversion
by: Ding, Guodong, et al.
Published: (2025)
Video Dataset Condensation with Diffusion Models
by: Li, Zhe, et al.
Published: (2025)
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
by: Wang, Yi, et al.
Published: (2023)
VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition
by: Yadav, Tanush, et al.
Published: (2026)
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
by: Wang, Jiahao, et al.
Published: (2025)
OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding
by: Zheng, Minghang, et al.
Published: (2026)
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG
by: Li, Zhen, et al.
Published: (2026)
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
by: Wu, Tao, et al.
Published: (2024)
UENR-600K: A Large-Scale Physically Grounded Dataset for Nighttime Video Deraining
by: Yang, Pei, et al.
Published: (2026)
Multisize Dataset Condensation
by: He, Yang, et al.
Published: (2024)
MM-SEAL: A Large-scale Video Dataset of Multi-person Multi-grained Spatio-temporally Action Localization
by: Chen, Shimin, et al.
Published: (2022)
UW-VOS: A Large-Scale Dataset for Underwater Video Object Segmentation
by: Zhao, Hongshen, et al.
Published: (2026)
SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations
by: Wang, Yunnan, et al.
Published: (2026)
TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation
by: Chen, Jiaben, et al.
Published: (2025)
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
by: Wang, Wenhao, et al.
Published: (2025)
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks
by: Yang, Min, et al.
Published: (2024)
ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving
by: Guo, Xianda, et al.
Published: (2025)
AnyHand: A Large-Scale Synthetic Dataset for RGB(-D) Hand Pose Estimation
by: Si, Chen, et al.
Published: (2026)
Sparse Global Matching for Video Frame Interpolation with Large Motion
by: Liu, Chunxu, et al.
Published: (2024)
MotionScape: A Large-Scale Real-World Highly Dynamic UAV Video Dataset for World Models
by: Guo, Zile, et al.
Published: (2026)
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
by: Wang, Yi, et al.
Published: (2024)
StageInteractor: Query-based Object Detector with Cross-stage Interaction
by: Teng, Yao, et al.
Published: (2023)
VADB: A Large-Scale Video Aesthetic Database with Professional and Multi-Dimensional Annotations
by: Qiao, Qianqian, et al.
Published: (2025)
MSVCOD: A Large-Scale Multi-Scene Dataset for Video Camouflage Object Detection
by: Gao, Shuyong, et al.
Published: (2025)
TripVVT: A Large-Scale Triplet Dataset and a Coarse-Mask Baseline for In-the-Wild Video Virtual Try-On
by: Shao, Dingbao, et al.
Published: (2026)
Progressive Video Condensation with MLLM Agent for Long-form Video Understanding
by: Yin, Yufei, et al.
Published: (2026)
Knowledge Condensation and Reasoning for Knowledge-based VQA
by: Hao, Dongze, et al.
Published: (2024)
HabitAction: A Video Dataset for Human Habitual Behavior Recognition
by: Li, Hongwu, et al.
Published: (2024)
DriveWAM: Video Generative Priors Enable Scalable World-Action Modeling for Autonomous Driving
by: Shi, Chen, et al.
Published: (2026)
BACON: Bayesian Optimal Condensation Framework for Dataset Distillation
by: Zhou, Zheng, et al.
Published: (2024)
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
by: Ju, Xuan, et al.
Published: (2024)
DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation
by: Di, Donglin, et al.
Published: (2024)
OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
by: Li, Hui, et al.
Published: (2024)
OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing
by: He, Haoyang, et al.
Published: (2025)