Library Catalog

Bibliographic Details
Main Authors:	Sun, Guanxiong; Hua, Yang; Hu, Guosheng; Robertson, Neil
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2402.09257

Similar Items

Efficient One-stage Video Object Detection by Exploiting Temporal Consistency
by: Sun, Guanxiong, et al.
Published: (2024)

MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection
by: Sun, Guanxiong, et al.
Published: (2024)

Spatio-temporal Prompting Network for Robust Video Feature Extraction
by: Sun, Guanxiong, et al.
Published: (2024)

FTDMamba: Frequency-Assisted Temporal Dilation Mamba for Unmanned Aerial Vehicle Video Anomaly Detection
by: Liu, Cheng-Zhuang, et al.
Published: (2026)

Sparse-Dense Side-Tuner for efficient Video Temporal Grounding
by: Pujol-Perich, David, et al.
Published: (2025)

Exploring Temporal Event Cues for Dense Video Captioning in Cyclic Co-learning
by: Xie, Zhuyang, et al.
Published: (2024)

Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks
by: Yang, Min, et al.
Published: (2024)

Moment Quantization for Video Temporal Grounding
by: Sun, Xiaolong, et al.
Published: (2025)

Unified Dense Prediction of Video Diffusion
by: Yang, Lehan, et al.
Published: (2025)

Self-Diffusion Driven Blind Imaging
by: Yang, Yanlong, et al.
Published: (2025)

STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution
by: Chen, Junyang, et al.
Published: (2025)

DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models
by: Wu, Ziyi, et al.
Published: (2025)

Number it: Temporal Grounding Videos like Flipping Manga
by: Wu, Yongliang, et al.
Published: (2024)

Dense Video Object Captioning from Disjoint Supervision
by: Zhou, Xingyi, et al.
Published: (2023)

Streaming Dense Video Captioning
by: Zhou, Xingyi, et al.
Published: (2024)

Task Indicating Transformer for Task-conditional Dense Predictions
by: Lu, Yuxiang, et al.
Published: (2024)

Video-Language Alignment via Spatio-Temporal Graph Transformer
by: Zhang, Shi-Xue, et al.
Published: (2024)

DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency
by: Zhong, Xiaojing, et al.
Published: (2024)

TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos
by: Kong, Fanheng, et al.
Published: (2025)

Emergent Temporal Correspondences from Video Diffusion Transformers
by: Nam, Jisu, et al.
Published: (2025)

VideoCoF: Unified Video Editing with Temporal Reasoner
by: Yang, Xiangpeng, et al.
Published: (2025)

Dynamic View Synthesis from Small Camera Motion Videos
by: Sun, Huiqiang, et al.
Published: (2025)

Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
by: Xi, Haocheng, et al.
Published: (2025)

DeVAn: Dense Video Annotation for Video-Language Models
by: Liu, Tingkai, et al.
Published: (2023)

Technical Report for Soccernet 2023 -- Dense Video Captioning
by: Ruan, Zheng, et al.
Published: (2024)

VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
by: Lee, Ji Soo, et al.
Published: (2025)

TemporalVLM: Video LLMs for Temporal Reasoning in Long Videos
by: Fateh, Fawad Javed, et al.
Published: (2024)

TA-Prompting: Enhancing Video Large Language Models for Dense Video Captioning via Temporal Anchors
by: Cheng, Wei-Yuan, et al.
Published: (2026)

Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos
by: Jiang, Songtao, et al.
Published: (2026)

Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos
by: Luo, Xianrui, et al.
Published: (2025)

SayAnything: Audio-Driven Lip Synchronization with Conditional Video Diffusion
by: Ma, Junxian, et al.
Published: (2025)

SmartSight: Mitigating Hallucination in Video-LLMs Without Compromising Video Understanding via Temporal Attention Collapse
by: Sun, Yiming, et al.
Published: (2025)

V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
by: Cheng, Zixu, et al.
Published: (2025)

Splatter a Video: Video Gaussian Representation for Versatile Processing
by: Sun, Yang-Tian, et al.
Published: (2024)

VideoCompressa: Data-Efficient Video Understanding via Joint Temporal Compression and Spatial Reconstruction
by: Wang, Shaobo, et al.
Published: (2025)

Explicit Temporal-Semantic Modeling for Dense Video Captioning via Context-Aware Cross-Modal Interaction
by: Jia, Mingda, et al.
Published: (2025)

TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
by: Yang, Zuhao, et al.
Published: (2025)

Subjective Portrait Region Cropping in Landscape Videos with Temporal Annotation Smoothing
by: Lee, Cheng-Han, et al.
Published: (2026)

Adaptive Dense Evidence Refinement for Video Relational Reasoning for VRR-QA Challenge
by: Sun, Yuyang, et al.
Published: (2026)

Described Spatial-Temporal Video Detection
by: Ji, Wei, et al.
Published: (2024)