Library Catalog

Bibliographic Details
Main Author:	Lin, Shih-Yao
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.13974

Similar Items

L-STEC: Learned Video Compression with Long-term Spatio-Temporal Enhanced Context
by: Zhang, Tiange, et al.
Published: (2025)

From Captions to Keyframes: KeyScore for Multimodal Frame Scoring and Video-Language Understanding
by: Lin, Shih-Yao, et al.
Published: (2025)

Metric for Evaluating Performance of Reference-Free Demorphing Methods
by: Shukla, Nitish, et al.
Published: (2025)

A Spatio-Temporal based Frame Indexing Algorithm for QoS Improvement in Live Low-Motion Video Streaming
by: Adedokun, Adewale Emmanuel, et al.
Published: (2024)

Towards Long-Form Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2026)

QEVA: A Reference-Free Evaluation Metric for Narrative Video Summarization with Multimodal Question Answering
by: Jung, Woojun, et al.
Published: (2026)

PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity
by: Yuan, Yuqian, et al.
Published: (2025)

OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
by: Yao, Jiali, et al.
Published: (2025)

Context-Guided Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2024)

MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling
by: Zhang, Yue, et al.
Published: (2024)

Compact Attention: Exploiting Structured Spatio-Temporal Sparsity for Fast Video Generation
by: Li, Qirui, et al.
Published: (2025)

STGV: Spatio-Temporal Hash Encoding for Gaussian-based Video Representation
by: Lin, Jierun, et al.
Published: (2026)

Temporal-Conditional Referring Video Object Segmentation with Noise-Free Text-to-Video Diffusion Model
by: Zhang, Ruixin, et al.
Published: (2025)

Zero-Shot Video Translation and Editing with Frame Spatial-Temporal Correspondence
by: Yang, Shuai, et al.
Published: (2025)

STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models
by: Kim, Pum Jun, et al.
Published: (2024)

STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion
by: Yao, Wei, et al.
Published: (2024)

VISTA: Video Interaction Spatio-Temporal Analysis Benchmark
by: Aparcedo, Alejandro, et al.
Published: (2026)

VideoMolmo: Spatio-Temporal Grounding Meets Pointing
by: Ahmad, Ghazi Shazan, et al.
Published: (2025)

V-CAST: Video Curvature-Aware Spatio-Temporal Pruning for Efficient Video Large Language Models
by: Lin, Xinying, et al.
Published: (2026)

Shot-Aware Frame Sampling for Video Understanding
by: Zhao, Mengyu, et al.
Published: (2026)

What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
by: Chen, Brian, et al.
Published: (2023)

VideoFusion: A Spatio-Temporal Collaborative Network for Multi-modal Video Fusion
by: Tang, Linfeng, et al.
Published: (2025)

Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
by: Hyun, Jeongseok, et al.
Published: (2025)

STR-Match: Matching SpatioTemporal Relevance Score for Training-Free Video Editing
by: Lee, Junsung, et al.
Published: (2025)

AI-Generated Video Detection via Spatio-Temporal Anomaly Learning
by: Bai, Jianfa, et al.
Published: (2024)

Detector-Empowered Video Large Language Model for Efficient Spatio-Temporal Grounding
by: Gao, Shida, et al.
Published: (2025)

No-Reference Rendered Video Quality Assessment: Dataset and Metrics
by: Yang, Sipeng, et al.
Published: (2025)

Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
by: Wasim, Syed Talal, et al.
Published: (2023)

V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
by: Cheng, Zixu, et al.
Published: (2025)

VIRST: Video-Instructed Reasoning Assistant for SpatioTemporal Segmentation
by: Hong, Jihwan, et al.
Published: (2026)

SpatioTemporal Difference Network for Video Depth Super-Resolution
by: Wang, Zhengxue, et al.
Published: (2025)

VideoMamba: Spatio-Temporal Selective State Space Model
by: Park, Jinyoung, et al.
Published: (2024)

Spatio-Temporal Distortion Aware Omnidirectional Video Super-Resolution
by: An, Hongyu, et al.
Published: (2024)

Video-Language Alignment via Spatio-Temporal Graph Transformer
by: Zhang, Shi-Xue, et al.
Published: (2024)

Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection
by: Shen, Hao, et al.
Published: (2024)

Test-Time Temporal Sampling for Efficient MLLM Video Understanding
by: Wang, Kaibin, et al.
Published: (2025)

ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation
by: Danier, Duolikun, et al.
Published: (2021)

Temporal Prompting Matters: Rethinking Referring Video Object Segmentation
by: Lin, Ci-Siang, et al.
Published: (2025)

TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes
by: Zhou, Xingcheng, et al.
Published: (2025)

Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
by: Jang, Sangwon, et al.
Published: (2025)