| Main Author: | Lin, Shih-Yao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.13974 |
Similar Items
L-STEC: Learned Video Compression with Long-term Spatio-Temporal Enhanced Context
by: Zhang, Tiange, et al.
Published: (2025)
From Captions to Keyframes: KeyScore for Multimodal Frame Scoring and Video-Language Understanding
by: Lin, Shih-Yao, et al.
Published: (2025)
Metric for Evaluating Performance of Reference-Free Demorphing Methods
by: Shukla, Nitish, et al.
Published: (2025)
A Spatio-Temporal based Frame Indexing Algorithm for QoS Improvement in Live Low-Motion Video Streaming
by: Adedokun, Adewale Emmanuel, et al.
Published: (2024)
Towards Long-Form Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2026)
QEVA: A Reference-Free Evaluation Metric for Narrative Video Summarization with Multimodal Question Answering
by: Jung, Woojun, et al.
Published: (2026)
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity
by: Yuan, Yuqian, et al.
Published: (2025)
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
by: Yao, Jiali, et al.
Published: (2025)
Context-Guided Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2024)
MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling
by: Zhang, Yue, et al.
Published: (2024)
Compact Attention: Exploiting Structured Spatio-Temporal Sparsity for Fast Video Generation
by: Li, Qirui, et al.
Published: (2025)
STGV: Spatio-Temporal Hash Encoding for Gaussian-based Video Representation
by: Lin, Jierun, et al.
Published: (2026)
Temporal-Conditional Referring Video Object Segmentation with Noise-Free Text-to-Video Diffusion Model
by: Zhang, Ruixin, et al.
Published: (2025)
Zero-Shot Video Translation and Editing with Frame Spatial-Temporal Correspondence
by: Yang, Shuai, et al.
Published: (2025)
STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models
by: Kim, Pum Jun, et al.
Published: (2024)
STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion
by: Yao, Wei, et al.
Published: (2024)
VISTA: Video Interaction Spatio-Temporal Analysis Benchmark
by: Aparcedo, Alejandro, et al.
Published: (2026)
VideoMolmo: Spatio-Temporal Grounding Meets Pointing
by: Ahmad, Ghazi Shazan, et al.
Published: (2025)
V-CAST: Video Curvature-Aware Spatio-Temporal Pruning for Efficient Video Large Language Models
by: Lin, Xinying, et al.
Published: (2026)
Shot-Aware Frame Sampling for Video Understanding
by: Zhao, Mengyu, et al.
Published: (2026)
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
by: Chen, Brian, et al.
Published: (2023)
VideoFusion: A Spatio-Temporal Collaborative Network for Multi-modal Video Fusion
by: Tang, Linfeng, et al.
Published: (2025)
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
by: Hyun, Jeongseok, et al.
Published: (2025)
STR-Match: Matching SpatioTemporal Relevance Score for Training-Free Video Editing
by: Lee, Junsung, et al.
Published: (2025)
AI-Generated Video Detection via Spatio-Temporal Anomaly Learning
by: Bai, Jianfa, et al.
Published: (2024)
Detector-Empowered Video Large Language Model for Efficient Spatio-Temporal Grounding
by: Gao, Shida, et al.
Published: (2025)
No-Reference Rendered Video Quality Assessment: Dataset and Metrics
by: Yang, Sipeng, et al.
Published: (2025)
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
by: Wasim, Syed Talal, et al.
Published: (2023)
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
by: Cheng, Zixu, et al.
Published: (2025)
VIRST: Video-Instructed Reasoning Assistant for SpatioTemporal Segmentation
by: Hong, Jihwan, et al.
Published: (2026)
SpatioTemporal Difference Network for Video Depth Super-Resolution
by: Wang, Zhengxue, et al.
Published: (2025)
VideoMamba: Spatio-Temporal Selective State Space Model
by: Park, Jinyoung, et al.
Published: (2024)
Spatio-Temporal Distortion Aware Omnidirectional Video Super-Resolution
by: An, Hongyu, et al.
Published: (2024)
Video-Language Alignment via Spatio-Temporal Graph Transformer
by: Zhang, Shi-Xue, et al.
Published: (2024)
Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection
by: Shen, Hao, et al.
Published: (2024)
Test-Time Temporal Sampling for Efficient MLLM Video Understanding
by: Wang, Kaibin, et al.
Published: (2025)
ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation
by: Danier, Duolikun, et al.
Published: (2021)
Temporal Prompting Matters: Rethinking Referring Video Object Segmentation
by: Lin, Ci-Siang, et al.
Published: (2025)
TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes
by: Zhou, Xingcheng, et al.
Published: (2025)
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
by: Jang, Sangwon, et al.
Published: (2025)