| Main Authors: | Qiu, Tianheng, Gao, Jingchun, Li, Jingyu, Leong, Huiyi, Huang, Xuan, Wang, Xi, Zhang, Xiaocheng, Xu, Kele, Zhang, Lan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.18531 |
Similar Items
Bridging Time and Space: Decoupled Spatio-Temporal Alignment for Video Grounding
by: Tu, Xuezhen, et al.
Published: (2026)
The Geometry of Cortical Computation: Manifold Disentanglement and Predictive Dynamics in VCNet
by: Hill, Brennen A., et al.
Published: (2025)
VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG
by: Fu, Honghao, et al.
Published: (2026)
MIAT: Maneuver-Intention-Aware Transformer for Spatio-Temporal Trajectory Prediction
by: Raskoti, Chandra, et al.
Published: (2025)
CaptionFormer: Unified Segmentation, Tracking, and Captioning for Spatio-Temporal Objects
by: Fiastre, Gabriel, et al.
Published: (2025)
ToG-Bench: Task-Oriented Spatio-Temporal Grounding in Egocentric Videos
by: Xu, Qi'ao, et al.
Published: (2025)
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
by: Nadeem, Asmar, et al.
Published: (2024)
Scene Graph-guided SegCaptioning Transformer with Fine-grained Alignment for Controllable Video Segmentation and Captioning
by: Zhang, Xu, et al.
Published: (2026)
Interpretable Traffic Responsibility from Dashcam Video via Legal Multi Agent Reasoning
by: Yang, Jingchun, et al.
Published: (2026)
SOVC: Subject-Oriented Video Captioning
by: Teng, Chang, et al.
Published: (2023)
DVFace: Spatio-Temporal Dual-Prior Diffusion for Video Face Restoration
by: Chen, Zheng, et al.
Published: (2026)
Dual-path Collaborative Generation Network for Emotional Video Captioning
by: Ye, Cheng, et al.
Published: (2024)
OmniGround: A Comprehensive Spatio-Temporal Grounding Benchmark for Real-World Complex Scenarios
by: Gao, Hong, et al.
Published: (2025)
Neural-Symbolic VideoQA: Learning Compositional Spatio-Temporal Reasoning for Real-world Video Question Answering
by: Liang, Lili, et al.
Published: (2024)
PASTS: Progress-Aware Spatio-Temporal Transformer Speaker For Vision-and-Language Navigation
by: Wang, Liuyi, et al.
Published: (2023)
Bridging the Intent Gap: Knowledge-Enhanced Visual Generation
by: Cheng, Yi, et al.
Published: (2024)
HAD: Hierarchical Asymmetric Distillation to Bridge Spatio-Temporal Gaps in Event-Based Object Tracking
by: Deng, Yao, et al.
Published: (2025)
FiLA-Video: Spatio-Temporal Compression for Fine-Grained Long Video Understanding
by: Guo, Yanan, et al.
Published: (2025)
Context-Guided Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2024)
IntentionNav: A Benchmark for Intent-Driven Object Navigation from Implicit Human Instruction
by: Qian, Lin, et al.
Published: (2026)
Towards Long-Form Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2026)
Spatio-Temporal Distortion Aware Omnidirectional Video Super-Resolution
by: An, Hongyu, et al.
Published: (2024)
Consistent multiple-relaxation-time lattice Boltzmann method for the volume averaged Navier-Stokes equations
by: Liu, Yang, et al.
Published: (2024)
CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning
by: Gao, Zijun, et al.
Published: (2025)
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
by: Fei, Hao, et al.
Published: (2024)
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
by: Sarto, Sara, et al.
Published: (2024)
Multi-Modality Spatio-Temporal Forecasting via Self-Supervised Learning
by: Deng, Jiewen, et al.
Published: (2024)
Video-Language Alignment via Spatio-Temporal Graph Transformer
by: Zhang, Shi-Xue, et al.
Published: (2024)
Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation
by: Wu, Shengqiong, et al.
Published: (2025)
Bridging the Gap between User Intent and LLM: A Requirement Alignment Approach for Code Generation
by: Li, Jia, et al.
Published: (2026)
Multi-Modal LLM based Image Captioning in ICT: Bridging the Gap Between General and Industry Domain
by: Chao, Lianying, et al.
Published: (2026)
Video-STR: Reinforcing MLLMs in Video Spatio-Temporal Reasoning with Relation Graph
by: Wang, Wentao, et al.
Published: (2025)
Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring
by: Gao, Xin, et al.
Published: (2023)
Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection
by: Shen, Hao, et al.
Published: (2024)
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions
by: Yanuka, Moran, et al.
Published: (2024)
Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation
by: Xing, Yun, et al.
Published: (2023)
Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents
by: Xu, Zhou, et al.
Published: (2026)
LINR Bridge: Vector Graphic Animation via Neural Implicits and Video Diffusion Priors
by: Gao, Wenshuo, et al.
Published: (2025)
Compact Attention: Exploiting Structured Spatio-Temporal Sparsity for Fast Video Generation
by: Li, Qirui, et al.
Published: (2025)
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
by: Yao, Jiali, et al.
Published: (2025)