| Main Author: | Nguyen, Thong Thanh |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.00683 |
Similar Items
Multi-Scale Contrastive Learning for Video Temporal Grounding
by: Nguyen, Thong Thanh, et al.
Published: (2024)
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding
by: Nguyen, Thong, et al.
Published: (2025)
Motion-aware Contrastive Learning for Temporal Panoptic Scene Graph Generation
by: Nguyen, Thong Thanh, et al.
Published: (2024)
Encoding and Controlling Global Semantics for Long-form Video Question Answering
by: Nguyen, Thong Thanh, et al.
Published: (2024)
Lightweight Models for Emotional Analysis in Video
by: Nguyen, Quoc-Tien, et al.
Published: (2025)
MOOSE: Pay Attention to Temporal Dynamics for Video Understanding via Optical Flows
by: Nguyen, Hong, et al.
Published: (2025)
Unified Interactive Multimodal Moment Retrieval via Cascaded Embedding-Reranking and Temporal-Aware Score Fusion
by: Thanh, Toan Le Ngo, et al.
Published: (2025)
Tracking the Truth: Object-Centric Spatio-Temporal Monitoring for Video Large Language Models
by: Cao, Tri, et al.
Published: (2026)
DemaFormer: Damped Exponential Moving Average Transformer with Energy-Based Modeling for Temporal Language Grounding
by: Nguyen, Thong, et al.
Published: (2023)
One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features
by: Nguyen, Trung Thanh, et al.
Published: (2024)
MADTempo: An Interactive System for Multi-Event Temporal Video Retrieval with Query Augmentation
by: Vu, Huu-An, et al.
Published: (2025)
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks
by: Yang, Min, et al.
Published: (2024)
V-CORE: Temporally Consistent Video Understanding for Video-LLM
by: Kang, Zhengjian, et al.
Published: (2026)
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding
by: Zhao, Henghao, et al.
Published: (2025)
BTS-rPPG: Orthogonal Butterfly Temporal Shifting for Remote Photoplethysmography
by: Nguyen, Ba-Thinh, et al.
Published: (2026)
LensWalk: Agentic Video Understanding by Planning How You See in Videos
by: Li, Keliang, et al.
Published: (2026)
Seeing Through the Tool: A Controlled Benchmark for Occlusion Robustness in Foundation Segmentation Models
by: Ho, Nhan, et al.
Published: (2026)
READ: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling
by: Nguyen, Thong, et al.
Published: (2023)
VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
by: Shi, Jiapeng, et al.
Published: (2026)
Incentivizing Temporal-Awareness in Egocentric Video Understanding Models
by: Xu, Zhiyang, et al.
Published: (2026)
Multimodal Contextualized Support for Enhancing Video Retrieval System
by: Nguyen-Le, Quoc-Bao, et al.
Published: (2024)
Understanding Machine Unlearning Through the Lens of Mode Connectivity
by: Cheng, Jiali, et al.
Published: (2025)
Video-QTR: Query-Driven Temporal Reasoning Framework for Lightweight Video Understanding
by: Zhao, Xinkui, et al.
Published: (2025)
TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
by: Zhang, Jun, et al.
Published: (2025)
SoccerLens: Grounded Soccer Video Understanding Beyond Accuracy
by: Elsharkawi, Ismael, et al.
Published: (2026)
Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding
by: Truong, Thanh-Dat, et al.
Published: (2025)
TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes
by: Zhou, Xingcheng, et al.
Published: (2025)
Enhancing Temporal Understanding in Video-LLMs through Stacked Temporal Attention in Vision Encoders
by: Rasekh, Ali, et al.
Published: (2025)
Med-StepBench: A Hierarchical Reasoning Framework for Evaluating Hallucinations in Medical Vision-Language Models
by: Nguyen, Minh Khoi, et al.
Published: (2026)
EgoGraph: Temporal Knowledge Graph for Egocentric Video Understanding
by: Sun, Shitong, et al.
Published: (2026)
Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding
by: Nguyen, Hoang-Quan, et al.
Published: (2023)
T*: Re-thinking Temporal Search for Long-Form Video Understanding
by: Ye, Jinhui, et al.
Published: (2025)
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
by: Liu, Zichen, et al.
Published: (2025)
Test-Time Temporal Sampling for Efficient MLLM Video Understanding
by: Wang, Kaibin, et al.
Published: (2025)
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
by: Nguyen, Trong-Thuan, et al.
Published: (2023)
Q-Adapter: Visual Query Adapter for Extracting Textually-related Features in Video Captioning
by: Chen, Junan, et al.
Published: (2025)
TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems
by: Vo, Khang H. N., et al.
Published: (2025)
FiLA-Video: Spatio-Temporal Compression for Fine-Grained Long Video Understanding
by: Guo, Yanan, et al.
Published: (2025)
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
by: Yang, Zhenyu, et al.
Published: (2025)
FrameDiT: Diffusion Transformer with Matrix Attention for Efficient Video Generation
by: Le, Minh Khoa, et al.
Published: (2026)