:: Library Catalog

Bibliographic Details
Main Authors:	Tang, Canhui; Han, Zifan; Sun, Hongbo; Zhou, Sanping; Zhang, Xuchong; Wei, Xin; Yuan, Ye; Zhang, Huayu; Xu, Jinglin; Sun, Hao
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.04369
Similar Items

WAT: Online Video Understanding Needs Watching Before Thinking
by: Han, Zifan, et al.
Published: (2026)

Action Hints: Semantic Typicality and Context Uniqueness for Generalizable Skeleton-based Video Anomaly Detection
by: Tang, Canhui, et al.
Published: (2025)

Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection
by: Tang, Canhui, et al.
Published: (2024)

Object-fabrication Targeted Attack for Object Detection
by: Zhang, Xuchong, et al.
Published: (2022)

RefBench-PRO: Perceptual and Reasoning Oriented Benchmark for Referring Expression Comprehension
by: Gao, Tianyi, et al.
Published: (2025)

Latent Feature and Attention Dual Erasure Attack against Multi-View Diffusion Models for 3D Assets Protection
by: Sun, Jingwei, et al.
Published: (2024)

TSPO: Breaking the Double Homogenization Dilemma in Multi-turn Search Policy Optimization
by: Ma, Shichao, et al.
Published: (2026)

Adaptive Keyframe Sampling for Long Video Understanding
by: Tang, Xi, et al.
Published: (2025)

Segment-Aligned Policy Optimization for Multi-Modal Reasoning
by: Gao, Lei, et al.
Published: (2026)

Temporal Preference Optimization for Long-Form Video Understanding
by: Li, Rui, et al.
Published: (2025)

Moment Quantization for Video Temporal Grounding
by: Sun, Xiaolong, et al.
Published: (2025)

Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
by: Tang, Zitian, et al.
Published: (2023)

VideoAgent: Long-form Video Understanding with Large Language Model as Agent
by: Wang, Xiaohan, et al.
Published: (2024)

Learning Compact Video Representations for Efficient Long-form Video Understanding in Large Multimodal Models
by: Chen, Yuxiao, et al.
Published: (2026)

Video Token Merging for Long-form Video Understanding
by: Lee, Seon-Ho, et al.
Published: (2024)

Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow
by: Liu, Ruyang, et al.
Published: (2025)

P^2O: Joint Policy and Prompt Optimization
by: Lu, Xinyu, et al.
Published: (2026)

T*: Re-thinking Temporal Search for Long-Form Video Understanding
by: Ye, Jinhui, et al.
Published: (2025)

Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization
by: Jia, Chenwei, et al.
Published: (2026)

VideoMem: Enhancing Ultra-Long Video Understanding via Adaptive Memory Management
by: Jin, Hongbo, et al.
Published: (2025)

Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation
by: Niu, Ye, et al.
Published: (2025)

Effective Message Hiding with Order-Preserving Mechanisms
by: Yu, Gao, et al.
Published: (2024)

Multimodal Long Video Modeling Based on Temporal Dynamic Context
by: Hao, Haoran, et al.
Published: (2025)

Offline Policy Optimization with Posterior Sampling
by: Lin, Hongqiang, et al.
Published: (2026)

TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning
by: Pan, Junwen, et al.
Published: (2025)

Entropy-Guided k-Guard Sampling for Long-Horizon Autoregressive Video Generation
by: Han, Yizhao, et al.
Published: (2026)

Understand and Accelerate Memory Processing Pipeline for Large Language Model Inference
by: He, Zifan, et al.
Published: (2026)

RKKY-like interactions between two magnetic skyrmions
by: Hu, Xuchong, et al.
Published: (2026)

FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation
by: Wang, Sen, et al.
Published: (2025)

Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding
by: Yuan, Liping, et al.
Published: (2025)

Unleashing Hour-Scale Video Training for Long Video-Language Understanding
by: Lin, Jingyang, et al.
Published: (2025)

Towards Long-Form Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2026)

Iterative Zoom-In: Temporal Interval Exploration for Long Video Understanding
by: Li, Chenglin, et al.
Published: (2025)

Beyond Importance Sampling: Rejection-Gated Policy Optimization
by: Sun, Ziwu, et al.
Published: (2026)

TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding
by: Pan, Junwen, et al.
Published: (2025)

EgoGraph: Temporal Knowledge Graph for Egocentric Video Understanding
by: Sun, Shitong, et al.
Published: (2026)

HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation
by: Xin, Lei, et al.
Published: (2026)

V-CORE: Temporally Consistent Video Understanding for Video-LLM
by: Kang, Zhengjian, et al.
Published: (2026)

Sample and Communication Efficient Fully Decentralized MARL Policy Evaluation via a New Approach: Local TD update
by: Hairi, Fnu, et al.
Published: (2024)

Luminescence and Thermal Stability of Dy3+ Doped La0.1Y1.9WO6 Phosphors
by: Haochang Ye, et al.
Published: (2025)