:: Library Catalog

Bibliographic Details
Main Authors:	Hu, Xinyi; Wang, Yuran; Zhang, Ruixu; Li, Yue; Liu, Wenxuan; Wang, Zheng
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.20189

Similar Items

Beyond the Individual: Introducing Group Intention Forecasting with SHOT Dataset
by: Zhang, Ruixu, et al.
Published: (2025)

Anomize: Better Open Vocabulary Video Anomaly Detection
by: Li, Fei, et al.
Published: (2025)

Towards Dense and Accurate Radar Perception Via Efficient Cross-Modal Diffusion Model
by: Zhang, Ruibin, et al.
Published: (2024)

Uncertainty-Aware Token Importance Estimation in Spiking Transformers
by: Liu, Wenxuan, et al.
Published: (2026)

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions
by: Wang, Wenxuan, et al.
Published: (2024)

Devil is in Details: Locality-Aware 3D Abdominal CT Volume Generation for Self-Supervised Organ Segmentation
by: Wang, Yuran, et al.
Published: (2024)

SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection
by: Wang, Yifan, et al.
Published: (2025)

IFNet: Deep Imaging and Focusing for Handheld SAR with Millimeter-wave Signals
by: Li, Yadong, et al.
Published: (2024)

Bidirectional Progressive Transformer for Interaction Intention Anticipation
by: Zhang, Zichen, et al.
Published: (2024)

RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather
by: Wang, Yuran, et al.
Published: (2025)

SPAN: Learning Similarity between Scene Graphs and Images with Transformers
by: Cong, Yuren, et al.
Published: (2023)

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
by: Wang, Yuran, et al.
Published: (2025)

Weakly-Supervised Temporal Action Localization by Progressive Complementary Learning
by: Du, Jia-Run, et al.
Published: (2022)

IntentVCNet: Bridging Spatio-Temporal Gaps for Intention-Oriented Controllable Video Captioning
by: Qiu, Tianheng, et al.
Published: (2025)

Passive Non-Line-of-Sight Imaging with Light Transport Modulation
by: Zhang, Jiarui, et al.
Published: (2023)

Beyond Single Models: Mitigating Multimodal Hallucinations via Adaptive Token Ensemble Decoding
by: Li, Jinlin, et al.
Published: (2025)

Boosting Zero-shot Stereo Matching using Large-scale Mixed Images Sources in the Real World
by: Wang, Yuran, et al.
Published: (2025)

Video-Zero: Self-Evolution Video Understanding
by: Zhang, Ruixu, et al.
Published: (2026)

TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer
by: Liu, Yang, et al.
Published: (2025)

Mono2Stereo: Monocular Knowledge Transfer for Enhanced Stereo Matching
by: Wang, Yuran, et al.
Published: (2024)

Masked Diffusion Vision-Language Models for Temporal Action Localization
by: Wang, Fengshun, et al.
Published: (2026)

Towards Mitigating Modality Bias in Vision-Language Models for Temporal Action Localization
by: Li, Jiaqi, et al.
Published: (2026)

Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner
by: Cui, Xing, et al.
Published: (2024)

Temporal Action Detection Model Compression by Progressive Block Drop
by: Chen, Xiaoyong, et al.
Published: (2025)

Spatio-Temporal Progressive Attention Model for EEG Classification in Rapid Serial Visual Presentation Task
by: Li, Yang, et al.
Published: (2025)

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts
by: Cheng, Xinhua, et al.
Published: (2023)

DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention
by: Liu, Yang, et al.
Published: (2024)

UHD-GPGNet: UHD Video Denoising via Gaussian-Process-Guided Local Spatio-Temporal Modeling
by: He, Weiyuan, et al.
Published: (2026)

Progressive Cross-Stream Cooperation in Spatial and Temporal Domain for Action Localization
by: Su, Rui, et al.
Published: (2019)

PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation
by: Gao, Ning, et al.
Published: (2024)

Temporal Action Localization with Cross Layer Task Decoupling and Refinement
by: Li, Qiang, et al.
Published: (2024)

Probing Deep into Temporal Profile Makes the Infrared Small Target Detector Much Better
by: Li, Ruojing, et al.
Published: (2025)

PASTS: Progress-Aware Spatio-Temporal Transformer Speaker For Vision-and-Language Navigation
by: Wang, Liuyi, et al.
Published: (2023)

CaTFormer: Causal Temporal Transformer with Dynamic Contextual Fusion for Driving Intention Prediction
by: Wang, Sirui, et al.
Published: (2025)

EgoLoc: A Generalizable Solution for Temporal Interaction Localization in Egocentric Videos
by: Ma, Junyi, et al.
Published: (2025)

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation
by: Wang, Wenxuan, et al.
Published: (2023)

Word-Anchored Temporal Forgery Localization
by: Wang, Tianyi, et al.
Published: (2026)

Vision and Intention Boost Large Language Model in Long-Term Action Anticipation
by: Cao, Congqi, et al.
Published: (2025)

Interactive Multimodal Fusion with Temporal Modeling
by: Yu, Jun, et al.
Published: (2025)

Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge
by: Wang, Yuxuan, et al.
Published: (2024)