| Main Authors: | Hu, Xinyi; Wang, Yuran; Zhang, Ruixu; Li, Yue; Liu, Wenxuan; Wang, Zheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.20189 |
Similar Items
Beyond the Individual: Introducing Group Intention Forecasting with SHOT Dataset
by: Zhang, Ruixu, et al.
Published: (2025)
Anomize: Better Open Vocabulary Video Anomaly Detection
by: Li, Fei, et al.
Published: (2025)
Towards Dense and Accurate Radar Perception Via Efficient Cross-Modal Diffusion Model
by: Zhang, Ruibin, et al.
Published: (2024)
Uncertainty-Aware Token Importance Estimation in Spiking Transformers
by: Liu, Wenxuan, et al.
Published: (2026)
Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions
by: Wang, Wenxuan, et al.
Published: (2024)
Devil is in Details: Locality-Aware 3D Abdominal CT Volume Generation for Self-Supervised Organ Segmentation
by: Wang, Yuran, et al.
Published: (2024)
SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection
by: Wang, Yifan, et al.
Published: (2025)
IFNet: Deep Imaging and Focusing for Handheld SAR with Millimeter-wave Signals
by: Li, Yadong, et al.
Published: (2024)
Bidirectional Progressive Transformer for Interaction Intention Anticipation
by: Zhang, Zichen, et al.
Published: (2024)
RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather
by: Wang, Yuran, et al.
Published: (2025)
SPAN: Learning Similarity between Scene Graphs and Images with Transformers
by: Cong, Yuren, et al.
Published: (2023)
Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
by: Wang, Yuran, et al.
Published: (2025)
Weakly-Supervised Temporal Action Localization by Progressive Complementary Learning
by: Du, Jia-Run, et al.
Published: (2022)
IntentVCNet: Bridging Spatio-Temporal Gaps for Intention-Oriented Controllable Video Captioning
by: Qiu, Tianheng, et al.
Published: (2025)
Passive Non-Line-of-Sight Imaging with Light Transport Modulation
by: Zhang, Jiarui, et al.
Published: (2023)
Beyond Single Models: Mitigating Multimodal Hallucinations via Adaptive Token Ensemble Decoding
by: Li, Jinlin, et al.
Published: (2025)
Boosting Zero-shot Stereo Matching using Large-scale Mixed Images Sources in the Real World
by: Wang, Yuran, et al.
Published: (2025)
Video-Zero: Self-Evolution Video Understanding
by: Zhang, Ruixu, et al.
Published: (2026)
TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer
by: Liu, Yang, et al.
Published: (2025)
Mono2Stereo: Monocular Knowledge Transfer for Enhanced Stereo Matching
by: Wang, Yuran, et al.
Published: (2024)
Masked Diffusion Vision-Language Models for Temporal Action Localization
by: Wang, Fengshun, et al.
Published: (2026)
Towards Mitigating Modality Bias in Vision-Language Models for Temporal Action Localization
by: Li, Jiaqi, et al.
Published: (2026)
Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner
by: Cui, Xing, et al.
Published: (2024)
Temporal Action Detection Model Compression by Progressive Block Drop
by: Chen, Xiaoyong, et al.
Published: (2025)
Spatio-Temporal Progressive Attention Model for EEG Classification in Rapid Serial Visual Presentation Task
by: Li, Yang, et al.
Published: (2025)
Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts
by: Cheng, Xinhua, et al.
Published: (2023)
DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention
by: Liu, Yang, et al.
Published: (2024)
UHD-GPGNet: UHD Video Denoising via Gaussian-Process-Guided Local Spatio-Temporal Modeling
by: He, Weiyuan, et al.
Published: (2026)
Progressive Cross-Stream Cooperation in Spatial and Temporal Domain for Action Localization
by: Su, Rui, et al.
Published: (2019)
PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation
by: Gao, Ning, et al.
Published: (2024)
Temporal Action Localization with Cross Layer Task Decoupling and Refinement
by: Li, Qiang, et al.
Published: (2024)
Probing Deep into Temporal Profile Makes the Infrared Small Target Detector Much Better
by: Li, Ruojing, et al.
Published: (2025)
PASTS: Progress-Aware Spatio-Temporal Transformer Speaker For Vision-and-Language Navigation
by: Wang, Liuyi, et al.
Published: (2023)
CaTFormer: Causal Temporal Transformer with Dynamic Contextual Fusion for Driving Intention Prediction
by: Wang, Sirui, et al.
Published: (2025)
EgoLoc: A Generalizable Solution for Temporal Interaction Localization in Egocentric Videos
by: Ma, Junyi, et al.
Published: (2025)
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation
by: Wang, Wenxuan, et al.
Published: (2023)
Word-Anchored Temporal Forgery Localization
by: Wang, Tianyi, et al.
Published: (2026)
Vision and Intention Boost Large Language Model in Long-Term Action Anticipation
by: Cao, Congqi, et al.
Published: (2025)
Interactive Multimodal Fusion with Temporal Modeling
by: Yu, Jun, et al.
Published: (2025)
Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge
by: Wang, Yuxuan, et al.
Published: (2024)