| Main Authors: | Wang, Bin, Li, Wentong, Wang, Wenqian, Gao, Mingliang, Cong, Runmin, Zhang, Wei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.10688 |
Similar Items
GA2-CLIP: Generic Attribute Anchor for Efficient Prompt Tuning in Video-Language Models
by: Wang, Bin, et al.
Published: (2025)
Divide-and-Conquer Decoupled Network for Cross-Domain Few-Shot Segmentation
by: Cong, Runmin, et al.
Published: (2025)
EZ-CLIP: Efficient Zeroshot Video Action Recognition
by: Ahmad, Shahzad, et al.
Published: (2023)
CLIP-guided Prototype Modulating for Few-shot Action Recognition
by: Wang, Xiang, et al.
Published: (2023)
MoCLIP-Lite: Efficient Video Recognition by Fusing CLIP with Motion Vectors
by: Huang, Binhua, et al.
Published: (2025)
MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition
by: Wang, Ruoyu, et al.
Published: (2024)
OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning
by: Liu, Mushui, et al.
Published: (2024)
CM2-Net: Continual Cross-Modal Mapping Network for Driver Action Recognition
by: Wang, Ruoyu, et al.
Published: (2024)
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
by: Wang, Mengmeng, et al.
Published: (2024)
HabitAction: A Video Dataset for Human Habitual Behavior Recognition
by: Li, Hongwu, et al.
Published: (2024)
Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track
by: Pan, Feiyu, et al.
Published: (2024)
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition
by: Lin, Kun-Yu, et al.
Published: (2024)
The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation
by: Fang, Hao, et al.
Published: (2025)
Leveraging Temporal Contextualization for Video Action Recognition
by: Kim, Minji, et al.
Published: (2024)
Heatmap Pooling Network for Action Recognition from RGB Videos
by: Liu, Mengyuan, et al.
Published: (2025)
MA-FSAR: Multimodal Adaptation of CLIP for Few-Shot Action Recognition
by: Xing, Jiazheng, et al.
Published: (2023)
Frequency Perception Network for Camouflaged Object Detection
by: Cong, Runmin, et al.
Published: (2023)
SpatioTemporal Difference Network for Video Depth Super-Resolution
by: Wang, Zhengxue, et al.
Published: (2025)
CLIP-AE: CLIP-assisted Cross-view Audio-Visual Enhancement for Unsupervised Temporal Action Localization
by: Xia, Rui, et al.
Published: (2025)
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
by: Wang, Yilong, et al.
Published: (2024)
Unsupervised Spatial-Temporal Feature Enrichment and Fidelity Preservation Network for Skeleton based Action Recognition
by: Li, Chuankun, et al.
Published: (2024)
RSONet: Region-guided Selective Optimization Network for RGB-T Salient Object Detection
by: Wan, Bin, et al.
Published: (2026)
Query-guided Prototype Evolution Network for Few-Shot Segmentation
by: Cong, Runmin, et al.
Published: (2024)
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
by: Hao, Yanbin, et al.
Published: (2024)
Mixture-of-Modality-Experts with Holistic Token Learning for Fine-Grained Multimodal Visual Analytics in Driver Action Recognition
by: Liu, Tianyi, et al.
Published: (2026)
Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion
by: Wei, Shuoyan, et al.
Published: (2026)
SDDNet: Style-guided Dual-layer Disentanglement Network for Shadow Detection
by: Cong, Runmin, et al.
Published: (2023)
Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion
by: Xue, Minglong, et al.
Published: (2024)
G2HFNet: GeoGran-Aware Hierarchical Feature Fusion Network for Salient Object Detection in Optical Remote Sensing Images
by: Wan, Bin, et al.
Published: (2026)
Beyond Global Scanning: Adaptive Visual State Space Modeling for Salient Object Detection in Optical Remote Sensing Images
by: Ren, Mengyu, et al.
Published: (2025)
EV-CLIP: Efficient Visual Prompt Adaptation for CLIP in Few-shot Action Recognition under Visual Challenges
by: Jon, Hyo Jin, et al.
Published: (2026)
CLIP-AUTT: Test-Time Personalization with Action Unit Prompting for Fine-Grained Video Emotion Recognition
by: Zeeshan, Muhammad Osama, et al.
Published: (2026)
MoCrop: Training Free Motion Guided Cropping for Efficient Video Action Recognition
by: Huang, Binhua, et al.
Published: (2025)
Denoise and Align: Diffusion-Driven Foreground Knowledge Prompting for Open-Vocabulary Temporal Action Detection
by: Zhu, Sa, et al.
Published: (2026)
From Sight to Insight: Unleashing Eye-Tracking in Weakly Supervised Video Salient Object Detection
by: Qin, Qi, et al.
Published: (2025)
PvNeXt: Rethinking Network Design and Temporal Motion for Point Cloud Video Recognition
by: Wang, Jie, et al.
Published: (2025)
AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
by: Gao, Bin-Bin, et al.
Published: (2025)
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
by: Zhu, Wencheng, et al.
Published: (2025)
Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection
by: Cong, Runmin, et al.
Published: (2023)
UNINEXT-Cutie: The 1st Solution for LSVOS Challenge RVOS Track
by: Fang, Hao, et al.
Published: (2024)