:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Bin; Li, Wentong; Wang, Wenqian; Gao, Mingliang; Cong, Runmin; Zhang, Wei
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2408.10688

Similar Items

GA2-CLIP: Generic Attribute Anchor for Efficient Prompt Tuning in Video-Language Models
by: Wang, Bin, et al.
Published: (2025)

Divide-and-Conquer Decoupled Network for Cross-Domain Few-Shot Segmentation
by: Cong, Runmin, et al.
Published: (2025)

EZ-CLIP: Efficient Zeroshot Video Action Recognition
by: Ahmad, Shahzad, et al.
Published: (2023)

CLIP-guided Prototype Modulating for Few-shot Action Recognition
by: Wang, Xiang, et al.
Published: (2023)

MoCLIP-Lite: Efficient Video Recognition by Fusing CLIP with Motion Vectors
by: Huang, Binhua, et al.
Published: (2025)

MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition
by: Wang, Ruoyu, et al.
Published: (2024)

OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning
by: Liu, Mushui, et al.
Published: (2024)

CM2-Net: Continual Cross-Modal Mapping Network for Driver Action Recognition
by: Wang, Ruoyu, et al.
Published: (2024)

M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
by: Wang, Mengmeng, et al.
Published: (2024)

HabitAction: A Video Dataset for Human Habitual Behavior Recognition
by: Li, Hongwu, et al.
Published: (2024)

Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track
by: Pan, Feiyu, et al.
Published: (2024)

Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition
by: Lin, Kun-Yu, et al.
Published: (2024)

The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation
by: Fang, Hao, et al.
Published: (2025)

Leveraging Temporal Contextualization for Video Action Recognition
by: Kim, Minji, et al.
Published: (2024)

Heatmap Pooling Network for Action Recognition from RGB Videos
by: Liu, Mengyuan, et al.
Published: (2025)

MA-FSAR: Multimodal Adaptation of CLIP for Few-Shot Action Recognition
by: Xing, Jiazheng, et al.
Published: (2023)

Frequency Perception Network for Camouflaged Object Detection
by: Cong, Runmin, et al.
Published: (2023)

SpatioTemporal Difference Network for Video Depth Super-Resolution
by: Wang, Zhengxue, et al.
Published: (2025)

CLIP-AE: CLIP-assisted Cross-view Audio-Visual Enhancement for Unsupervised Temporal Action Localization
by: Xia, Rui, et al.
Published: (2025)

TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
by: Wang, Yilong, et al.
Published: (2024)

Unsupervised Spatial-Temporal Feature Enrichment and Fidelity Preservation Network for Skeleton based Action Recognition
by: Li, Chuankun, et al.
Published: (2024)

RSONet: Region-guided Selective Optimization Network for RGB-T Salient Object Detection
by: Wan, Bin, et al.
Published: (2026)

Query-guided Prototype Evolution Network for Few-Shot Segmentation
by: Cong, Runmin, et al.
Published: (2024)

PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
by: Hao, Yanbin, et al.
Published: (2024)

Mixture-of-Modality-Experts with Holistic Token Learning for Fine-Grained Multimodal Visual Analytics in Driver Action Recognition
by: Liu, Tianyi, et al.
Published: (2026)

Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion
by: Wei, Shuoyan, et al.
Published: (2026)

SDDNet: Style-guided Dual-layer Disentanglement Network for Shadow Detection
by: Cong, Runmin, et al.
Published: (2023)

Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion
by: Xue, Minglong, et al.
Published: (2024)

G2HFNet: GeoGran-Aware Hierarchical Feature Fusion Network for Salient Object Detection in Optical Remote Sensing Images
by: Wan, Bin, et al.
Published: (2026)

Beyond Global Scanning: Adaptive Visual State Space Modeling for Salient Object Detection in Optical Remote Sensing Images
by: Ren, Mengyu, et al.
Published: (2025)

EV-CLIP: Efficient Visual Prompt Adaptation for CLIP in Few-shot Action Recognition under Visual Challenges
by: Jon, Hyo Jin, et al.
Published: (2026)

CLIP-AUTT: Test-Time Personalization with Action Unit Prompting for Fine-Grained Video Emotion Recognition
by: Zeeshan, Muhammad Osama, et al.
Published: (2026)

MoCrop: Training Free Motion Guided Cropping for Efficient Video Action Recognition
by: Huang, Binhua, et al.
Published: (2025)

Denoise and Align: Diffusion-Driven Foreground Knowledge Prompting for Open-Vocabulary Temporal Action Detection
by: Zhu, Sa, et al.
Published: (2026)

From Sight to Insight: Unleashing Eye-Tracking in Weakly Supervised Video Salient Object Detection
by: Qin, Qi, et al.
Published: (2025)

PvNeXt: Rethinking Network Design and Temporal Motion for Point Cloud Video Recognition
by: Wang, Jie, et al.
Published: (2025)

AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
by: Gao, Bin-Bin, et al.
Published: (2025)

VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
by: Zhu, Wencheng, et al.
Published: (2025)

Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection
by: Cong, Runmin, et al.
Published: (2023)

UNINEXT-Cutie: The 1st Solution for LSVOS Challenge RVOS Track
by: Fang, Hao, et al.
Published: (2024)