| Main Authors: | Du, Keqing, Yang, Xinyu, Chen, Hang |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2311.12401 |
Similar Items
SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation
by: Yang, Danni, et al.
Published: (2024)
IVAC-P2L: Leveraging Irregular Repetition Priors for Improving Video Action Counting
by: Wang, Hang, et al.
Published: (2024)
G-Refine: A General Quality Refiner for Text-to-Image Generation
by: Li, Chunyi, et al.
Published: (2024)
Generative Frame Sampler for Long Video Understanding
by: Yao, Linli, et al.
Published: (2025)
Prompt-aware of Frame Sampling for Efficient Text-Video Retrieval
by: Zhang, Deyu, et al.
Published: (2025)
Boosting Temporal Sentence Grounding via Causal Inference
by: Tang, Kefan, et al.
Published: (2025)
TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning
by: Xie, Jingjing, et al.
Published: (2024)
Where Does Vision Meet Language? Understanding and Refining Visual Fusion in MLLMs via Contrastive Attention
by: Song, Shezheng, et al.
Published: (2026)
Audio-Visual Segmentation via Unlabeled Frame Exploitation
by: Liu, Jinxiang, et al.
Published: (2024)
HFS: Holistic Query-Aware Frame Selection for Efficient Video Reasoning
by: Yang, Yiqing, et al.
Published: (2025)
Noise-Tolerant Learning for Audio-Visual Action Recognition
by: Han, Haochen, et al.
Published: (2022)
Probabilistic Temporal Masked Attention for Cross-view Online Action Detection
by: Xie, Liping, et al.
Published: (2025)
Wavelet-Decoupling Contrastive Enhancement Network for Fine-Grained Skeleton-Based Action Recognition
by: Chang, Haochen, et al.
Published: (2024)
An Empirical Comparison of Video Frame Sampling Methods for Multi-Modal RAG Retrieval
by: Kandhare, Mahesh, et al.
Published: (2024)
SkeFi: Cross-Modal Knowledge Transfer for Wireless Skeleton-Based Action Recognition
by: Huang, Shunyu, et al.
Published: (2026)
Memory-Guided View Refinement for Dynamic Human-in-the-loop EQA
by: Lu, Xin, et al.
Published: (2026)
Joint Flow And Feature Refinement Using Attention For Video Restoration
by: Merugu, Ranjith, et al.
Published: (2025)
Hierarchical Refinement of Universal Multimodal Attacks on Vision-Language Models
by: Zhang, Peng-Fei, et al.
Published: (2026)
Interpretable Concept-based Deep Learning Framework for Multimodal Human Behavior Modeling
by: Li, Xinyu, et al.
Published: (2025)
READ-Net: Clarifying Emotional Ambiguity via Adaptive Feature Recalibration for Audio-Visual Depression Detection
by: Chen, Chenglizhao, et al.
Published: (2026)
Causal Debiasing for Visual Commonsense Reasoning
by: Zou, Jiayi, et al.
Published: (2025)
Can Video Diffusion Models Predict Past Frames? Bidirectional Cycle Consistency for Reversible Interpolation
by: Liu, Lingyu, et al.
Published: (2026)
Human Action Recognition without Human
by: Kataoka, Hirokatsu, et al.
Published: (2016)
Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models
by: Xu, Yifang, et al.
Published: (2025)
Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation
by: Yan, Yichen, et al.
Published: (2024)
Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling
by: Zhou, Jinxing, et al.
Published: (2024)
CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation
by: Lu, Zhenyu, et al.
Published: (2025)
Decompose and Transfer: CoT-Prompting Enhanced Alignment for Open-Vocabulary Temporal Action Detection
by: Zhu, Sa, et al.
Published: (2026)
Semi-supervised Semantic Segmentation with Multi-Constraint Consistency Learning
by: Yin, Jianjian, et al.
Published: (2025)
Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM
by: Zhang, Pingping, et al.
Published: (2024)
Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion
by: Chen, Sen, et al.
Published: (2022)
CUS3D :CLIP-based Unsupervised 3D Segmentation via Object-level Denoise
by: Yu, Fuyang, et al.
Published: (2024)
CAMeL: Cross-modality Adaptive Meta-Learning for Text-based Person Retrieval
by: Yu, Hang, et al.
Published: (2025)
Multi-scale Activation, Refinement, and Aggregation: Exploring Diverse Cues for Fine-Grained Bird Recognition
by: Zhang, Zhicheng, et al.
Published: (2025)
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
by: Chen, Yuanhong, et al.
Published: (2023)
Rethinking Radiology Report Generation via Causal Inspired Counterfactual Augmentation
by: Song, Xiao, et al.
Published: (2023)
Hypergraph Tversky-Aware Domain Incremental Learning for Brain Tumor Segmentation with Missing Modalities
by: Wang, Junze, et al.
Published: (2025)
Self-similarity Prior Distillation for Unsupervised Remote Physiological Measurement
by: Zhang, Xinyu, et al.
Published: (2023)
POINTS1.5: Building a Vision-Language Model towards Real World Applications
by: Liu, Yuan, et al.
Published: (2024)
EntroAD: Structural Entropy-Guided Prompt Adaptation for Zero-Shot Anomaly Detection
by: Zhao, Xinyu, et al.
Published: (2026)