Library Catalog

Bibliographic Details
Main Authors:	Liu, Man; Bai, Huihui; Li, Feng; Zhang, Chunjie; Wei, Yunchao; Chua, Tat-Seng; Zhao, Yao
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2406.03032

Similar Items

PSVMA+: Exploring Multi-granularity Semantic-visual Adaption for Generalized Zero-shot Learning
by: Liu, Man, et al.
Published: (2024)

Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
by: Zhou, Zhenglin, et al.
Published: (2025)

Extending Visual Dynamics for Video-to-Music Generation
by: Liu, Xiaohao, et al.
Published: (2025)

Harnessing Group-Oriented Consistency Constraints for Semi-Supervised Semantic Segmentation in CdZnTe Semiconductors
by: Li, Peihao, et al.
Published: (2025)

Can I Trust Your Answer? Visually Grounded Video Question Answering
by: Xiao, Junbin, et al.
Published: (2023)

Region-Adaptive Transform with Segmentation Prior for Image Compression
by: Liu, Yuxi, et al.
Published: (2024)

Compose Your Aesthetics: Empowering Text-to-Image Models with the Principles of Art
by: Jin, Zhe, et al.
Published: (2025)

FOCoOp: Enhancing Out-of-Distribution Robustness in Federated Prompt Learning for Vision-Language Models
by: Liao, Xinting, et al.
Published: (2025)

Universal Scene Graph Generation
by: Wu, Shengqiong, et al.
Published: (2025)

Turing Patterns for Multimedia: Reaction-Diffusion Multi-Modal Fusion for Language-Guided Video Moment Retrieval
by: Fang, Xiang, et al.
Published: (2026)

Active Zero: Self-Evolving Vision-Language Models through Active Environment Exploration
by: He, Jinghan, et al.
Published: (2026)

Disentangling Masked Autoencoders for Unsupervised Domain Generalization
by: Zhang, An, et al.
Published: (2024)

Understanding Long Videos via LLM-Powered Entity Relation Graphs
by: Chu, Meng, et al.
Published: (2025)

Principled Multimodal Representation Learning
by: Liu, Xiaohao, et al.
Published: (2025)

Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning
by: Jiang, Huajie, et al.
Published: (2025)

An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes
by: Qi, Ji, et al.
Published: (2025)

Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization
by: Chen, Yiyang, et al.
Published: (2022)

3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective
by: Zheng, Zhedong, et al.
Published: (2022)

C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection
by: Tan, Chuangchuang, et al.
Published: (2024)

Visual Adaptive Prompting for Compositional Zero-Shot Learning
by: Stein, Kyle, et al.
Published: (2025)

PVLM: Parsing-Aware Vision Language Model with Dynamic Contrastive Learning for Zero-Shot Deepfake Attribution
by: Zhang, Yaning, et al.
Published: (2025)

Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation
by: Liu, Han, et al.
Published: (2025)

Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
by: Fei, Hao, et al.
Published: (2024)

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
by: Li, Juncheng, et al.
Published: (2023)

Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
by: Fei, Hao, et al.
Published: (2023)

UniFGVC: Universal Training-Free Few-Shot Fine-Grained Vision Classification via Attribute-Aware Multimodal Retrieval
by: Guo, Hongyu, et al.
Published: (2025)

Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal
by: Liao, Rongxin, et al.
Published: (2025)

Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
by: Chu, Meng, et al.
Published: (2023)

MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding
by: Zhu, Fengbin, et al.
Published: (2024)

Learning Visual Proxy for Compositional Zero-Shot Learning
by: Zhang, Shiyu, et al.
Published: (2025)

A Unified Reasoning Framework for Holistic Zero-Shot Video Anomaly Analysis
by: Lin, Dongheng, et al.
Published: (2025)

Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
by: Tian, Junjiao, et al.
Published: (2023)

DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition
by: Xu, Yiyan, et al.
Published: (2025)

Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models
by: Liang, Xiao, et al.
Published: (2025)

Multiple Information Prompt Learning for Cloth-Changing Person Re-Identification
by: Wei, Shengxun, et al.
Published: (2024)

Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
by: Cui, Chenhang, et al.
Published: (2024)

MURE: Hierarchical Multi-Resolution Encoding via Vision-Language Models for Visual Document Retrieval
by: Zhu, Fengbin, et al.
Published: (2026)

PromptMoE: Generalizable Zero-Shot Anomaly Detection via Visually-Guided Prompt Mixtures
by: Shao, Yuheng, et al.
Published: (2025)

PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling
by: Yu, Xiao, et al.
Published: (2025)

CoPS: Conditional Prompt Synthesis for Zero-Shot Anomaly Detection
by: Chen, Qiyu, et al.
Published: (2025)